Detection of Vulnerabilities and Automatic Protection for Web Applications

3.5 Confusion matrix of the 3 best classifiers (first two with original data, third with a balanced data set).

In Proceedings of the 46th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Toulouse, France, 8 pages, June-July 2016.

Objectives

The generated model can then be used as a static analysis tool to detect and identify vulnerabilities in the source code. The model takes into account the order of code elements within the source code being analyzed, and the different states they can take.

Summary of Contributions

A study of patches that correct source code and remove vulnerabilities without compromising the behavior of web applications. A configuration study for a new data mining component, with a set of attributes and a larger dataset.

Structure of the Thesis

Query manipulation
Client-side injection
File and path injection
Command injection

This vulnerability can be prevented by checking whether the user input contains malicious characters. Defense against this vulnerability is possible using one of three measures: (1) casting the parameters received from the user to the correct type (Ron et al., 2015); in Listing 2.4, the username would be cast to string by changing line 4 to $user = (string)$_POST['user'];; (2) using the mysql_real_escape_string PHP sanitization function to invalidate the same malicious characters as for SQLI, such as the prime; (3) validating the user inputs, checking whether they contain any of the characters < > &.
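Measures (1) and (3) can be illustrated with a minimal sketch. It is written in Python rather than PHP for brevity; the function names and the sample payload are hypothetical, not taken from the thesis listings.

```python
# Characters named in measure (3)
FORBIDDEN_CHARS = set("<>&")

def cast_to_string(value):
    """Measure (1): force the received parameter to a plain string."""
    return str(value)

def is_valid_input(value):
    """Measure (3): reject input containing any forbidden character."""
    return not (FORBIDDEN_CHARS & set(value))

# A structured value submitted in place of a string (hypothetical payload)
# loses its structure once it is cast.
user = cast_to_string({"$ne": ""})
print(type(user).__name__)
print(is_valid_input("alice"))
print(is_valid_input("<script>"))
```

The cast mirrors the role of (string)$_POST['user'] in the text: whatever shape the input arrives in, the code that builds the query only ever sees a string.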

Detection of Vulnerabilities

Static analysis

For example, if a program contains a buffer overflow vulnerability, there is a data flow that starts at an entry point and ends in a function that manipulates a buffer and requires trustworthy data (e.g., the strcpy sensitive sink). Later, the RIPS authors improved the tool's analysis of object-oriented PHP code to statically search for PHP object injection (POI) vulnerabilities that can be exploited by property-oriented programming (POP), i.e., the ability of an attacker to change the properties of an object injected for the purpose of exploiting a POI vulnerability.

Fuzzing

Whitebox fuzzers use symbolic execution and constraint solving applied to the source code (Duchène et al., 2014). This form of whitebox fuzzing is implemented in the SAGE (Godefroid et al.), KLEE (Cadar et al., 2008), and DART (Godefroid et al., 2005) fuzzers, which use symbolic execution to exercise all possible program execution paths.

Vulnerabilities and Machine Learning

Machine learning classifiers and data mining
Sequence models and natural language processing
Detecting vulnerabilities using machine learning
Related uses of machine learning

The choice of machine learning algorithm depends on factors such as the type of problem to be solved and the nature of the dataset (Chandola et al., 2009). SuSi uses machine learning to identify sources and sinks in the Android API source code (Rasthofer et al., 2014).

Removing Vulnerabilities and Runtime Protection

Removing vulnerabilities
Runtime protection
Overview of the approach
Architecture

AMNESIA creates models by analyzing application source code and extracting query structure. Section 3.1 discusses an approach to automatically detect and correct this type of vulnerability using taint analysis results and predicting false positives through data mining, and presents the architecture of the WAP tool that implements the approach.

Detecting Candidate Vulnerabilities by Taint Analysis

Each branch of the TEPT corresponds to a tainted variable and contains a sub-branch for each line of code where the variable becomes tainted (a square in the figure). The procedure is repeated to create the branch and insert the dependency into the sub-branch.

Predicting False Positives

Classification of vulnerabilities
Classifiers and metrics
Evaluation of classifiers
Selection of classifiers
Final selection and implementation

For each candidate vulnerability, the table shows the values of the attributes (Y or N) and the class, which has to be assessed manually (supervised machine learning). With the balanced data set, it was one of the best classifiers, despite fpp remaining unchanged.

Fixing and Testing the Source Code

Code correction

For these two classes of vulnerabilities, a fix is inserted for each malicious input that reaches a sensitive sink. For example, if three malicious inputs appear in an echo sensitive sink (for reflected XSS), then the san_out fix will be inserted three times (once per malicious input).
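The one-fix-per-input rule can be sketched as a small source rewriting step. This is an illustrative Python approximation, not the corrector used by WAP: the helper name insert_fixes, the regular-expression matching, and the sample PHP statement are assumptions; only the san_out fix name comes from the text.

```python
import re

def insert_fixes(statement, tainted_vars, fix_name="san_out"):
    """Wrap every tainted variable in the statement with a call to the fix."""
    fixed = statement
    for var in tainted_vars:
        # Match the exact variable name, not a longer name sharing its prefix
        pattern = re.escape(var) + r"(?![A-Za-z0-9_])"
        fixed = re.sub(pattern, lambda m: f"{fix_name}({m.group(0)})", fixed)
    return fixed

# Hypothetical echo sink reached by three tainted inputs
sink = "echo $name . $msg . $id;"
fixed_sink = insert_fixes(sink, ["$name", "$msg", "$id"])
print(fixed_sink)
```

A real corrector would work on the parsed tree rather than on raw text, so that occurrences inside string literals or comments are not rewritten.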

Testing fixed code

Nothing is output for SQLI, reflected XSS, and PHPCI, and application execution continues. For others, where patches perform validation, an alarm is raised when an attack is detected and the execution of the web application is stopped.

Implementation and Challenges

If the analysis is global, taint also propagates when functions in different modules are called. This means that WAP handles multiple TSTs and TEPTs for proper taint propagation.

Experimental Evaluation

Large scale evaluation
Taint analysis comparative evaluation
Full comparative evaluation
Fixing vulnerabilities
Testing fixed applications

The confusion matrix of the LR model for PhpMinerII (Table 3.14) shows that it correctly classified 68 cases, with 48 as vulnerabilities and 20 as non-vulnerabilities. The comparison with Pixy can be extracted from Table 3.12; however, we cannot show the results of PhpMinerII in the table because it does not really identify vulnerabilities.

Conclusions

The chapter addresses the difficulty of extending these tools by proposing a modular and extensible version of the WAP tool (presented in Chapter 3), which is equipped with weapons (WAP extensions) to detect and correct new vulnerability classes. This version of the tool covers eight vulnerability classes: SQLI, XSS (reflected and stored), remote file inclusion (RFI), local file inclusion (LFI), directory or path traversal (DT/PT), OS command injection (OSCI), source code disclosure (SCD), and PHP command injection (PHPCI).

Restructuring WAP

Code analyzer
False positive predictor
Code corrector
Weapons
Effort to modify WAP

The attributes represent symptoms of the same kind; for example, the type-checking attribute represents the symptoms that check the data type of variables. The user cleanup template is chosen if the user specifies the malicious characters that can be used to exploit the vulnerability and a character that can be used to neutralize them (for example, the backslash).
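A template of this kind, parameterized by the malicious characters and the neutralizing character, might look like the following Python sketch. The function make_sanitizer and its behavior are assumptions made for illustration; the actual template generates PHP fixes and may differ.

```python
def make_sanitizer(malicious_chars, neutralizer="\\"):
    """Build a function that prefixes each malicious character
    with the neutralizing character."""
    malicious = set(malicious_chars)

    def sanitize(text):
        return "".join(
            neutralizer + ch if ch in malicious else ch for ch in text
        )

    return sanitize

# The user names the quote characters as malicious and the backslash
# as the neutralizer.
sanitize = make_sanitizer("'\"")
print(sanitize("O'Brien"))
```

Generating the sanitizer from two user-supplied parameters is what lets a new vulnerability class be covered without writing a fix by hand.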

Extending WAP with weapons

Reusing the sub-modules

Detection of the four vulnerabilities mentioned above can be included in the sub-modules of Section 4.2.1, and patches to remove them can be created using the patch template (Section 4.2.3). Regarding LDAPI and XPathI, a patch was created for each using the user authentication patch template.

Creating weapons

They validate the user input content against JavaScript code, so we changed them to also check the input content against URIs/hyperlinks.

Experimental Evaluation

Real web applications

The last four columns of the table show the number of predicted (FPP) and unpredicted (FP) false positives by WAP (first two columns) and WAPe (next two columns). WAPe predicted 104 false positives: the same as WAP plus 42 that WAP classified as non-false positives.

WordPress plugins

In both analyses, it detected HI and CS vulnerabilities, while LDAPI and SF were only detected in the web applications (no plugins). All these vulnerabilities have been reported to the developers of the web applications and WP plugins.

Conclusions

Given the sequence of observations, hidden states (one per observation) are discovered by HMM, taking into account the order of the observations. Section 5.4 presents the DEKANT tool that implements the model, and Section 5.5 presents an experimental evaluation.

Intermediate Slice Language

ISL tokens and grammar

The first 20 represent code elements and their parameters, while the last two are specific to the corpus and the implementation of the model (see Sections 5.3 and 5.4). A cut list is the result of applying a set of statement rules (line 2), each of which can be a subrule (lines 4-11), a statement (line 12), or an assignment statement (line 13).

Variable map

Slice translation process

The Model

Building the corpus

The instruction $var = $_POST['paramater'], for example, translated into ISL becomes input var and is represented in the corpus as ⟨input,Taint⟩ ⟨var_vv,Taint⟩. For example, the PHP instructions from lines 1 and 2 (Listing 5.2(a)) result in the sequence of line 1 in the corpus.

Sequence model

Transition probabilities: count how many times in the corpus a given state transitions to another state (or to itself). An example is the probability of the Taint state emitting the token var_vv, that is, the pair ⟨var_vv,Taint⟩.
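The counting described above amounts to a maximum-likelihood estimate of the model parameters. The Python sketch below assumes the corpus is a list of sequences of (token, state) pairs; the function name estimate_hmm, the sanit_f token, and the toy corpus are hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict

def estimate_hmm(corpus):
    """Estimate transition and emission probabilities by counting."""
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    for sequence in corpus:
        for token, state in sequence:
            emit_counts[state][token] += 1
        # Consecutive pairs give the state-to-state transitions
        for (_, prev), (_, nxt) in zip(sequence, sequence[1:]):
            trans_counts[prev][nxt] += 1

    def normalize(counts):
        return {
            state: {key: n / sum(c.values()) for key, n in c.items()}
            for state, c in counts.items()
        }

    return normalize(trans_counts), normalize(emit_counts)

# Toy corpus; only the first sequence mirrors the example in the text
toy_corpus = [
    [("input", "Taint"), ("var_vv", "Taint")],
    [("sanit_f", "N-Taint"), ("var_vv", "N-Taint")],
    [("input", "Taint"), ("var_vv", "Taint"), ("sanit_f", "N-Taint")],
]
transitions, emissions = estimate_hmm(toy_corpus)
print(emissions["Taint"]["var_vv"])
```

In this toy corpus the Taint state emits var_vv in two of its four emissions, so the estimated emission probability of the pair ⟨var_vv,Taint⟩ is 0.5.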

Detecting vulnerabilities

TL is updated by: (i) inserting the variable name if its state is Taint; or (ii) removing it if its state is N-Taint and the variable belongs to TL. The final state of the slice-isl (corresponding to line 3) is N-Taint, since it is a variable in CTL.
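The update rule for TL can be written as a short function. This Python sketch assumes TL is held as a set of variable names; the function name and the sample variable are hypothetical, and the handling of CTL is left out.

```python
def update_tainted_list(tl, variable, state):
    """Apply rules (i) and (ii) to the list TL for one classified variable."""
    if state == "Taint":
        tl.add(variable)  # rule (i): insert the variable name
    elif state == "N-Taint" and variable in tl:
        tl.remove(variable)  # rule (ii): remove a variable that is no longer tainted
    return tl

tl = set()
update_tainted_list(tl, "u", "Taint")
print(sorted(tl))
update_tainted_list(tl, "u", "N-Taint")
print(sorted(tl))
```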

Implementation and Assessment

Implementation of the DEKANT

Corpus sequence processing (corpus processing step) means separating observations from states, resulting in matrices of observations and states with the same dimension. However, corpus sequences are not of equal length (see for example Figure 5.3), so normalization is necessary.
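The separation and normalization steps can be sketched as follows. The text does not say how the normalization is done, so padding shorter sequences with a placeholder is an assumption of this Python example, as are the names split_and_pad and PAD and the sanit_f token.

```python
PAD = None  # hypothetical placeholder for missing positions

def split_and_pad(corpus):
    """Separate observations from states and pad rows to equal length."""
    width = max(len(sequence) for sequence in corpus)
    observations, states = [], []
    for sequence in corpus:
        padding = [PAD] * (width - len(sequence))
        observations.append([token for token, _ in sequence] + padding)
        states.append([state for _, state in sequence] + padding)
    return observations, states

corpus = [
    [("input", "Taint"), ("var_vv", "Taint")],
    [("input", "Taint"), ("var_vv", "Taint"), ("sanit_f", "N-Taint")],
]
obs_matrix, state_matrix = split_and_pad(corpus)
print(obs_matrix)
print(state_matrix)
```

Both matrices end up with one row per corpus sequence and one column per position, which is the equal-dimension property the processing step requires.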

Model and corpus assessment

The knowledge extracted from this corpus is shown in Figure 5.6, representing the model parameters. Then, the tool is trained with a pseudocorpus of 9 folds and tested with the 10th fold.
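Training on 9 folds and testing on the remaining one is the usual 10-fold cross-validation scheme. A minimal Python sketch of the split, assuming the corpus is a list of sequences and using a hypothetical helper name, is:

```python
def k_fold_splits(sequences, k=10):
    """Yield (training set, test fold) pairs, holding out each fold once."""
    folds = [sequences[i::k] for i in range(k)]
    for held_out in range(k):
        train = [
            item
            for index, fold in enumerate(folds)
            if index != held_out
            for item in fold
        ]
        yield train, folds[held_out]

# Stand-in data: 20 numbered sequences instead of real corpus entries
data = list(range(20))
splits = list(k_fold_splits(data, k=10))
print(len(splits))
```

Each sequence is used for testing exactly once, so the reported metrics cover the whole corpus without ever testing on data seen during training.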

Experimental Evaluation

Open source software evaluation

To demonstrate the ability of DEKANT to classify vulnerabilities, we use it with 10 WordPress plugins (WordPress, 2015) and 10 packages of real web applications, all written in PHP, using the corpus of the previous section. Therefore, in order to run DEKANT with the source code of the plugins, but without the WordPress codebase, we added the information about those functions to the tool.

Comparison with data mining tools

WAP identified the same 258 unremedied slices (columns 2 and 4 of Table 5.5) as the slice extractor and detected 206 of the vulnerabilities found by DEKANT (5 fewer than DEKANT, counted as false negatives, FN). We present the experimental results of the tool run without and with data mining.

Comparison with taint analysis tools

Discussion

Conclusions

Additionally, they may overlook that the data may be unsanitized when it is inserted into the DBMS, leading to a second-order SQLI vulnerability. For stored injection, we suggest plugins to handle certain attacks before the data is inserted into the database.

DBMS Injection Attacks

The user admin\'-- does not exist in the database, so this SQLI attack is not successful. In a second-order SQLI attack (class D.1), the inserted data is a string specially crafted to be inserted into a second SQL query executed in the second step.

The SEPTIC Approach

SEPTIC overview
Query structures and query models
Query identifiers
Attack detection
Training
Detection examples
Discussion

The ID format is the SQL command (typically SELECT) followed by the number of nodes in the query structure. For the example of Listing 6.1, which has the query structure in Figure 6.3(b), the ID would be select_9.
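The ID format can be sketched in a few lines of Python. The sketch assumes the query structure is held as a list of stacks of nodes and that every node counts toward the total; the function name query_id and the placeholder nodes are hypothetical and do not reproduce Figure 6.3(b).

```python
def query_id(command, query_structure):
    """Build an ID from the SQL command and the number of nodes."""
    node_count = sum(len(stack) for stack in query_structure)
    return f"{command.lower()}_{node_count}"

# Placeholder query structure with nine nodes spread over three stacks
structure = [
    ["n1", "n2"],
    ["n3"],
    ["n4", "n5", "n6", "n7", "n8", "n9"],
]
print(query_id("SELECT", structure))
```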

Implementation

Protecting MySQL

These rules call the SEPTIC detector with an input that matches the query parsed and validated by MySQL. This function calls the processSelect_Lex and insertElementTemplate functions to check the query statement (SELECT, DELETE, INSERT, UPDATE) and build the QS.

Inserting identifiers in Zend

Listing 6.2 presents the algorithm to get the query ID implemented by the get_query_ID function. Otherwise, the algorithm checks whether the query belongs to the array of the function arguments.

Inserting identifiers in Spring / Java

Then, if the function is a sensitive sink, we get a query argument to start tracing it back (lines 11 and 12).

Experimental Evaluation

Attack detection

In the experiments, first with SEPTIC turned off, we injected malicious user input created manually to confirm the presence of the vulnerabilities in the code samples. The anti-SQLI tools only found the attack from class A.5 in the semantic mismatch attacks (row 21).

Performance overhead

We ran a single Firefox browser on each client machine, but varied the number of these machines from 1 to 4. Finally, it can be seen that the overhead tends to increase with the number of PCs and browsers generating traffic, as the load increases.

Extensions to SEPTIC

Protecting other DBMSs

The result of the parsing and validation phases is the same as in MySQL: a list of stacks where each stack of the list represents a clause of the query and each of the nodes contains data about the query element. Again, each stack of the list represents a clause of the query (e.g., SELECT, FROM) and each node is a query element.

Vulnerability diagnosis

Similar to MariaDB, no changes are needed in the generation of IDs implemented in the Zend engine.

Detecting attacks against non-web applications

Conclusions

With SEPTIC we aim to make the DBMS secure, so that model generation and attack detection are done within the DBMS. Like SEPTIC, DIGLOSSIA detects syntax structure and mimicry attacks, but, unlike SEPTIC, it detects neither second-order SQLI, as it only computes queries with user inputs, nor encoding and codespace character attacks and evasion, as these attacks do not change the root nodes of the previously parsed tree. malicious user inputs are processed by the DBMS.

Future Work

Information flows that exploit web vulnerabilities

Architecture including main modules, and data structures

Example (i) AST, (ii) TST, and (iii) taint analysis

Script with SQLI vulnerability, its TEPT, and untaint data structures

Number of attribute occurrences in the original data set

Number of attribute occurrences in the balanced data set

Overview of the WAP tool modules and data flow

Reorganization of WAP’s code analyzer module

Reorganization of the false positives predictor module

Downloads and active installed plugins of 115 analyzed (blue columns) and

Number of vulnerabilities detected by class in the vulnerable web applica-