3.5 Confusion matrix of the 3 best classifiers (first two with original data, third with a balanced data set). In Proceedings of the 46th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Toulouse, France, 8 pages, June-July 2016.
Objectives
The generated model can then be used as a static analysis tool to detect and identify vulnerabilities in the source code. The model takes into account the order of code elements within the source code being analyzed, and the different states they can take.
Summary of Contributions
A study of patches that correct source code and remove vulnerabilities without compromising the behavior of web applications. A configuration study for a new data mining component, with a set of attributes and a larger dataset.
Structure of the Thesis
- Query manipulation
- Client-side injection
- File and path injection
- Command injection
This vulnerability can be prevented by checking if the user input contains the following malicious characters. Defense against this vulnerability is possible by using one of three measures: (1) casting the parameters received from the user to the correct type (Ron et al., 2015) (in Listing 2.4, the username would be cast to a string by changing line 4 to $user = (string)$_POST['user'];); (2) using the mysql_real_escape_string PHP sanitization function to invalidate the same malicious characters as for SQLI, such as the prime; (3) validating the user inputs, checking if they match some of the following characters: < > &.
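Measures (1) and (3) can be illustrated with a minimal sketch. It is written in Python rather than PHP, so `str()` only stands in for the `(string)` cast, and the function names are hypothetical. The blocked characters are the three listed in the text.

```python
# Characters named in measure (3); the helper names are illustrative only.
BLOCKED_CHARACTERS = "<>&"


def cast_to_string(value):
    """Measure (1): force a request parameter to a string.

    A structured value (for example a dict carrying a query operator)
    becomes plain text and can no longer be interpreted as an operator.
    """
    return value if isinstance(value, str) else str(value)


def contains_blocked_characters(value, blocked=BLOCKED_CHARACTERS):
    """Measure (3): report whether the input contains a blocked character."""
    return any(character in blocked for character in value)


user = cast_to_string({"$ne": ""})   # hypothetical operator-carrying input
is_suspicious = contains_blocked_characters("<script>")
```

An application would reject or sanitize the input when `contains_blocked_characters` returns `True`. The sketch only shows the check itself.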
Detection of Vulnerabilities
Static analysis
For example, if a program contains a buffer overflow vulnerability, there is a data flow that starts in an entry point and ends in a function that manipulates a buffer and requires trusted data (e.g., the strcpy sensitive sink). Later, the RIPS authors extended the tool's analysis of object-oriented PHP code to statically search for PHP object injection (POI) vulnerabilities that can be exploited by property-oriented programming (POP), i.e., the ability of an attacker to change the properties of an injected object in order to exploit a POI vulnerability.
Fuzzing
Whitebox fuzzers use symbolic execution and constraint solving applied to the source code (Duchène et al., 2014). This form of whitebox fuzzing is implemented in the SAGE (Godefroid et al.), KLEE (Cadar et al., 2008), and DART (Godefroid et al., 2005) fuzzers, which use symbolic execution to exercise all possible program execution paths.
Vulnerabilities and Machine Learning
- Machine learning classifiers and data mining
- Sequence models and natural language processing
- Detecting vulnerabilities using machine learning
- Related uses of machine learning
The choice of machine learning algorithm depends on factors such as the type of problem to be solved and the nature of the dataset (Chandola et al., 2009). SuSi uses machine learning to identify sources and sinks in the Android API source code (Rasthofer et al., 2014).
Removing Vulnerabilities and Runtime Protection
- Removing vulnerabilities
- Runtime protection
- Overview of the approach
- Architecture
AMNESIA creates models by analyzing application source code and extracting query structure. Section 3.1 discusses an approach to automatically detect and remediate this type of vulnerability using taint analysis results and predicting false positives through data mining, and presents the architecture of the WAP tool that implements the approach.
Detecting Candidate Vulnerabilities by Taint Analysis
Each branch of the TEPT corresponds to a tainted variable and contains a sub-branch for each line of code where the variable becomes tainted (a square in the figure). The procedure is repeated to create the branch and insert the dependency into the sub-branch.
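The branch and sub-branch layout described above can be pictured with a small Python sketch. It is an illustration under assumptions, not WAP's actual data structure: the class name, method name, and example variables are hypothetical.

```python
from collections import defaultdict


class TaintedPathTree:
    """Toy TEPT: tainted variable -> line number -> list of dependencies."""

    def __init__(self):
        # One branch per tainted variable, one sub-branch per line of code.
        self.branches = defaultdict(lambda: defaultdict(list))

    def add_taint(self, variable, line, depends_on=None):
        """Create the branch and sub-branch if needed, then record the dependency."""
        sub_branch = self.branches[variable][line]
        if depends_on is not None and depends_on not in sub_branch:
            sub_branch.append(depends_on)


tept = TaintedPathTree()
tept.add_taint("$u", 1, "$_POST['user']")  # $u becomes tainted on line 1
tept.add_taint("$q", 2, "$u")              # $q depends on $u on line 2
tept.add_taint("$q", 5, "$u")              # and again on line 5
```

Here the branch for `$q` holds two sub-branches (lines 2 and 5), each recording the variable it depends on.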
Predicting False Positives
- Classification of vulnerabilities
- Classifiers and metrics
- Evaluation of classifiers
- Selection of classifiers
- Final selection and implementation
For each candidate vulnerability, the table shows the values of the attributes (Y or N) and the class, which has to be assessed manually (supervised machine learning). With the balanced data set, it was one of the best classifiers, despite fpp remaining unchanged.
Fixing and Testing the Source Code
Code correction
For these two classes of vulnerabilities, a fix is inserted for each malicious input that reaches a sensitive sink. For example, if three malicious inputs appear in an echo sensitive sink (for reflected XSS), then the san_out fix will be inserted three times (one per malicious input).
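The one-fix-per-input rule can be sketched as follows. The source names only the `san_out` fix. How WAP actually rewrites the code is not shown here, so the function `insert_fixes` and its string-based rewriting are assumptions made for illustration.

```python
def insert_fixes(sink, arguments, tainted, fix="san_out"):
    """Wrap every tainted argument of a sensitive sink in the fix function.

    `arguments` are the concatenated operands of the sink call and
    `tainted` is the set of operands that carry malicious input.
    """
    patched = [f"{fix}({arg})" if arg in tainted else arg for arg in arguments]
    return f"{sink} " + " . ".join(patched) + ";"


# Three malicious inputs reach the echo sink, so the fix appears three times.
fixed_line = insert_fixes(
    "echo",
    ["$a", "'<br>'", "$b", "$c"],
    tainted={"$a", "$b", "$c"},
)
```

The untainted literal `'<br>'` is left untouched, while each of `$a`, `$b`, and `$c` gets its own `san_out` call.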
Testing fixed code
Nothing is output for SQLI, reflected XSS, and PHPCI, and application execution continues. For others, where patches perform validation, an alarm is raised when an attack is detected and the execution of the web application is stopped.
Implementation and Challenges
If the analysis is global, taint also propagates when functions in different modules are called. This means that WAP handles multiple TSTs and TEPTs to propagate taint correctly.
Experimental Evaluation
- Large scale evaluation
- Taint analysis comparative evaluation
- Full comparative evaluation
- Fixing vulnerabilities
- Testing fixed applications
The confusion matrix of the LR model for PhpMinerII (Table 3.14) shows that it correctly classified 68 cases, with 48 as vulnerabilities and 20 as non-vulnerabilities. The comparison with Pixy can be extracted from Table 3.12; however, we cannot show the results of PhpMinerII in the table because it does not really identify vulnerabilities.
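The count of correctly classified cases is the sum of the confusion matrix diagonal. The Python sketch below makes that explicit. The `accuracy` helper needs the off-diagonal cells as well, which are not quoted in this text, so they are left as parameters and not guessed.

```python
def correctly_classified(true_positives, true_negatives):
    """Sum of the diagonal of a 2x2 confusion matrix."""
    return true_positives + true_negatives


def accuracy(true_positives, true_negatives, false_positives, false_negatives):
    """Fraction of all cases that were classified correctly."""
    total = true_positives + true_negatives + false_positives + false_negatives
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (true_positives + true_negatives) / total


# Diagonal reported for the LR model: 48 vulnerabilities, 20 non-vulnerabilities.
correct = correctly_classified(48, 20)
```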
Conclusions
The chapter addresses the difficulty of extending these tools by proposing a modular and extensible version of the WAP tool (presented in Chapter 3), which is equipped with weapons (WAP extensions) to detect and correct new vulnerability classes. This version of the tool covers eight vulnerability classes: SQLI, XSS (reflected and stored), remote file inclusion (RFI), local file inclusion (LFI), directory or path traversal (DT/PT), OS command injection (OSCI), source code disclosure (SCD), and PHP command injection (PHPCI).
Restructuring WAP
- Code analyzer
- False positive predictor
- Code corrector
- Weapons
- Effort to modify WAP
The attributes represent symptoms of the same kind; for example, the type-checking attribute represents the symptoms that check the data type of variables. The user sanitization template is chosen if the user specifies the malicious characters that can be used to exploit the vulnerability and a character that can be used to neutralize them (for example, the backslash).
Extending WAP with weapons
Reusing the sub-modules
Detection of the four vulnerabilities mentioned above can be included in the sub-modules of Section 4.2.1, and patches to remove them can be created using the patch template (Section 4.2.3). Regarding LDAPI and XPathI, a patch was created for each using the user authentication patch template.
Creating weapons
They validate the user input content against JavaScript code, so we changed them to also check the input content against URIs/hyperlinks.
Experimental Evaluation
Real web applications
The last four columns of the table show the number of predicted (FPP) and unpredicted (FP) false positives by WAP (first two columns) and WAPe (next two columns). WAPe predicted 104 false positives: the same as WAP plus 42 that WAP classified as non-false positives.
WordPress plugins
In both analyses, it detected HI and CS vulnerabilities, while LDAPI and SF were only detected in the web applications (no plugins). All these vulnerabilities have been reported to the developers of the web applications and WP plugins.
Conclusions
Given a sequence of observations, the HMM discovers the hidden states (one per observation), taking into account the order of the observations. Section 5.4 presents the DEKANT tool that implements the model, and Section 5.5 presents an experimental evaluation.
Intermediate Slice Language
ISL tokens and grammar
The first 20 represent code elements and their parameters, while the last two are specific to the corpus and the implementation of the model (see Sections 5.3 and 5.4). A cut list is the result of applying a set of statement rules (line 2), each of which can be a subrule (lines 4-11), a statement (line 12), or an assignment statement (line 13).
Variable map
Slice translation process
The Model
Building the corpus
The instruction $var = $_POST['paramater'], for example, translated into ISL becomes input var and is represented in the corpus as ⟨input,Taint⟩ ⟨var_vv,Taint⟩. For example, the PHP instructions from lines 1 and 2 (Listing 5.2(a)) result in the sequence of line 1 in the corpus.
Sequence model
Transition probabilities: count how many times in the corpus a given state transitions to another state (or to itself). An example is the probability of the Taint state emitting the token var_vv, i.e., the pair ⟨var_vv,Taint⟩.
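The counting step can be sketched in Python as below. The sketch assumes that counts are turned into probabilities by plain relative frequency with no smoothing, which the text here does not state. The function name is hypothetical, and the second corpus sequence is invented for illustration.

```python
from collections import Counter, defaultdict


def estimate_parameters(corpus):
    """Estimate transition and emission probabilities by counting.

    `corpus` is a list of sequences of (token, state) pairs.
    """
    transition_counts = defaultdict(Counter)
    emission_counts = defaultdict(Counter)
    for sequence in corpus:
        for token, state in sequence:
            emission_counts[state][token] += 1
        # Count each state followed by the next state (possibly itself).
        for (_, current), (_, following) in zip(sequence, sequence[1:]):
            transition_counts[current][following] += 1

    def normalize(counts):
        result = {}
        for state, counter in counts.items():
            total = sum(counter.values())
            result[state] = {key: n / total for key, n in counter.items()}
        return result

    return normalize(transition_counts), normalize(emission_counts)


corpus = [
    [("input", "Taint"), ("var_vv", "Taint")],                         # from the text
    [("input", "Taint"), ("var_vv", "Taint"), ("var_vv", "N-Taint")],  # invented
]
transitions, emissions = estimate_parameters(corpus)
```

With this toy corpus, `emissions["Taint"]["var_vv"]` is the estimated probability of the Taint state emitting the token var_vv.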
Detecting vulnerabilities
TL is updated by: (i) inserting the variable name if its state is Taint; or (ii) removing it if its state is N-Taint and the variable belongs to TL. The final state of the slice-isl (corresponding to line 3) is N-Taint, since it is a variable in CTL.
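The two TL update rules translate almost directly into code. The following Python sketch assumes TL can be modelled as a set of variable names. The function name is hypothetical, and CTL is not modelled because its update rule is not given here.

```python
def update_tainted_list(tainted_list, variable, state):
    """Apply the TL update rules for one classified variable.

    (i)  insert the variable if its state is Taint;
    (ii) remove it if its state is N-Taint and it is already in TL.
    """
    if state == "Taint":
        tainted_list.add(variable)
    elif state == "N-Taint" and variable in tainted_list:
        tainted_list.remove(variable)
    return tainted_list


tl = set()
update_tainted_list(tl, "$u", "Taint")     # rule (i): $u enters TL
update_tainted_list(tl, "$x", "N-Taint")   # $x was never in TL, nothing changes
update_tainted_list(tl, "$u", "N-Taint")   # rule (ii): $u leaves TL
```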
Implementation and Assessment
Implementation of DEKANT
Corpus sequence processing (corpus processing step) means separating observations from states, resulting in matrices of observations and states with the same dimension. However, corpus sequences are not of equal length (see for example Figure 5.3), so normalization is necessary.
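One way to picture this step is the Python sketch below. It separates observations from states and pads shorter sequences to the longest length. Padding is an assumption: the text only says that normalization is necessary, not how it is done. The `PAD` filler, the function name, and the second corpus sequence are hypothetical.

```python
PAD = None  # hypothetical filler value for missing positions


def split_and_pad(corpus):
    """Split (token, state) sequences into two equally shaped matrices."""
    width = max((len(sequence) for sequence in corpus), default=0)
    observations, states = [], []
    for sequence in corpus:
        padding = [PAD] * (width - len(sequence))
        observations.append([token for token, _ in sequence] + padding)
        states.append([state for _, state in sequence] + padding)
    return observations, states


corpus = [
    [("input", "Taint"), ("var_vv", "Taint")],
    [("input", "Taint"), ("var_vv", "Taint"), ("var_vv", "N-Taint")],  # invented
]
observations, states = split_and_pad(corpus)
```

Both matrices end up with one row per sequence and as many columns as the longest sequence.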
Model and corpus assessment
The knowledge extracted from this corpus is shown in Figure 5.6, representing the model parameters. Then, the tool is trained with a pseudocorpus of 9 folds and tested with the 10th fold.
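Training on 9 folds and testing on the 10th is the usual k-fold arrangement, sketched below in Python. The way sequences are assigned to folds (round-robin here) is an assumption, as is the function name. The text does not say how DEKANT partitions the corpus.

```python
def k_fold_splits(sequences, folds=10):
    """Yield (training, testing) pairs, holding out one fold at a time."""
    if len(sequences) < folds:
        raise ValueError("need at least one sequence per fold")
    # Round-robin assignment of sequences to folds (an assumption).
    buckets = [sequences[i::folds] for i in range(folds)]
    for held_out in range(folds):
        training = [
            item
            for index, bucket in enumerate(buckets)
            if index != held_out
            for item in bucket
        ]
        yield training, buckets[held_out]


splits = list(k_fold_splits(list(range(20))))
```

Each of the 10 splits trains on 9 folds and tests on the remaining one, so every sequence is used for testing exactly once.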
Experimental Evaluation
Open source software evaluation
To demonstrate the ability of DEKANT to classify vulnerabilities, we use it with 10 WordPress plugins (WordPress, 2015) and 10 packages of real web applications, all written in PHP, using the corpus of the previous section. Therefore, in order to run DEKANT with the source code of the plugins, but without the WordPress codebase, we added the information about those functions to the tool.
Comparison with data mining tools
WAP identified the same 258 unremedied slices (columns 2 and 4 of Table 5.5) as the slice extractor and detected 206 of the vulnerabilities found by DEKANT (5 fewer than DEKANT, counted as false negatives, FN). We present the experimental results of the tool run without and with data mining.
Comparison with taint analysis tools
Discussion
Conclusions
Additionally, they can ignore that the data may be unsanitized when it is inserted into the DBMS, leading to a second-order SQLI vulnerability. For stored injection, we suggest plugins to handle certain attacks before the data is inserted into the database.
DBMS Injection Attacks
The user admin\'-- does not exist in the database, so this SQLI attack is not successful. In a second-order SQLI attack (class D.1), the inserted data is a string specially crafted to be inserted into a second SQL query executed in the second step.
The SEPTIC Approach
- SEPTIC overview
- Query structures and query models
- Query identifiers
- Attack detection
- Training
- Detection examples
- Discussion
The ID format is the SQL command (typically SELECT) followed by the number of nodes in the query structure. For the example of Listing 6.1, which has the query structure in Figure 6.3(b), the ID would be select_9.
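The ID format can be reproduced in a few lines of Python. The lowercase command and the underscore separator follow the select_9 example. The function name is hypothetical and this is not SEPTIC's actual implementation.

```python
def make_query_id(command, node_count):
    """Build an ID from the SQL command and the number of nodes in the query structure."""
    if node_count < 0:
        raise ValueError("node count cannot be negative")
    return f"{command.strip().lower()}_{node_count}"


# A SELECT query whose structure has 9 nodes, as in the example above.
query_id = make_query_id("SELECT", 9)
```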
Implementation
Protecting MySQL
These rules call the SEPTIC detector with an input that matches the query parsed and validated by MySQL. This function calls the processSelect_Lex and insertElementTemplate functions to check the query statement (SELECT, DELETE, INSERT, UPDATE) and build the QS.
Inserting identifiers in Zend
Listing 6.2 presents the algorithm to get the query ID implemented by the get_query_ID function. Otherwise, the algorithm checks whether the query belongs to the array of the function arguments.
Inserting identifiers in Spring / Java
Then, if the function is a sensitive sink, we get a query argument to start tracing it back (lines 11 and 12).
Experimental Evaluation
Attack detection
In the experiments, first with SEPTIC turned off, we injected malicious user input created manually to confirm the presence of the vulnerabilities in the code samples. The anti-SQLI tools only found the attack from class A.5 in the semantic mismatch attacks (row 21).
Performance overhead
We ran a single Firefox browser on each client machine, but varied the number of these machines from 1 to 4. Finally, it can be seen that the overhead tends to increase with the number of PCs and browsers generating traffic, i.e., as the load increases.
Extensions to SEPTIC
Protecting other DBMSs
The result of the parsing and validation phases is the same as in MySQL: a list of stacks, where each stack represents a clause of the query and each node contains data about a query element. Again, each stack of the list represents a clause of the query (e.g., SELECT, FROM) and each node is a query element.
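The list-of-stacks shape can be sketched in Python as follows. The node fields, their values, and the example query are hypothetical. The text only states that each stack is a clause and each node holds data about one query element.

```python
from dataclasses import dataclass


@dataclass
class Node:
    element_type: str  # hypothetical field, e.g. a field or table marker
    data: str          # hypothetical field, the element's text


# One stack (a Python list used as a stack) per clause of a made-up query.
query_structure = [
    [Node("FIELD", "name"), Node("FIELD", "email")],  # SELECT clause
    [Node("TABLE", "users")],                         # FROM clause
]


def count_nodes(structure):
    """Total number of nodes across all clause stacks."""
    return sum(len(stack) for stack in structure)


total_nodes = count_nodes(query_structure)
```

A node count of this kind is the sort of quantity that a query identifier built from the query structure would rely on.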
Vulnerability diagnosis
Similar to MariaDB, no changes are needed in the generation of IDs implemented in the Zend engine.
Detecting attacks against non-web applications
Conclusions
With SEPTIC we aim to make the DBMS secure, so that model generation and attack detection are done within the DBMS. Like SEPTIC, DIGLOSSIA detects syntax structure and mimicry attacks but, unlike SEPTIC, it detects neither second-order SQLI, as it only computes queries with user inputs, nor encoding and codespace character attacks and evasion, as these attacks do not change the root nodes of the previously parsed tree. malicious user inputs are processed by the DBMS.
Future Work
Information flows that exploit web vulnerabilities
Architecture including main modules, and data structures
Example (i) AST, (ii) TST, and (iii) taint analysis
Script with SQLI vulnerability, its TEPT, and untaint data structures
Number of attribute occurrences in the original data set
Number of attribute occurrences in the balanced data set
Overview of the WAP tool modules and data flow
Reorganization of WAP’s code analyzer module
Reorganization of the false positives predictor module
Downloads and active installed plugins of 115 analyzed (blue columns) and
Number of vulnerabilities detected by class in the vulnerable web applica-
Overview on the proposed approach
Code vulnerable to SQLI, translation into ISL, and detection of the vulnera-
Code with a slice vulnerable to XSS (lines {1, 2, 4}) and a slice not vulner-
Model graph of the proposed HMM
Models for two example corpus sequences
Parameters of the model extracted from the corpus. The columns represent