Security Analysis of the Java Library with Mock Objects

A common exploit scenario involves an attacker triggering sensitive platform operations and obtaining sensitive data from the results of those operations. We present a static analysis that, given sources drawn from a subset of the Java class library's public API, computes an over-approximation of the parts of the platform that could leak sensitive information if triggered by an attacker. The analysis is based on the Doop framework, which uses the Datalog language to declaratively specify points-to analysis algorithms.

I would like to thank Yannis Smaragdakis for suggesting the idea of my thesis and giving me the opportunity to work on such an interesting topic. I would also like to thank both of my supervisors, Yannis Smaragdakis and George Kastrinis, as well as postdoctoral researcher Neville Grech and Ph.D.

This thesis aims to investigate how mock objects can impact declarative static program analyses specified using the Doop framework.

It was developed as my bachelor's thesis between March 2016 and October 2016 at the Department of Informatics and Telecommunications at the University of Athens.

INTRODUCTION

BACKGROUND

Points-To Analysis in Datalog

Computation in Datalog consists of monotonic logical inferences that are applied repeatedly to produce more facts until a fixed point is reached. The simple Datalog program of Figure 2.2 consists of two rules, known in Datalog semantics as IDB (Intensional Database) rules, which are used to establish facts based on a combination of already established facts. In LB-Datalog syntax, the head of a rule (i.e., the inferred fact) is separated from the body of the rule (i.e., the previously established facts) by a left arrow.

For example, the first rule is the base case of the computation and states that, when a newly allocated heap object is assigned to a variable, that variable may point to that heap object. The second rule uses recursion to say that a variable may point to any heap object pointed to by another variable, if the value of the second variable is assigned to the first.
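The two rules described above can be mimicked outside Datalog to make the fixed-point behaviour concrete. The following Python sketch is only an illustration, not Doop's implementation: the input relation names `allocations` and `moves`, and the sample facts, are assumptions chosen for this example.

```python
def points_to(allocations, moves):
    """Naive fixed-point evaluation of a two-rule points-to analysis.

    allocations: set of (var, heap) facts, one per "var = new ..." statement.
    moves:       set of (to_var, from_var) facts, one per "to_var = from_var".
    Both input names are hypothetical stand-ins for the EDB relations.
    """
    # Base rule: a variable points to a heap object allocated into it.
    var_points_to = set(allocations)

    changed = True
    while changed:  # repeat until no new fact can be inferred
        changed = False
        # Recursive rule: if to_var = from_var and from_var points to heap,
        # then to_var points to heap as well.
        for to_var, from_var in moves:
            for var, heap in list(var_points_to):
                if var == from_var and (to_var, heap) not in var_points_to:
                    var_points_to.add((to_var, heap))
                    changed = True
    return var_points_to


# Hypothetical input facts for a four-variable program.
allocations = {("a", "new A"), ("b", "new B")}
moves = {("c", "a"), ("d", "c"), ("c", "b")}
result = points_to(allocations, moves)
```

Because facts are only ever added, never retracted, the loop is monotonic and terminates once the set of inferred facts stops growing.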

Context Sensitivity in Doop

In our example, a call-site-sensitive analysis will create two separate points-to sets for the variables in the method bar, one for each call on lines 7 and 8. Object-sensitive analyses, on the other hand, qualify contexts using the allocation site of the receiver object (i.e., the "this" object of the method). In this way, the contexts of two method calls can differ even if they share the same call site, because the receiver objects have different allocation sites.

In the example above, an object-sensitive analysis will differentiate the calls to bar depending on the allocation sites of the objects that variables a1 and a2 can refer to. Type sensitivity in Doop is analogous to object sensitivity, but types, rather than allocation sites, are used to qualify contexts. The purpose of type sensitivity is to provide a more scalable analysis without sacrificing too much precision.

A detailed description of the context-insensitive and context-sensitive analysis model in Doop can be found in [2]. The main difference in adding context sensitivity is the use of constructors, also known as Skolem functions [16]. These functions are black boxes for the rest of the analysis and are used when we need to create a new calling context (or simply Context) for a variable abstraction or a new heap context (or simply HContext) for a heap abstraction.

Different flavors of context sensitivity are implemented by specifying variations of the record and merge functions. Importantly, the addition of constructors to the LB-Datalog engine makes the language Turing-complete: Doop's context constructors are recursive, in that they return entities of the same type they take as input, which invalidates the guarantee of polynomial-time execution.

To restore this property, we restrict our attention to definitions of record and merge that create contexts in domains isomorphic to finite sets, polynomially bounded by the size of the input.
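To illustrate how the choice of constructor determines the flavor of context sensitivity, the sketch below defines simplified merge functions in Python. It is a hypothetical simplification rather than Doop's actual constructors: the `Call` record, its fields, the function signatures, and the labels used are all assumptions made for this example, and heap contexts (the record function) are omitted.

```python
from typing import NamedTuple, Tuple

Context = Tuple[str, ...]


class Call(NamedTuple):
    """A method call as seen by the analysis (hypothetical record)."""
    call_site: str      # label of the invocation instruction
    receiver_heap: str  # allocation site of the receiver ("this") object
    receiver_type: str  # type used in place of the allocation site


def merge_call_site(call: Call, caller_ctx: Context, k: int = 1) -> Context:
    # Call-site sensitivity: append the call site, keep only the last k
    # elements so that the set of possible contexts stays finite.
    ctx = caller_ctx + (call.call_site,)
    return ctx[len(ctx) - k:]


def merge_object(call: Call, caller_ctx: Context) -> Context:
    # Object sensitivity (depth 1): the context is the receiver's allocation site.
    return (call.receiver_heap,)


def merge_type(call: Call, caller_ctx: Context) -> Context:
    # Type sensitivity (depth 1): a type replaces the allocation site.
    return (call.receiver_type,)


# Two calls to bar from different call sites on receivers allocated at
# different sites of the same type (all labels are made up).
call1 = Call("main/bar/0", "new A/0", "A")
call2 = Call("main/bar/1", "new A/1", "A")
# Same call site as call1, but a different receiver object.
call3 = Call("main/bar/0", "new A/1", "A")
```

Each function returns a tuple of bounded length over a finite set of labels, so the context domain remains finite, which is the restriction described above.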

SECURITY ANALYSIS

Analysis Sources
Mock Objects

Mock Objects and Fields

Analysis Sinks

Sanitization

Reflection
Leaks
String Operations

Grouping mock objects can have many variations, depending on how we want to handle the trade-off between scalability and precision of the analysis. The EDB rules responsible for creating mock objects are presented in Figure 3.3. The first rule creates one object for each type in the JCL, except for tainted types.

The second rule creates one object per parameter type, if the type is tainted (Figure 3.2), for each method in the JCL. The reason the second rule does not focus only on the source methods is that at this stage of the analysis (i.e., generating the input facts) they have not been computed yet. The created mock objects (i.e., those stored in the MockHeap and TaintedHeap predicates) must be assigned to the parameters of the source methods as replacements for attacker-created objects.

As for the rest of the parameters (those of non-tainted type) and the this variables, we choose to assign them simple mock objects (i.e., those created one per type). In our analysis we follow the second approach since we want to avoid handling any public JCL methods. Furthermore, the abstract objects assigned to mock object fields and the mock objects themselves belong to the same group (that is, the mock objects for the type).
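The two creation rules described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of Figure 3.3: the function name, the input shapes, and the string labels given to the abstract objects are all hypothetical.

```python
def create_mock_heaps(types, tainted_types, method_params):
    """Create abstract mock objects, mirroring the two rules in the text.

    types:         iterable of all type names in the library
    tainted_types: set of type names considered tainted
    method_params: dict mapping a method signature to its parameter types
    Returns (mock_heap, tainted_heap), each mapping an object label to its type.
    """
    # Rule 1: one mock object for each non-tainted type.
    mock_heap = {f"<mock {t}>": t for t in types if t not in tainted_types}

    # Rule 2: for each method, one object per tainted parameter type.
    tainted_heap = {}
    for method, param_types in method_params.items():
        for t in set(param_types):
            if t in tainted_types:
                tainted_heap[f"<tainted {t} in {method}>"] = t
    return mock_heap, tainted_heap


# Hypothetical input: three types, one of which is tainted.
types = ["java.lang.Object", "java.lang.String", "java.util.List"]
tainted = {"java.lang.String"}
methods = {
    "A.load(String,String)": ["java.lang.String", "java.lang.String"],
    "B.size(List)": ["java.util.List"],
}
mock_heap, tainted_heap = create_mock_heaps(types, tainted, methods)
```

Creating one tainted object per method, rather than one per tainted type overall, keeps objects that enter through different methods distinguishable, at the cost of more abstract objects.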

Consider, for example, a Node object (Figure 3.5), which represents a node in a binary tree, but whose child1 and child2 fields point to the object itself.

Thanks to Doop's points-to analysis, which computes how heap objects flow intra- and inter-procedurally through the program, we only need to define two rules (Figure 3.8) to find tainted heaps that reach sinks. The analysis computes the SanitizedHeapFromSourceFlowsToSink relation, which is a subset of the TaintedHeapFromSourceFlowsToSink relation (Figure 3.8), since it only stores the sources of sanitized tainted heaps that reach a sink.
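A rough idea of how tainted heaps reaching sinks can be derived from points-to results is sketched below in Python. The rules of Figure 3.8 are not reproduced here, so the join, the input shapes, and in particular the `sanitized_heaps` input are assumptions made purely for illustration.

```python
def tainted_flows_to_sink(var_points_to, tainted_source, sink_vars):
    """Join points-to facts with tainted heaps and sink variables.

    var_points_to:  set of (var, heap) facts from the points-to analysis
    tainted_source: dict mapping a tainted heap to its source method
    sink_vars:      set of (var, sink_method) pairs marking sink arguments
    Returns a set of (heap, source_method, sink_method) triples.
    """
    return {
        (heap, tainted_source[heap], sink)
        for var, heap in var_points_to
        if heap in tainted_source
        for sink_var, sink in sink_vars
        if sink_var == var
    }


def sanitized_flows(flows, sanitized_heaps):
    # Hypothetical: keep only flows whose heap was sanitized on the way.
    return {flow for flow in flows if flow[0] in sanitized_heaps}


# Made-up facts: heap h1 is tainted and reaches the argument of a sink.
var_points_to = {("name", "h1"), ("tmp", "h2"), ("arg", "h1")}
tainted_source = {"h1": "A.load(String)"}
sink_vars = {("arg", "Class.forName(String)")}

flows = tainted_flows_to_sink(var_points_to, tainted_source, sink_vars)
sanitized = sanitized_flows(flows, sanitized_heaps={"h1"})
```

By construction, the sanitized relation can only contain triples already present in the tainted relation, which matches the subset relationship stated in the text.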

Extending our analysis means adding two new input relations to the EDB, ForNameHeap and NewInstanceHeap, shown in Figure 3.10. These two relations store one abstract object of type java.lang.Class and one of type java.lang.Object for each method of the JCL, respectively. In the final step, our analysis computes which reflective objects, coming from a sink, can leak through the API's public methods.

To achieve this, we only need to define a handful of rules (Figure 3.12), since Doop's points-to analysis takes care of how objects flow intra- and inter-procedurally through the program. The StringFactoryVarPointsTo relation is a subset of the VarPointsTo relation that contains only those variables that are of a string factory type.
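The subset relationship between StringFactoryVarPointsTo and VarPointsTo amounts to a filter on the declared type of each variable. The Python sketch below illustrates this; the choice of java.lang.StringBuilder and java.lang.StringBuffer as string factory types, and the `var_type` input, are assumptions and are not taken from the thesis code.

```python
# Assumed set of string factory types (hypothetical choice for illustration).
STRING_FACTORY_TYPES = {"java.lang.StringBuilder", "java.lang.StringBuffer"}


def string_factory_var_points_to(var_points_to, var_type):
    """Keep only points-to facts whose variable has a string factory type.

    var_points_to: set of (var, heap) facts
    var_type:      dict mapping each variable to its declared type
    """
    return {
        (var, heap)
        for var, heap in var_points_to
        if var_type.get(var) in STRING_FACTORY_TYPES
    }


# Made-up facts for three variables.
var_points_to = {("sb", "h1"), ("s", "h2"), ("buf", "h3")}
var_type = {
    "sb": "java.lang.StringBuilder",
    "s": "java.lang.String",
    "buf": "java.lang.StringBuffer",
}
factory_facts = string_factory_var_points_to(var_points_to, var_type)
```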

Figure 3.6: Datalog code for assigning mock objects to the fields of mock objects

EXPERIMENTAL RESULTS

The increase in analysis time (due to additional computations) and in reported leaks that accompanies the addition of mock objects shows the positive contribution of mock objects towards a more complete analysis. We are interested in the following metrics: VarPointsTo entries (vars), TaintedHeapFromSourceFlowsToSink (sinks), LeakClassObject (leak-co), LeakHeap (leak-h), and SourceToSinkToLeak (leak-path). Regarding sensitive information leakage, the insens+field analysis reports 15 different class objects leaking from 13,322 public methods.

Clearly, a context-insensitive analysis is not a good fit for our problem, as its lack of precision produces a large number of false positives. As expected, the 2type+h+field analysis proves more precise, reporting five distinct class objects leaking from 241 public methods. In both cases, after inspecting the results, it is almost certain that the reported vulnerabilities are false positives.

However, many of these are to be expected, as our analysis is optimistic in some cases and we do not implement any logic for the JCL's more advanced security mechanisms. The two exploits described in the Common Vulnerabilities and Exposures (CVE) list under IDs CVE-2012-4681 and CVE-2013-0422 have been patched in JRE 7u45, and our analyses manage to report no false positives about them. Our analyses can report the leakage of restricted class objects in both cases.

It appears that the 2type+h analysis for this JRE version is less precise, as it reports more leaks. However, this occurs mainly due to the imprecision of the context-insensitive analysis, which reports many flows to sinks as sanitized when this should not be the case. Apparently the sanitization method we defined is not widely used in this JRE version, which uses other techniques to ensure authorized access to classes of restricted packages.

Table 4.1 presents more metrics that back up our claims. We are interested in the following metrics: VarPointsTo entries (vars), TaintedHeapFromSourceFlowsToSink (sinks), LeakClassObject (leak-co), LeakHeap (leak-h) and SourceToSinkToLeak (leak-path).

CONCLUSIONS

ANALYSIS INPUT RELATIONS

SECURITY ANALYSIS CODE

MethodSignature:Value(?sig:"").

In OOPSLA '09: 24th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, New York, NY, USA, 2009.

Martin, Dzintars Avots, Michael Carbin, and Christopher Unkel, "Context-sensitive program analysis as database queries", In PODS '05: Proc.

[11] Michael Eichberg, Sven Kloppenburg, Karl Klose, and Mira Mezini, "Defining and continuous checking of structural program dependencies", In ICSE '08: Proc.