A Static Analysis Framework For Detecting SQL Injection Vulnerabilities

Conference Paper · August 2007


DOI: 10.1109/COMPSAC.2007.43 · Source: IEEE Xplore


Xiang Fu, Xin Lu, Boris Peltsverger, Shijun Chen
School of Computer and Information Sciences
Georgia Southwestern State University, Americus, GA 31709

Kai Qian
School of Computing and Software Engineering
Southern Polytechnic State University, Marietta, GA 30060

Lixin Tao
Computer Science Department
Pace University, Pleasantville, NY 10570

Abstract

Recently, SQL Injection Attack (SIA) has become a major threat to Web applications. Via carefully crafted user input, attackers can expose or manipulate the back-end database of a Web application. This paper proposes the construction and outlines the design of a static analysis framework (called SAFELI) for identifying SIA vulnerabilities at compile time. SAFELI statically inspects the MSIL bytecode of an ASP.NET Web application, using symbolic execution. At each hotspot that submits an SQL query, a hybrid constraint solver is used to find the corresponding user input that could lead to a breach of information security. Once completed, SAFELI has the potential to discover more delicate SQL injection attacks than black-box Web security inspection tools.

Keywords: SQL Injection Attack, Symbolic Execution, Constraint Solver, Automatic Testing.

1 Introduction

Web applications have been very successful in E-Commerce for almost two decades. However, many Web applications today suffer from SQL Injection Attack (SIA) [1, 2] when they construct SQL queries on the fly based on user input. Attackers may be able to trick the back-end database of a Web application into executing malicious SQL code. This allows them to expose, manipulate, and destroy the back-end database, causing great losses. Since SIA is conducted at the application level, normal firewalls and intrusion detection at the network layer offer no defense against it. Consider the following well-known example [1].

Example 1.1 A log-in page has two text box fields for entering user name and password. Let sUname and sPwd represent the strings contained in the text boxes. Presented below is a piece of back-end C# code that constructs an SQL statement on the fly. Here “+” denotes string concatenation.

    "SELECT uname, pass FROM users WHERE \n uname='"
        + sUname + "' AND pass='" + sPwd + "'"

The above SQL query intends to verify the existence of the user name/password pair supplied. However, if the attacker enters “admin' -- ” as user name and leaves the password blank, the following SQL statement is constructed.

    SELECT uname,pass FROM users WHERE
    uname='admin' -- ' AND pass=''

Since “--” comments out the rest of the query, it only verifies the existence of the user name “admin”. The attacker can log in without supplying a password as long as the admin account exists.

One typical approach against SIA is to filter out special characters such as the single quote and “--”. However, it does not work for more delicate variations, e.g., the attack against integer columns in a database [1]. Recently, many solutions have been proposed to capture and defend against SIA.

• Tainted Data Tracking: The main idea [15] is to track the data that comes from user input. This can be done by instrumenting the run-time environment or the interpreter of the back-end scripting language. When an SQL statement is submitted, its syntax tree is first examined. If any of its SQL keywords is identified to be from user input, the SQL statement is stopped.
• Intrusion Detection Based on Static Analysis: Using a static string analysis technique [6], it is possible to construct a regular expression that conservatively approximates the set of SQL statements generated at a hotspot (a location which submits an SQL query). The information can be used to statically analyze the syntax correctness of SQL statements [8] and to model the “normal behaviors” of a Web application. At run time, any SQL statement not contained in the approximation library is rejected by intrusion detection [9].

• Black-box Testing: Black-box testing [10, 7] can be used to discover SIA vulnerabilities by applying a library of pre-collected attack patterns. It is fast and effective; however, without prior knowledge of the source code, it has difficulty discovering non-trivial vulnerabilities.

• SQL Randomization: As an extension of instruction randomization [11], SQL randomization [3] instruments a Web application and appends a random number after each SQL keyword used to build SQL statements. The SQL parser is modified correspondingly to accept randomized SQL keywords. At run time, since a user-injected SQL keyword does not have the random number appended, SIA fails due to a syntax error.

This paper proposes the construction of a static analysis framework (called SAFELI) for discovering SIA vulnerabilities at compile time. While the tool is still under development, it is beneficial to share its main design idea in this paper. The contributions of this paper are listed below:

1. White-box Static Analysis: SAFELI analyzes bytecode. It relies on string analysis (similar to [9]). However, it discovers vulnerabilities at compile time rather than at run time. Once implemented, SAFELI will be able to generate user inputs as hard evidence of a vulnerability.

2. Hybrid Constraint Solver: SAFELI employs a new string analysis technique (different from the ones used in [6, 9, 8]). The technique can handle hybrid constraints that involve boolean, integer, and string variables. Most popular string operations can be handled.

This paper is organized as follows. Section 2 presents a motivating example to be used throughout the paper. Section 3 introduces the general structure of SAFELI. Section 4 discusses the symbolic execution framework. Section 5 presents the core technique, i.e., the hybrid constraint solver. Section 6 covers test case generation and applies the hybrid constraint solver algorithm to the motivating example. Section 7 concludes the paper and proposes future work.

2 Motivating Example

This section describes a non-trivial SIA vulnerability that cannot be easily detected by black-box testing tools. It is motivated by the vulnerability example of string size restriction in [1]. Fig. 1 presents a massage() method that processes a user input string in two steps. First, it checks whether the input string contains suspicious SQL keywords such as “--”, “OR”, and “drop”. Then it substitutes each single quote character with “''”, i.e., the escape sequence for a single quote in SQL. Finally, massage() tries to provide further protection by restricting the length of the output to 16.

    String massage(String strInput)
    {
        //1. SQL keyword search
        if(strInput.IndexOf("--")!=-1
            || strInput.IndexOf("OR")!=-1
            || strInput.IndexOf("drop")!=-1)
            throw new Exception(
                "Possible SQL Injection Attack: " + strInput
            );

        //2. massage the data for single quote
        String sOut = strInput.Replace("'","''");
        sOut = sOut.Substring(0,16);
        return sOut;
    }

    Figure 1. String Massage

The SQL generation statement in Example 1.1 can be strengthened as below.

    "SELECT uname, pass FROM users WHERE \n uname='"
        +massage(sUname)+ "' AND pass='" +massage(sPwd)+ "'"

Readers can verify that the above code defends against the attack strings in Example 1.1 because “--” is filtered out. Furthermore, single quote characters are replaced by “''” and hence cause no harm. The massage() function in Fig. 1, however, has a very delicate bug. Consider the following input for user name and password, respectively (the second string begins with five spaces):

    123456789012345'
         OR uname<>'

Notice that both strings are 16 characters long. After going through the massage operations, the following SQL statement is generated.

    SELECT uname,pass FROM users WHERE
    uname='123456789012345'' AND pass='     OR uname<>''

Notice that its WHERE clause is a tautology. It consists of two conditions: (1) whether uname is equal to the constant string “123456789012345'' AND pass=” (note the escape sequence “''” inside), or (2) whether uname is not an empty string. Obviously “uname<>''” always evaluates to true. The trick is that both malicious strings have length 16. Although the Replace function generates the escape sequence “''”, half of the “''” is then cut off by the Substring method at the end of massage(). Such delicate bugs cannot be easily discovered by black-box testing tools.
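The truncation effect described above can be reproduced directly. The following is a minimal sketch and not part of SAFELI. Massage transcribes Fig. 1 with one added assumption: a length guard before Substring, because .NET's Substring(0, 16) throws for strings shorter than 16 characters. The class name MassageDemo is hypothetical. One caveat: the keyword check in Fig. 1 is case-sensitive, so the upper-case OR in the password string would be rejected by step 1; the sketch therefore spells the keyword in lower case, which SQL treats identically.

```csharp
using System;

static class MassageDemo
{
    // Fig. 1, transcribed. The length guard is an assumption added here so
    // that short inputs do not make Substring throw.
    static string Massage(string strInput)
    {
        // 1. SQL keyword search (case-sensitive, as in Fig. 1)
        if (strInput.IndexOf("--") != -1
            || strInput.IndexOf("OR") != -1
            || strInput.IndexOf("drop") != -1)
            throw new Exception("Possible SQL Injection Attack: " + strInput);

        // 2. escape single quotes, then restrict the length to 16
        string sOut = strInput.Replace("'", "''");
        return sOut.Length > 16 ? sOut.Substring(0, 16) : sOut;
    }

    static void Main()
    {
        string sUname = "123456789012345'";   // 15 digits + one quote = 16 chars
        string sPwd   = "     or uname<>'";   // 5 spaces + "or uname<>'" = 16 chars

        string mUname = Massage(sUname);
        string mPwd   = Massage(sPwd);

        // Replace doubles the trailing quote (17 chars); cutting back to 16
        // removes half of the escape, so each output equals its raw input.
        Console.WriteLine(mUname == sUname);   // True
        Console.WriteLine(mPwd == sPwd);       // True

        string sql = "SELECT uname, pass FROM users WHERE uname='" + mUname
                   + "' AND pass='" + mPwd + "'";
        Console.WriteLine(sql);
    }
}
```

The printed statement ends in “or uname<>''”, i.e., the unescaped quote of the user name swallows “AND pass=” into a string literal and the second condition is always true.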
3 SAFELI Framework

This section presents the overall structure of SAFELI (Static Analysis Framework for discovEring sqL Injection vulnerabilities), which is currently under development. SAFELI consists of the following components:

• MSIL Instrumentor: The module instruments the MSIL bytecode of an ASP.NET application for symbolic execution. It inserts an additional monitoring function at each location where data members of objects are accessed. Values of (uninitialized) variables are replaced with symbolic constraints. Each hotspot, i.e., a location which submits an SQL statement, is tagged to trigger the constraint solver.

• Symbolic Execution Engine: It is essentially a wrapper of the .NET Framework. The execution engine iteratively examines the back-end code of each ASP page one by one. When hotspots are reached, a library of pre-set attack patterns is consulted, based on which a hybrid string constraint is constructed and sent to the constraint solver for generating vulnerability evidence (user input).

• Library of Attack Patterns: The module stores a collection of pre-set attack patterns, each of which is represented using a regular expression.

• Constraint Solver: Given a constraint, the solver tests its satisfiability and generates a valuation of variables that satisfies the constraint. Unlike other popular platforms for symbolic execution, the Constraint Solver of SAFELI can solve string constraints. Details are discussed in Section 5.

• Test Case Generator: When initial valuations are generated, they are passed to the Test Case Generator. The module then injects the values into HTML fields and posts the web page back to the server. It then uses a heuristic algorithm to analyze the response from the server. When a vulnerability is verified, a step-by-step error trace is generated.

4 Symbolic Execution

This section presents the Symbolic Execution Engine of SAFELI. After a brief background introduction, we discuss the MSIL instrumentor of SAFELI.

4.1 Background Information

The history of symbolic execution, which symbolically interprets and verifies the correctness of sequential programs, dates back to the 1970s [14]. During symbolic execution, initial values of input variables are represented using symbolic constraints. Each branch in the program is tagged with a corresponding path condition. When an exception is encountered or some system safety property is violated, the path condition is sent to a constraint solver for generating the corresponding initial values of input variables. Symbolic execution has been widely applied in automatic test case generation [17], in the discovery of operating system vulnerabilities [18], and in combination with model checking to analyze heap configurations and data structures [13].

We briefly introduce the idea of symbolic execution using the example presented in Fig. 2. Function PointInRectangle() checks whether a point (x, y) is contained in a rectangle whose top-left vertex is (x1, y1) and whose width and height are w1 and h1, respectively. Function hasCollision() tests whether two rectangles collide with each other by examining whether any of the four vertices of the first rectangle is contained in the second rectangle. We verify the correctness of hasCollision() in the main() function by calling it twice and swapping the order of the two rectangles in its input parameters. We expect the main() function to return without any exception.

    1: static bool PointInRectangle(int x, int y,
    2:     int x1, int y1, int w1, int h1)
    3: {
    4:     return (x>=x1 && y>=y1 && x<=x1+w1 && y<=y1+h1);
    5: }
    6:
    7: static bool hasCollision(int x1,int y1,int w1,int h1,
    8:     int x2, int y2, int w2, int h2)
    9: {
    10:    bool b1 = PointInRectangle(x1,y1,x2,y2,w2,h2);
    11:    bool b2 = PointInRectangle(x1+w1,y1,x2,y2,w2,h2);
    12:    bool b3 = PointInRectangle(x1,y1+h1,x2,y2,w2,h2);
    13:    bool b4 = PointInRectangle(x1+w1,y1+h1,x2,y2,w2,h2);
    14:    return (b1 || b2 || b3 || b4);
    15: }
    16:
    17: static void main(string [] args)
    18: {
    19:    int x1,y1,w1,h1; //uninitialized, or init by
    20:    int x2,y2,w2,h2; //e.g., x1=int.Parse(args[0])
    21:    bool b11 = hasCollision(x1,y1,w1,h1,x2,y2,w2,h2);
    22:    bool b21 = hasCollision(x2,y2,w2,h2,x1,y1,w1,h1);
    23:    if(b11==b21)
    24:        return;
    25:    else
    26:        throw new Exception("hasCollision() incorrect!");
    27: }

    Figure 2. Collision Detection

Symbolic execution starts at line 19, where all eight integer variables are assigned symbolic values; let them be $a_1, b_1, c_1, d_1, a_2, b_2, c_2, d_2$. Then symbolic execution proceeds to line 21 and steps into hasCollision(), where the four boolean variables b1, b2, b3, and b4 are added to the variable collection. At line 10, b1 is associated with the following symbolic constraint; let it be $B_1$:

    $a_1 \ge a_2 \wedge b_1 \ge b_2 \wedge a_1 \le a_2 + c_2 \wedge b_1 \le b_2 + d_2$

Similarly, b2, b3, and b4 can be associated with symbolic constraints; let them be $B_2$, $B_3$, and $B_4$. When symbolic execution returns to line 21, b11 is associated with $B_1 \vee B_2 \vee B_3 \vee B_4$. Variable b21 is assigned similarly; let it be $B_1' \vee B_2' \vee B_3' \vee B_4'$. When the branch statement (line 23) is encountered, both branches are associated with a path condition. If the second branch is taken, the exception triggers the constraint solver to solve the path condition $B_1 \vee B_2 \vee B_3 \vee B_4 \neq B_1' \vee B_2' \vee B_3' \vee B_4'$. An integer solver like the Omega Library [12] can easily decide that the above constraint is satisfiable. For example, one concrete valuation for (x1, y1, w1, h1, x2, y2, w2, h2) is (1, 1, 1, 1, 0, 0, 3, 3): the first rectangle has a vertex inside the second, but none of the vertices of the second lies inside the first.

4.2 Instrumentation

The instrumentor of SAFELI is still under development. This subsection describes its major design idea. Before going through the Symbolic Execution Engine, the MSIL code of an ASP.NET application has to be instrumented. SAFELI relies on RAIL [5] for manipulating MSIL code, because RAIL provides the capability of replacing types, references, variables, and methods. Based on RAIL, the job of the bytecode instrumentor is to inspect MSIL code, locate the get and set operations on each attribute/property and variable, and replace each variable access with a corresponding operation that constructs symbolic constraints. Before each hotspot that issues an SQL statement, e.g., the call of “SqlCommand.ExecuteDataset()”, an additional “attack_test()” function is called to iteratively test each malicious attack pattern and find the initial values of input variables.

As shown in Fig. 3, in SAFELI a symbolic constraint is represented by a class called Constraint. A constraint is composed of basic elements ranging from integer expressions and string expressions to hybrid expressions such as sqlStatement.Length<5. Thus the Expr class has two derived classes: IntExpr (i.e., integer expression) and StrExpr (i.e., string expression). IntExpr supports integer operations such as addition, subtraction, multiplication with constants, etc. Frequently seen string operations such as Substring, Replace, IndexOf, and CharAt are supported by StrExpr. To resolve pure integer constraints, SAFELI relies on the Omega library [12]. To resolve pure and hybrid string constraints, we use a unique string solver, which is presented in Section 5. The SymbolicState class embodies the information of a symbolic state: a list of variables, the current location, a hash table which maps variables to symbolic expressions, and a path condition associated with the current execution path.

    class Constraint{
        public RelationalOp relOp;
        public virtual void resolve();
        public Object [] childNodes;
    }
    class Expr{...}
    class BoolExpr: Expr{...}
    class IntExpr: Expr{
        public IntExpr addWith(IntExpr expr);
        public IntExpr multiplyWith(IntExpr expr);
        ...
    }
    class StrExpr: Expr{
        public StrExpr Substring(int startIdx, int length);
        public StrExpr Replace(String sOld, String sNew);
        ...
    }
    class SymbolicState{
        ArrayList lstVars;
        int Loc;
        HashTable Map; //mapping from var to expr
        Constraint PathCond;
    }

    Figure 3. Classes Used In Instrumentation

We illustrate the idea and the expected effects of the SAFELI bytecode instrumentor (currently under development) by the example in Fig. 4. The code snippet in Fig. 4 dynamically generates an SQL statement based on the value of an integer variable “a”. The resulting instrumented code is displayed in Fig. 5.

    String sqlstat = "";
    if(a<2){
        sqlstat = "SELECT * FROM users WHERE uname='"
            + txtboxUname.Text + "'";
    }else{
        sqlstat = "SELECT * FROM users WHERE uname='guest'";
    }

    Figure 4. Sample Before Instrumentation

    1  static SymbolicState ss;
    2  ...
    3  ss.Map[sqlstat] = new StrExpr("");
    4  if(ss.random_choice()){
    5      ss.PathCond.AndWith(new Constraint(RelOp.LessThan,
    6          new Object[] {ss.Map[a], new IntExpr(2)}));
    7      ss.Map[sqlstat] = (new StrExpr("SELECT ... uname='"))
    8          .append(ss.Map[txtboxUname.Text])
    9          .append("'");
    10 }else{
    11     ss.PathCond.AndWith(
    12         (new Constraint(RelOp.LessThan,
    13             new Object[] {ss.Map[a], new IntExpr(2)})
    14         ).Inverse()
    15     );
    16     ss.Map[sqlstat] = new StrExpr("SELECT ... guest'");
    17 }

    Figure 5. Sample After Instrumentation
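As a rough illustration of the bookkeeping that the instrumented code in Fig. 5 performs, the sketch below explores both branches of the if statement in Fig. 4 and records, for each path, the path condition and the symbolic value of sqlstat. It is a simplification under stated assumptions: plain strings stand in for the Constraint and StrExpr classes of Fig. 3, the two branch outcomes are enumerated deterministically instead of calling random_choice(), and the names PathConditionDemo and Run are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class PathConditionDemo
{
    // One explored path. Strings stand in for the Constraint and StrExpr
    // classes of Fig. 3 (an assumption made to keep the sketch short).
    sealed class SymbolicState
    {
        public List<string> PathCond = new List<string>();
        public string SqlStat = "\"\"";
    }

    // Symbolic counterpart of Fig. 4. The parameter plays the role of
    // ss.random_choice() in Fig. 5.
    static SymbolicState Run(bool takeThenBranch)
    {
        var ss = new SymbolicState();
        if (takeThenBranch)
        {
            ss.PathCond.Add("a < 2");
            ss.SqlStat = "\"SELECT * FROM users WHERE uname='\" + txtboxUname.Text + \"'\"";
        }
        else
        {
            ss.PathCond.Add("!(a < 2)");
            ss.SqlStat = "\"SELECT * FROM users WHERE uname='guest'\"";
        }
        return ss;
    }

    static void Main()
    {
        // Enumerate both outcomes so that every branch is covered once.
        foreach (bool choice in new[] { true, false })
        {
            SymbolicState ss = Run(choice);
            Console.WriteLine("path condition: " + string.Join(" && ", ss.PathCond));
            Console.WriteLine("sqlstat       : " + ss.SqlStat);
        }
    }
}
```

Only the first path yields an SQL string that depends on user input (txtboxUname.Text), so only that path would give the constraint solver something to solve at the hotspot.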
In Fig. 5, at the “global” level, there is one instance of SymbolicState, named ss. Here “global” refers to the scope of the C# class that corresponds to the ASP page it serves. Each assignment of a variable is replaced by a statement that constructs an expression and updates the mapping of the symbolic state. For example, the “sqlstat=""” statement is translated into “ss.Map[sqlstat]=new StrExpr("");”.

The condition in each if statement (and similarly each while loop) is replaced by a random_choice() function which non-deterministically generates a boolean value. At the beginning of each branch, the current path condition is updated correspondingly. For example, at line 5 of Fig. 5, i.e., the beginning of the “true” branch, the path condition is conjoined with the constraint that corresponds to the if-condition “a<2”.

The design of the SAFELI bytecode instrumentor and symbolic execution engine generally follows the idea of S. Khurshid et al.'s work [13] on Java bytecode. Notice that in SAFELI we need to execute code multiple times to achieve complete branch coverage. For example, because each run picks a branch at random, executing the code in Fig. 5 twice covers both branches with probability 50%, and executing it three times does so with probability 75% (in general, $1 - 2^{1-n}$ after $n$ runs). This naive instrumentation algorithm can be further improved.

5 Constraint Solver

This section outlines the algorithm of the constraint solver module (currently under development). It has two responsibilities: (1) to decide the satisfiability of path constraints, and (2) to find the initial values of input variables that lead to a breach of database security. Notice that our algorithm is conservative w.r.t. satisfiability: it can report false negatives (i.e., a satisfiable constraint is reported as non-satisfiable) but will not report false positives. The implications are as follows: (1) if SAFELI reports an error, the concretized input variable values will eventually lead to the error if the program is executed sufficiently many times; and (2) if SAFELI does not report an error, there might still be a vulnerability in the program because the constraint solver might have produced a false negative.

5.1 Hybrid Constraint Solver

The design idea of the hybrid constraint solver is very similar to that of the Action Language Verifier [19]. There are three categories of expressions in a hybrid constraint: boolean expressions, integer expressions, and string expressions. Each type is represented symbolically. Boolean expressions are represented using BDDs [16, 4], integer expressions are represented using Presburger constraints [12], and string expressions are represented using regular expressions. A hybrid constraint is always represented in Disjunctive Normal Form (DNF), as shown below:

    $\bigvee_{i} I_i \wedge B_i \wedge S_i$

In the above formula, $I_i$, $B_i$, and $S_i$ represent the integer, boolean, and string expressions in the $i$-th conjunct. Notice that the sets of variables appearing in $I_i$ and $B_i$ are disjoint; however, integer variables in $I_i$ could appear in $S_i$. For example, consider the following constraint (let it be $S_1$):

    $i < 5 \wedge j > 2 \wedge str1 = str2.\mathrm{Substring}(i, j)$

In this case, we have to solve the integer and boolean constraints first, then concretize the solution and substitute each occurrence of those variables in $S_i$ with the corresponding concrete value. We can generate multiple sets of solutions and spawn multiple instances of the string constraint. For example, the following are part of the constraints generated for $S_1$ above. However, note that this approach can lead to false negatives, because only some of the integer solutions are tried.

    $(i = 1 \wedge j = 3 \wedge str1 = str2.\mathrm{Substring}(1, 3))$
    $\vee\ (i = 2 \wedge j = 4 \wedge str1 = str2.\mathrm{Substring}(2, 4))$

5.2 Backward String Image Computation

Backward image computation is the key to solving string constraints. It is an essential concept in symbolic model checking. In the context of string manipulation, we define the backward image as follows: given a set of strings $R$ and a string operation $f$ (e.g., Substring and CharAt), the backward image of $R$ w.r.t. $f$ is the maximal set of strings $X$ such that for each string $s \in X$: $f(s) \in R$. In SAFELI, both $R$ and its backward image are expressed using regular expressions. We now briefly describe the image computation algorithm for popular string operations.

• string length: The function returns the length of a string. Given the equation

    $s.\mathrm{Length} = k$,

where $k$ is an integer constant, the solution of $s$ is $\Sigma^{k}$, where $\Sigma$ is the alphabet. Similarly, given an inequality $s.\mathrm{Length} < k$, the solution of $s$ is $\Sigma^{0} \mid \Sigma^{1} \mid \dots \mid \Sigma^{k-1}$. Given an inequality $s.\mathrm{Length} \ge k$, the solution of $s$ is $\Sigma^{k}\Sigma^{*}$.

• substring: The Substring(a, b) operation extracts a substring from the source string, starting at index $a$, with length $b$. Here $a$ and $b$ are two integer constants. Given a regular expression $r$ and the equation

    $s.\mathrm{Substring}(a, b) = r$,

the solution of $s$ is $\Sigma^{a}(r \cap \Sigma^{b})\Sigma^{*}$. Here $\cap$ represents the intersection of regular languages. The solution of $s$ is also regular because regular languages are closed under concatenation, intersection, union, complementation, difference, and substitution.

• charat: The CharAt(a) function returns the character at index $a$. Given the equation

    $s.\mathrm{CharAt}(a) = r$,

the solution of $s$ is $\Sigma^{a}(r \cap \Sigma^{1})\Sigma^{*}$.

• replace (character): The Replace(a, b) function replaces every occurrence of $a$ with $b$ in a string, where $a$ and $b$ are two characters. Given the equation

    $s.\mathrm{Replace}(a, b) = r$,

the solution of $s$ is $s = r_{b/(a|b)}$, where $b/(a|b)$ means to replace every appearance of $b$ with $(a|b)$. For example, if s.Replace('a','b') = $b^{*}c^{+}$ then $s = (a|b)^{*}c^{+}$.

• replace (string): Given $s.\mathrm{Replace}(s_1, s_2) = r$, where $s_1$ and $s_2$ are two constant strings, the solution of $s$ is computed as follows: construct a finite state machine $A$ that accepts $r$. Determinize $A$ and let the result be $A'$. For each state pair $c_1, c_2$ in $A'$ such that there is a path from $c_1$ to $c_2$ which could produce $s_2$, add another path (states and transitions) from $c_1$ to $c_2$ such that the string along the newly added path is $s_1$. The modified automaton accepts the solution of $s$.

• string concatenation: Given $r = s_1 + s_2$, where $s_1$ and $s_2$ are unknown, the solution of $s_1$ is generated as follows: convert $r$ to a finite state machine, then mark every state (from which a final state is reachable) as a final state. The new automaton accepts the prefix language of $r$, i.e., $s_1$.

$s_2$ can be solved as follows: given the automaton $A$ that accepts $r$, construct another automaton $A'$ from $A$ such that $A'$ accepts the reverse of $r$ (i.e., every word accepted by $A'$ has its reverse in $r$). This can be achieved simply by making each final state in $A$ an initial state in $A'$ and the initial state in $A$ the final state in $A'$, and then reversing the direction of all transitions. Similarly, construct the prefix automaton from $A'$, and let it be $A''$. Construct the reverse of $A''$; that is the finite state machine accepting $s_2$.

Note that the above are “maximal” solutions of $s_1$ and $s_2$. In practice, we often have to solve an equation like $r = s + c$ where $r$ is a regular expression and $c$ is a constant string. Solving such equations is very useful in generating attack strings. Consider Example 5.1, which is a simplified version of the motivating example in Section 2.

Example 5.1 Let $\Sigma$ represent the ASCII alphabet. $\Sigma_{-}$ is defined as $\Sigma_{-} = \Sigma - \{'\}$ and $\Sigma_{+}$ is defined as $\Sigma_{+} = \Sigma_{-} \cup \{''\}$. In other words, $\Sigma_{-}$ excludes the single quote, which is replaced by its SQL escape sequence in $\Sigma_{+}$. Consider the following equation, where ≡ is used to separate the left and right hand sides of the equation.

    “uname='” + s + “' pwd='” ≡ uname='$\Sigma_{+}^{*}$'

Its intuition is essentially to ask: if we concatenate the two constant strings (which are intended to test two data columns, uname and pwd) with s, is it possible to generate one single condition that tests column uname only? To solve the above equation, we can first solve the following:

    s′ + “' pwd='” ≡ uname='$\Sigma_{+}^{*}$'    (1)

Then we can get s from:

    “uname='” + s ≡ s′    (2)

To solve Equation (1), we need to compute the intersection of two regular languages: “$\Sigma^{*}$' pwd='” (let it be $s_a$) and “uname='$\Sigma_{+}^{*}$'” (let it be $s_b$). The finite state automata accepting $s_a$ and $s_b$ are presented in Fig. 6 (a) and (b). Note that in Fig. 6, “␣” indicates the space character. The automaton accepting $s_a \cap s_b$ is displayed in Fig. 6(c).

    Figure 6. Solving String Equation: finite state automata (a) accepting $s_a$, (b) accepting $s_b$, and (c) accepting $s_a \cap s_b$.

Then we study each state in Fig. 6(c) from which there is a path of “' pwd='” leading to the final state. Readers can verify that (1, 9) and (1, 7) are the only states that satisfy the condition. Then we mark (1, 9) and (1, 7) as final states, and unmark the original final state in Fig. 6(c). The resulting automaton accepts the solution of s′, which is expressed using the regular expression

    uname= | uname='(''|$\Sigma_{-}$)$^{*}$'

Similarly we can solve Equation (2), and the regular expression solution of s is displayed below:

    (''|$\Sigma_{-}$)$^{*}$'
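The solution of Example 5.1 can be sanity-checked by brute force. The sketch below is an independent check, not the automaton construction used by SAFELI: it enumerates every string s of length at most 6 over a three-character alphabet (a letter, a single quote, and a space, chosen here only for illustration) and compares membership in the claimed solution (''|Σ−)*' against the original equation, using .NET regular expressions. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class Example51Check
{
    // Right-hand side of the equation: uname='X' where X is a sequence of
    // non-quote characters and escaped quotes ('').
    static readonly Regex Target = new Regex(@"^uname='(?:[^']|'')*'$");

    // Claimed solution for s: (''|Sigma-)* followed by one single quote.
    static readonly Regex Solution = new Regex(@"^(?:''|[^'])*'$");

    // All strings over the given alphabet with length <= maxLen.
    static IEnumerable<string> Words(string alphabet, int maxLen)
    {
        var level = new List<string> { "" };
        for (int len = 0; len <= maxLen; len++)
        {
            foreach (string w in level) yield return w;
            var next = new List<string>();
            foreach (string w in level)
                foreach (char c in alphabet)
                    next.Add(w + c);
            level = next;
        }
    }

    static void Main()
    {
        int mismatches = 0;
        foreach (string s in Words("a' ", 6))
        {
            // Does s satisfy  "uname='" + s + "' pwd='"  ≡  uname='Σ+*'  ?
            bool satisfies = Target.IsMatch("uname='" + s + "' pwd='");
            if (satisfies != Solution.IsMatch(s)) mismatches++;
        }
        Console.WriteLine("mismatches: " + mismatches);   // mismatches: 0
    }
}
```

A count of zero means that, on this sample, the regular expression is both sound (every matching s satisfies the equation) and complete (no satisfying s is missed): s must end in an unescaped quote and contain only escaped quotes before it.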
6 Test Case Generation

We now describe the idea of the attack pattern library and the test case generator, which are currently under development. When symbolic execution reaches a hotspot, the test case generator randomly generates some commonly used (non-malicious) strings and instantiates the dynamically constructed SQL statement. An abstract syntax tree is then constructed, in which all table names and column names are known to the test case generator (e.g., users, uname, and pwd in Fig. 7). Then, depending on the syntax tree, the test case generator pulls from the attack pattern library a set of applicable attack patterns and parameterizes them with the column names. The idea of most attack patterns is to force the WHERE clause to be a tautology. For example, in Fig. 7, the expression used in the original WHERE clause is replaced by the attack pattern “COLUMN = '$\Sigma_{+}^{*}$' OR COLUMN<>''”, where “COLUMN” is replaced by “uname” for the motivating example. Then the concrete test case is generated, as shown in the following example.

    Figure 7. Solving Motivating Example: abstract syntax tree of the SELECT statement over table users with columns uname and pwd, in which the original WHERE expression (uname='abc' AND pwd='bca') is replaced by the attack pattern uname='$\Sigma^{*}$' OR uname<>''.

Example 6.1 A set of equations can be established for the motivating example in Section 2, whose SQL statement structure is presented in Fig. 7. Note that “Substr” stands for “Substring” in the following equations.

    s1 = sUname.Replace("'","''").Substr(0,16)    (3)
    s2 = sPwd.Replace("'","''").Substr(0,16)    (4)
    “uname='” + s1 + “' pwd='” + s2 + "'" ≡ uname='$\Sigma_{+}^{*}$'␣$^{*}$ OR uname<>''    (5)

In the above equations, s1 and s2 are the massaged strings of the user name and password. Note that “␣$^{*}$” on the right of Equation (5) represents a sequence of white spaces. Using a technique similar to that in Example 5.1, we can get the solution of s1 as follows:

    (''|$\Sigma_{-}$)$^{*}$'

Replacing s1 with its solution in Equation (5), we solve s2; its solution is presented below, where “␣” stands for white space.

    ␣$^{*}$ OR uname<>'

With s1 and s2, we proceed to solve Equations (3) and (4). Using the algorithms for solving substitution and substring, we can easily get the solution of sUname as follows (see note 1):

    (''|$\Sigma_{-}$)$^{15}$'$\Sigma^{*}$

Note 1: the “15” in the formula does not mean to repeat (''|$\Sigma_{-}$) 15 times, but to restrict the length of its repetitions to 15.

The solution of sPwd is expressed using a regular expression as below:

    ␣␣␣␣␣OR uname<>'$\Sigma^{*}$

where there are 5 spaces before “OR”.

By concretizing the variables sUname and sPwd, we can generate the evidence of vulnerability, as given in Section 2.

7 Conclusion

This paper has proposed and outlined the main design idea of SAFELI, a static analysis tool which can automatically generate test cases exploiting SQL injection vulnerabilities in ASP.NET Web applications. The novelty of the tool lies in its satisfiability decision/approximation procedure for string constraints. By symbolically executing an ASP.NET Web application, SAFELI constructs equations on strings which match a certain attack pattern. Once fully implemented, SAFELI can take advantage of source code information and will be able to discover very delicate vulnerabilities that cannot be discovered by black-box vulnerability scanners. Our future work includes completing the implementation of SAFELI and exploring algorithms to automatically enumerate SQL WHERE clauses.

References

[1] C. Anley. Advanced SQL Injection In SQL Server Applications. Next Generation Security Software LTD. White Paper, 2002.
[2] C. Anley. More Advanced SQL Injection. Next Generation Security Software LTD. White Paper, 2002.

[3] S. W. Boyd and A. D. Keromytis. SQLrand: Preventing SQL injection attacks. In Proceedings of the 2nd Applied Cryptography and Network Security (ACNS) Conference, volume 3089 of Lecture Notes in Computer Science, pages 292–304. Springer, 2004.

[4] J. Burch, E. Clarke, K. McMillan, D. Dill, and L. Hwang. Symbolic model checking: 10^20 states and beyond. In IEEE Symposium on Logic in Computer Science, pages 428–439, 1990.

[5] B. Cabral, P. Marques, and L. Silva. RAIL: Code Instrumentation for .NET. In Proceedings of the 20th Annual ACM Symposium on Applied Computing (SAC), 2005.

[6] A. Christensen, A. Møller, and M. Schwartzbach. Precise analysis of string expressions. In Proceedings of the International Static Analysis Symposium (SAS'03), 2003.

[7] SPI Dynamics. Webinspect: Security throughout the application lifecycle. SPI Dynamics. Datasheet. http://www.spidynamics.com/assets/documents/WebInspect_DataSheets.pdf.

[8] C. Gould, Z. Su, and P. Devanbu. JDBC Checker: A Static Analysis Tool for SQL/JDBC Applications. In Proceedings of the 26th International Conference on Software Engineering, pages 697–698, 2004.

[9] W. Halfond and A. Orso. AMNESIA: Analysis and Monitoring for NEutralizing SQL-Injection Attacks. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, pages 174–183, 2005.

[10] Y.W. Huang, S.K. Huang, T.P. Lin, and C.H. Tsai. Web application security assessment by fault injection and behavior monitoring. In Proceedings of the 11th International World Wide Web Conference (WWW 2003), 2003.

[11] G. S. Kc, A. D. Keromytis, and V. Prevelakis. Countering code-injection attacks with instruction set randomization. In Proceedings of the ACM Conference on Computer and Communications Security (CCS'03), 2003.

[12] W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman, and D. Wonnacott. The Omega library interface guide. Technical Report CS-TR-3445, Department of Computer Science, University of Maryland, College Park, March 1995.

[13] S. Khurshid, C. S. Păsăreanu, and W. Visser. Generalized symbolic execution for model checking and testing. In Proceedings of the 9th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), volume 2619 of LNCS, 2003.

[14] J. C. King. Symbolic execution and program testing. Communications of the ACM, 19(7):385–394, 1976.

[15] A. Nguyen-Tuong, S. Guarnieri, D. Greene, J. Shirley, and D. Evans. Automatically hardening web applications using precise tainting. In Proceedings of the 20th IFIP International Information Security Conference, 2005.

[16] R.E. Bryant. Graph-based algorithms for boolean function manipulation. In Proceedings of the 27th ACM/IEEE Design Automation Conference, 1986.

[17] N. Tillmann and W. Schulte. Parameterized unit tests with Unit Meister. In Proceedings of the 10th European Software Engineering Conference Joint with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE), 2005.

[18] J. Yang, C. Sar, P. Twohey, C. Cadar, and D. Engler. Automatically generating malicious disks using symbolic execution. In Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), 2006.

[19] T. Yavuz-Kahveci, M. Tuncer, and T. Bultan. A library for composite symbolic representations. In Proceedings of the 7th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, volume 2031 of Lecture Notes in Computer Science, pages 335–344. Springer-Verlag, April 2001.
