Static Analysis Framework For Detecting SQL Injection Vulnerabilities
All content following this page was uploaded by Lixin Tao on 05 June 2014.
Abstract

Recently SQL Injection Attack (SIA) has become a major threat to Web applications. Via carefully crafted user input, attackers can expose or manipulate the back-end database of a Web application. This paper proposes the construction and outlines the design of a static analysis framework (called SAFELI) for identifying SIA vulnerabilities at compile time. SAFELI statically inspects MSIL bytecode of an ASP.NET Web application, using symbolic execution. At each hotspot that submits SQL query, a hybrid constraint solver is used to find out the corresponding user input that could lead to breach of information security. Once completed, SAFELI has the future potential to discover more delicate SQL injection attacks than black-box Web security inspection tools.

keywords: SQL Injection Attack, Symbolic Execution, Constraint Solver, Automatic Testing.

Example 1.1 A log-in page has two text box fields for entering user name and password. Let sUname and sPwd represent the strings contained in text boxes. Presented below is a piece of back-end C# code that constructs SQL statement on the fly. Here “+” denotes string concatenation.

    "SELECT uname, pass FROM users WHERE \n uname='"
    + sUname + "' AND pass='" + sPwd + "'"

The above SQL query intends to verify existence of the user name/password pair supplied. However, if attacker enters “admin' -- ” as user name, and leaves password blank, the following SQL statement is constructed.

    SELECT uname,pass FROM users WHERE
    uname='admin' -- ' AND pass=''

Since “--” comments out the rest of query, it only verifies existence of user name “admin”. The attacker can log in without supplying password as long as admin account exists.
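The effect described in Example 1.1 can be replayed outside of C#. The following is a minimal Python sketch, not part of SAFELI; the function name build_login_query is hypothetical and simply mirrors the string concatenation of the example.

```python
def build_login_query(s_uname, s_pwd):
    """Mirror the concatenation of Example 1.1 (no sanitization at all)."""
    return ("SELECT uname, pass FROM users WHERE uname='" + s_uname
            + "' AND pass='" + s_pwd + "'")

# The attack input of Example 1.1: user name "admin' -- ", blank password.
attack_query = build_login_query("admin' -- ", "")
print(attack_query)

# Everything after "--" is treated as a comment by the SQL engine, so the
# part that is actually evaluated no longer mentions the password.
effective_query = attack_query.split("--", 1)[0].rstrip()
print(effective_query)
```

The second printed line contains only the uname test, which is why the password check is bypassed.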
• Intrusion Detection Based on Static Analysis: Using static string analysis technique [6], it is possible to construct a regular expression that conservatively approximates the set of SQL statements generated at a hotspot (which submits SQL query). The information can be used to statically analyze syntax correctness of SQL statements [8] and to model “normal behaviors” of a Web application. During run time, any SQL statement not contained in the approximation library will be rejected by intrusion detection [9].

• Black-box Testing: Black-box testing [10, 7] can be used to discover SIA vulnerabilities, by applying a library of pre-collected attack patterns. It is fast and effective; however, without prior knowledge of source code, it has difficulty in discovering non-trivial vulnerabilities.

• SQL Randomization: As an extension of instruction randomization [11], SQL randomization [3] instruments a Web application and appends a random number after each SQL keyword used to build SQL statements. SQL parser is modified correspondingly to accept randomized SQL keywords. At run-time, since a user injected SQL keyword does not have random number appended, SIA fails due to syntax error.

This paper proposes the construction of a static analysis framework (called SAFELI) for discovering SIA vulnerabilities at compile-time. While the tool is still under development, it is beneficial to share its main design idea in this paper. The contributions of this paper are listed below:

1. White-box Static Analysis: SAFELI analyzes bytecode. It relies on string analysis (similar to [9]). However, it discovers vulnerabilities at compile-time rather than at run-time. Once implemented, SAFELI is able to generate user inputs as hard evidence of a vulnerability.

2. Hybrid-Constraint Solver: SAFELI employs a brand new string analysis technique (other than the one used in [6, 9, 8]). The technique can handle hybrid constraints that involve boolean, integer, and string variables. Most popular string operations can be handled.

This paper is organized as follows. Section 2 presents a motivating example to be used throughout the paper. Section 3 introduces the general structure of SAFELI. Section 4 discusses the symbolic execution framework. Section 5 presents the core technique, i.e., hybrid constraint solver. Section 6 covers test case generation and applies hybrid constraint solver algorithm on the motivating example. Section 7 concludes the paper and proposes the future work.

2 Motivating Example

This section describes a non-trivial SIA vulnerability that cannot be easily detected by black-box testing tools. It is motivated by the vulnerability example of string size restriction in [1]. Fig. 1 presents a massage() method that processes a user input string in two steps. First, it checks whether the input string contains suspicious SQL keywords such as “--”, “OR”, and “drop”. Then it substitutes each single quote character with “''”, i.e., escape character of single quote in SQL. Finally, massage() tries to provide further protection by restricting the length of the output within 16.

    String massage(String strInput)
    {
        //1. SQL keyword search
        if(strInput.IndexOf("--")!=-1
            || strInput.IndexOf("OR")!=-1
            || strInput.IndexOf("drop")!=-1)
            throw new Exception(
                "Possible SQL Injection Attack: " + strInput
            );

        //2. massage the data for single quote
        String sOut = strInput.Replace("'","''");
        sOut = sOut.Substring(0,16);
        return sOut;
    }

Figure 1. String Massage

The SQL generation statement in Example 1.1 can be strengthened as below.

    "SELECT uname, pass FROM users WHERE \n uname='"
    +massage(sUname)+ "' AND pass='" +massage(sPwd)+ "'"

Readers can verify that the above code can defend the attack strings in Example 1.1 because “--” is filtered out. Furthermore, single quote characters will be replaced by “''” and hence cause no harm. The massage() function in Fig. 1, however, has a very delicate bug. Consider the following input for user name and password, respectively:

    123456789012345'
    OR uname<>'

Notice both strings are 16 characters long. After going through the massage operations, the following SQL statement is generated.

    SELECT uname,pass FROM users WHERE
    uname='123456789012345'' AND pass=' OR uname<>''

Notice that its WHERE clause is always a tautology. It consists of two conditions: (1) whether uname is equal to constant string “123456789012345'' AND pass=” (note the escape character “''” inside), or (2) uname is not an empty string. Obviously “uname<>''” always evaluates to true. The trick is that the length of malicious strings
are both 16. Although the Replace function generates escape characters “''”, half of the “''” is then cut off by the Substring method at the end of massage(). Such delicate bugs cannot be easily discovered by black-box testing tools.

3 SAFELI Framework

This section presents the overall structure of SAFELI (Static Analysis Framework for discovEring sqL Injection vulnerabilities), which is currently under development. SAFELI consists of the following components:

• MSIL Instrumentor: The module instruments MSIL bytecode of an ASP.NET application for symbolic execution. It inserts additional monitoring function at each location where data members of objects are accessed. Values of (uninitialized) variables are replaced with symbolic constraints. Each hotspot, i.e., the location which submits SQL statement, is tagged to trigger constraint solver.

• Symbolic Execution Engine: It is essentially a wrapper of the .Net Framework. The execution engine iteratively examines the back-end code for each ASP page one by one. When hotspots are reached, a library of pre-set attack patterns is consulted, based on which a hybrid string constraint is constructed and sent to constraint solver for generating vulnerability evidence (user input).

• Library of Attack Patterns: The module stores a collection of pre-set attack patterns, each of which is represented using a regular expression.

• Constraint Solver: Given a constraint, the solver tests its satisfiability and generates valuation of variables that satisfy the constraint. Unlike other popular platforms of symbolic execution, the Constraint Solver of SAFELI can solve string constraints. Details are discussed in Section 5.

• Test Case Generator: When initial valuations are generated, they are passed to the Test Case Generator. The module then injects the values into HTML fields and posts the web page back to server. It then uses a heuristic algorithm to analyze the response from server. When vulnerability is verified, a step by step error trace is generated.

4 Symbolic Execution

This section presents the Symbolic Execution Engine of SAFELI. After a brief background introduction, we discuss the MSIL instrumentor of SAFELI.

4.1 Background Information

The history of symbolic execution, which symbolically interprets and verifies correctness of sequential programs, dates back to the 1970s [14]. During symbolic execution, initial values of input variables are represented using symbolic constraints. Each branch in the program is tagged with a corresponding path condition. When an exception is encountered or some system safety property is violated, the path condition is sent to a constraint solver for generating the corresponding initial values of input variables. Symbolic execution has been widely applied in automatic test case generation [17], discovery of Operating System vulnerabilities [18], and combination of model checking to analyze heap configurations and data structures [13].

We briefly introduce the idea of symbolic execution using one example presented in Fig. 2. Function PointInRectangle() checks if a point (x, y) is contained in a rectangle whose left top vertex is (x1, y1) and whose width and height are w1 and h1 respectively. Function hasCollision() tests whether two rectangles collide with each other by examining whether any of the four vertices of the first rectangle is contained in the second rectangle. We verify the correctness of hasCollision() in the main() function by calling it twice and swapping the sequence of two rectangles in its input parameters. We expect the main() function to return without any exception.

     1: static bool PointInRectangle(int x, int y,
     2:         int x1, int y1, int w1, int h1)
     3: {
     4:     return (x>=x1 && y>=y1 && x<=x1+w1 && y<=y1+h1);
     5: }
     6:
     7: static bool hasCollision(int x1,int y1,int w1,int h1,
     8:         int x2, int y2, int w2, int h2)
     9: {
    10:     bool b1 = PointInRectangle(x1,y1,x2,y2,w2,h2);
    11:     bool b2 = PointInRectangle(x1+w1,y1,x2,y2,w2,h2);
    12:     bool b3 = PointInRectangle(x1,y1+h1,x2,y2,w2,h2);
    13:     bool b4 = PointInRectangle(x1+w1,y1+h1,x2,y2,w2,h2);
    14:     return (b1 || b2 || b3 || b4);
    15: }
    16:
    17: static void main(string [] args)
    18: {
    19:     int x1,y1,w1,h1; //uninitialized, or init by
    20:     int x2,y2,w2,h2; //e.g., x1=int.Parse(args[0])
    21:     bool b11 = hasCollision(x1,y1,w1,h1,x2,y2,w2,h2);
    22:     bool b21 = hasCollision(x2,y2,w2,h2,x1,y1,w1,h1);
    23:     if(b11==b21)
    24:         return;
    25:     else
    26:         throw new Exception("hasCollision() incorrect!");
    27: }

Figure 2. Collision Detection

Symbolic execution starts at line 19, where all eight integer variables are assigned a symbolic value, and let them be $a_1, b_1, c_1, d_1, a_2, b_2, c_2, d_2$. Then symbolic execution traces to line 21 and steps into hasCollision() where
four variables b1, b2, b3, and b4 are added into variable collection. At line 10, b1 is associated with the following symbolic constraint, letting it be $B_1$:

    $a_1 \ge a_2 \wedge b_1 \ge b_2 \wedge a_1 \le a_2 + c_2 \wedge b_1 \le b_2 + d_2.$

    String sqlstat = "";
    if(a<2){
        sqlstat = "SELECT * FROM users WHERE uname='"
            + txtboxUname.Text + "'";
    }else{
        sqlstat = "SELECT * FROM users WHERE uname='guest'";
    }
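The constraint B1 obtained at line 10 of Fig. 2 can be reproduced mechanically. The sketch below is a Python illustration, not SAFELI code: point_in_rectangle_sym is a hypothetical helper that returns the constraint as text when fed symbolic names, and the two concrete rectangles are an assumed valuation, chosen here to show the kind of input for which the check in main() would throw.

```python
def point_in_rectangle_sym(x, y, x1, y1, w1, h1):
    """Return, as text, the condition that PointInRectangle() evaluates."""
    return f"{x} >= {x1} ∧ {y} >= {y1} ∧ {x} <= {x1} + {w1} ∧ {y} <= {y1} + {h1}"

def point_in_rectangle(x, y, x1, y1, w1, h1):
    """Concrete counterpart of line 4 in Fig. 2."""
    return x >= x1 and y >= y1 and x <= x1 + w1 and y <= y1 + h1

def has_collision(x1, y1, w1, h1, x2, y2, w2, h2):
    """Concrete counterpart of lines 10-14 in Fig. 2."""
    return (point_in_rectangle(x1, y1, x2, y2, w2, h2)
            or point_in_rectangle(x1 + w1, y1, x2, y2, w2, h2)
            or point_in_rectangle(x1, y1 + h1, x2, y2, w2, h2)
            or point_in_rectangle(x1 + w1, y1 + h1, x2, y2, w2, h2))

# Line 10 with symbolic inputs x1=a1, y1=b1, x2=a2, y2=b2, w2=c2, h2=d2.
B1 = point_in_rectangle_sym("a1", "b1", "a2", "b2", "c2", "d2")
print(B1)

# Assumed concrete valuation: a small rectangle strictly inside a big one.
big = (0, 0, 10, 10)
small = (2, 2, 3, 3)
b11 = has_collision(*big, *small)    # no vertex of big lies inside small
b21 = has_collision(*small, *big)    # every vertex of small lies inside big
print(b11, b21)
```

For this valuation the two calls disagree, so the branch at line 26 of Fig. 2 is reachable; a constraint solver applied to the corresponding path condition is what would produce such a valuation automatically.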
In Fig. 5, at the “global” level, there is one instance of SymbolicState, named ss. Here “global” refers to the scope of the C# class that corresponds to the ASP page it serves. Each assignment of a variable is replaced by the statement that constructs an expression and updates the mapping of the symbolic state. For example, the “sqlstat=""” statement is translated into “ss.Map[sqlstat]=new StrExpr("");”.

The condition in each if statement (and similarly while loop) is replaced by a random choice() function which non-deterministically generates a boolean value. At the beginning of each branch, the current path condition is updated correspondingly. For example, at line 5 of Fig. 5, i.e., beginning of the “true” branch, the path condition is conjoined with the constraint that corresponds to the if-condition “a<2”.

The design of SAFELI bytecode instrumentor and symbolic execution engine generally follows the idea of S. Khurshid et al.'s work [13] on Java bytecode. Notice that in SAFELI we need to execute code multiple times to achieve the complete branch coverage, e.g., to execute the code in Fig. 5 twice guarantees a 50% coverage rate, and to execute it three times guarantees 75%. This naive instrumentation algorithm can be further improved.

5 Constraint Solver

This section outlines the algorithm of constraint solver module (currently under development). It has two responsibilities: (1) to decide satisfiability of path constraints, and (2) to find out the initial values of input variables that lead to the breach of database security. Notice that our algorithm is conservative w.r.t. satisfiability: it can report false negatives (i.e., a satisfiable constraint is reported as non-satisfiable) but will not report false positives. Its implications are as follows: (1) If SAFELI reports an error, the concretized input variable values will eventually lead to the error if the program is executed as many times as possible; and (2) If SAFELI does not report an error, there might still be vulnerability in the program because the constraint solver might have false negative reports.

5.1 Hybrid Constraint Solver

The design idea of hybrid constraint solver is very similar to that of the Action Language Verifier [19]. There are three categories of expressions in a hybrid constraint: boolean expression, integer expression, and string expression. Each type is represented symbolically. Boolean expressions are represented using BDD [16, 4], integer expressions are represented using Presburger constraints [12], and string expressions are represented using regular expression. A hybrid constraint is always represented in Disjunctive Normal Form (DNF), as shown below:

    $\bigvee_i I_i \wedge B_i \wedge S_i.$

In the above formula, $I_i$, $B_i$ and $S_i$ represent the integer, boolean, and string expression in the $i$-th conjunction. Notice that the sets of variables appearing in $I_i$ and $B_i$ are disjoint; however, integer variables in $I_i$ could appear in $S_i$. For example, consider the following constraint (let it be $S_1$):

    $i < 5 \wedge j > 2 \wedge str1 = str2.\mathrm{Substring}(i, j)$

In this case, we have to solve integer and boolean constraints first, and then concretize the solution and substitute the appearance of any variables in $S_i$ with the corresponding concretized values. We can generate multiple sets of solutions, and spawn multiple instances of the string constraint. For example, the following are part of the constraints generated for $S_1$ above. However, note that this approach can lead to false negatives.

    $(i = 1 \wedge j = 3 \wedge str1 = str2.\mathrm{Substring}(1, 3))$
    $\vee\ (i = 2 \wedge j = 4 \wedge str1 = str2.\mathrm{Substring}(2, 4))$

5.2 Backward String Image Computation

Backward image computation is the key to solving string constraints. It is an essential concept in symbolic model checking. In the context of string manipulation, we define backward image as follows: given a set of strings $R$ and a string operation $f$ (e.g., Substring and CharAt), the backward image of $R$ w.r.t. $f$ is the maximal set of strings $X$ where for each string $s \in X$: $f(s) \in R$. In SAFELI, both $R$ and its backward image are expressed using regular expression. We now briefly describe the image computation algorithm for popular string operations.

• string length: The function returns the length of a string. Given the following equation

    s.Length = k,

where $k$ is an integer constant, the solution of $s$ is $\Sigma^k$, where $\Sigma$ is the alphabet. Similarly, given an inequality s.Length < k the solution of $s$ is $\Sigma^0 \mid \Sigma^1 \mid \ldots \mid \Sigma^{k-1}$. Given an inequality s.Length >= k the solution of $s$ is $\Sigma^k \Sigma^*$.

• substring: The Substring(a, b) operation chops one substring from source string, starting at index a, with length b. Here a and b are two integer constants. Given a regular expression $r$ and the equation as below

    s.Substring(a, b) = r,
the solution of $s$ is $\Sigma^a (r \cap \Sigma^b) \Sigma^*$. Here $\cap$ represents the intersection of regular languages. It is obvious that the solution of $s$ is also regular because regular languages are closed under concatenation, intersection, union, complementation, difference, and substitution.

• charat: The CharAt(i) function returns the character at index i. Given the equation

    s.CharAt(i) = r,

the solution of $s$ is $\Sigma^i (r \cap \Sigma^1) \Sigma^*$.
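The three backward images above can be tried out with ordinary regular expressions. The following Python sketch is an illustration under stated assumptions and is not the SAFELI solver: Σ is taken to be "any character" (Python's `.` with re.DOTALL), all helper names are hypothetical, and because Python's re module has no intersection operator, r ∩ Σ^b is emulated with a lookahead plus a named backreference that forces r to span exactly b characters.

```python
import re

def matches(pattern, s):
    """True if the whole string s is in the language of pattern (Σ = any char)."""
    return re.fullmatch(pattern, s, re.DOTALL) is not None

def length_eq(k):
    """Backward image of s.Length = k, i.e. Σ^k."""
    return ".{%d}" % k

def length_lt(k):
    """Backward image of s.Length < k, i.e. Σ^0 | ... | Σ^(k-1); assumes k >= 1."""
    return ".{0,%d}" % (k - 1)

def length_ge(k):
    """Backward image of s.Length >= k, i.e. Σ^k Σ*."""
    return ".{%d,}" % k

def substring_preimage(a, b, r):
    """Backward image of s.Substring(a, b) = r, i.e. Σ^a (r ∩ Σ^b) Σ*.

    The lookahead records the text after position a+b as 'tail'; r must then
    be followed by exactly that tail, so r covers exactly b characters.
    """
    return r"(?:.{%d}(?=.{%d}(?P<tail>.*)\Z)(?:%s)(?P=tail)\Z)" % (a, b, r)

def charat_preimage(i, r):
    """Backward image of s.CharAt(i) = r, i.e. Σ^i (r ∩ Σ^1) Σ*."""
    return substring_preimage(i, 1, r)

quote_at_15 = charat_preimage(15, "'")
print(matches(quote_at_15, "123456789012345'"))   # quote sits at index 15
print(matches(substring_preimage(2, 6, "admin'"), "xxadmin'yyy"))
```

Both calls print True. A production solver would more likely work on automata, where intersection is a direct product construction; the lookahead encoding is only a convenient way to experiment with the formulas.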
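The treatment of the hybrid constraint S1 in Section 5.1 (solve the integer part, concretize, then solve each spawned string constraint) can be sketched in the same spirit. The Python code below is a toy illustration, not the SAFELI algorithm: the enumeration bound max_value, the restriction to non-negative integers, and the function names are all assumptions made for the example, and str1 is taken to be a known constant.

```python
def concretize(max_value):
    """Integer solutions of i < 5 ∧ j > 2, limited to 0 <= i, j <= max_value."""
    return [(i, j)
            for i in range(max_value + 1)
            for j in range(max_value + 1)
            if i < 5 and j > 2]

def solve_s1(str1, max_value):
    """Find (i, j, str2) with str1 == str2.Substring(i, j), or None.

    Each concretized pair (i, j) spawns one string constraint. Substring(i, j)
    always has length j, so the instance is solvable only if len(str1) == j;
    any i-character prefix followed by str1 is then a witness for str2.
    """
    for i, j in concretize(max_value):
        if len(str1) == j:
            return i, j, "x" * i + str1
    return None

print(solve_s1("admin", 5))     # a witness is found
print(solve_s1("admin'", 5))    # None: j = 6 lies outside the enumerated range
print(solve_s1("admin'", 6))    # found once the bound is raised
```

The second call shows how concretization produces a false negative: S1 is satisfiable for a six-character str1 (take i = 0, j = 6), but no witness is found when the enumerated integer solutions stop at 5.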
[Figure 7. Solving Motivating Example: parse tree of the SQL statement, with nodes SQL-STATMT, SELECT-STMT, SELECT, COL_LIST, FROM, WHERE, uname, pwd, users, EXPR, AND]

6 Test Case Generation

[…]

    $\equiv\ \texttt{uname='}\,\Sigma^{+}\,\sqcup^{*}\,\texttt{'}\,\sqcup^{*}\,\texttt{OR uname<>''} \qquad (5)$

In the above equations, $s_1$ and $s_2$ are the massaged strings of the user name and password. Note that “$\sqcup^{*}$” on the right of Equation 5 represents a sequence of white spaces. Using a similar technique in Example 5.1, we can get the solution of $s_1$ as follows:

    $(\texttt{''} \mid \Sigma^{-})^{*}\ \texttt{'}$

The solution of sPwd is expressed using a regular expression as below:

    $\texttt{OR uname<>'}\,\Sigma^{*}$

[…] $(\texttt{''} \mid \Sigma^{-})$ 15 times, but to restrict the length of its repetitions to 15.
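The solution for the massaged user name can be checked against the motivating example. The sketch below is a Python port of massage() from Fig. 1 written for illustration only; the slice s[:16] stands in for Substring(0,16), and reading Σ− as "any character except the single quote" is an assumption, as is the helper name leaves_open_quote.

```python
import re

def massage(s):
    """Python port of massage() in Fig. 1 (keyword check, escape, truncate)."""
    if "--" in s or "OR" in s or "drop" in s:
        raise ValueError("Possible SQL Injection Attack: " + s)
    # Unlike Substring(0,16), slicing does not fail on inputs shorter than 16.
    return s.replace("'", "''")[:16]

# (''|Σ−)*' with Σ− assumed to mean "any character except the single quote":
# escaped quotes or ordinary characters, followed by one dangling quote.
S1_SOLUTION = r"(?:''|[^'])*'"

def leaves_open_quote(user_input):
    """True if the massaged input ends with an unescaped single quote."""
    return re.fullmatch(S1_SOLUTION, massage(user_input), re.DOTALL) is not None

massaged = massage("123456789012345'")
print(massaged, len(massaged))                 # 16 characters, one lone quote
print(leaves_open_quote("123456789012345'"))   # True: truncation cut the escape
print(leaves_open_quote("O'Brien"))            # False: the quote stays escaped
```

The attack user name survives massage() with a single trailing quote because the 17-character escaped string is cut back to 16, which is exactly the shape described by the regular expression for $s_1$.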
[2] C. Anley. More Advanced SQL Injection. Next Generation Security Software LTD. White Paper, 2002.

[3] S. W. Boyd and A. D. Keromytis. SQLrand: Preventing SQL injection attacks. In Proceedings of the 2nd Applied Cryptography and Network Security (ACNS) Conference, volume 3089 of Lecture Notes in Computer Science, pages 292–304. Springer, 2004.

[4] J. Burch, E. Clarke, K. McMillan, D. Dill, and L. Hwang. Symbolic model checking: 10^20 states and beyond. In IEEE Symposium on Logic in Computer Science, pages 428–439, 1990.

[5] B. Cabral, P. Marques, and L. Silva. RAIL: Code Instrumentation for .NET. In Proceedings of the 20th Annual ACM Symposium on Applied Computing (SAC), 2005.

[6] A. Christensen, A. Møller, and M. Schwartzbach. Precise analysis of string expressions. In Proceedings of the International Static Analysis Symposium (SAS'03), 2003.

[7] SPI Dynamics. Webinspect: Security throughout the application lifecycle. SPI Dynamics. Datasheet. http://www.spidynamics.com/assets/documents/WebInspect_DataSheets.pdf.

[8] C. Gould, Z. Su, and P. Devanbu. JDBC Checker: A Static Analysis Tool for SQL/JDBC Applications. In Proceedings of the 26th International Conference on Software Engineering, pages 697–698, 2004.

[9] W. Halfond and A. Orso. AMNESIA: Analysis and Monitoring for NEutralizing SQL-Injection Attacks. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, pages 174–183, 2005.

[10] Y.W. Huang, S.K. Huang, T.P. Lin, and C.H. Tsai. Web application security assessment by fault injection and behavior monitoring. In Proceedings of the 11th International World Wide Web Conference (WWW 2003), 2003.

[11] G. S. Kc, A. D. Keromytis, and V. Prevelakis. Countering code-injection attacks with instruction set randomization. In Proceedings of the ACM Conference on Computer and Communications Security (CCS'03), 2003.

[12] W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman, and D. Wonnacott. The Omega library interface guide. Technical Report CS-TR-3445, Department of Computer Science, University of Maryland, College Park, March 1995.

[13] S. Khurshid, C. S. Pasăreănu, and W. Visser. Generalized symbolic execution for model checking and testing. In Proceedings of the 9th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), volume 2619 of LNCS, 2003.

[14] J. C. King. Symbolic execution and program testing. Communications of the ACM, 19(7):385–394, 1976.

[15] A. Nguyen-Tuong, S. Guarnieri, D. Greene, J. Shirley, and D. Evans. Automatically hardening web applications using precise tainting. In Proceedings of the 20th IFIP International Information Security Conference, 2005.

[16] R.E. Bryant. Graph-based algorithms for boolean function manipulation. In Proceedings of the 27th ACM/IEEE Design Automation Conference, 1986.

[17] N. Tillmann and W. Schulte. Parameterized unit tests with Unit Meister. In Proceedings of the 10th European Software Engineering Conference Joint with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE), 2005.

[18] J. Yang, C. Sar, P. Twohey, C. Cadar, and D. Engler. Automatically generating malicious disks using symbolic execution. In Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), 2006.

[19] T. Yavuz-Kahveci, M. Tuncer, and T. Bultan. A library for composite symbolic representations. In Proceedings of the 7th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, volume 2031 of Lecture Notes in Computer Science, pages 335–344. Springer-Verlag, April 2001.