TJ 15 2021 1 112-120

The document discusses the use of ensemble machine learning techniques to detect SQL injection attacks, a significant threat to web application security. It proposes a model utilizing four algorithms: Gradient Boosting Machine, Adaptive Boosting, Extended Gradient Boosting Machine, and Light Gradient Boosting Machine, with the latter achieving the highest accuracy of 0.993371. The methodology includes dataset collection, feature extraction, model training, and testing, highlighting the importance of robust input validation to mitigate such attacks.
ISSN 1846-6168 (Print), ISSN 1848-5588 (Online) Preliminary communication

https://doi.org/10.31803/tg-20210205101347

Ensemble Machine Learning Approaches for Detection of SQL Injection Attack

Umar Farooq

Abstract: SQL injection is a serious threat to the security of web applications that reside on the internet. Many web pages accept sensitive information (e.g., usernames, passwords, bank details) from users and store it in a database that is also reachable over the internet. Although such online databases are important for accessing information remotely for various business purposes, attackers can use an SQL injection attack to gain unrestricted access to them or to bypass authentication procedures. The attack can cause great damage and alteration to a database and has been ranked as the topmost security risk in the OWASP Top 10. Because current rule-matching techniques have difficulty distinguishing unknown attacks, a strategy for SQL injection detection based on machine learning is proposed. Our aim is to detect the attack by splitting queries into their corresponding tokens with the help of tokenization and then applying our algorithms to the tokenized dataset. We used four ensemble machine learning algorithms: Gradient Boosting Machine (GBM), Adaptive Boosting (AdaBoost), Extended Gradient Boosting Machine (XGBM), and Light Gradient Boosting Machine (LGBM). The results yielded by our models are near perfect, with an almost negligible error rate. The best results are yielded by LGBM, with an accuracy of 0.993371 and precision, recall, and F1 of 0.993373, 0.993371, and 0.993370, respectively. LGBM also yielded the lowest error, with a False Positive Rate (FPR) and Root Mean Squared Error (RMSE) of 0.007 and 0.120761, respectively. The worst results are yielded by AdaBoost, with an accuracy of 0.991098 and precision, recall, and F1 of 0.990733, 0.989175, and 0.989942, respectively. AdaBoost also yielded a high False Positive Rate (FPR) of 0.009.

Keywords: Boosting; ensemble learning; Light GBM; SQL injection; web security

1 INTRODUCTION

A web application is software that runs in internet-connected web browsers and has gained high importance for performing different tasks on social, commercial, academic, and other platforms. These web applications are connected to back-end relational databases, operated through Structured Query Language (SQL), that hold a huge amount of information such as usernames, passwords, and bank details, and they are used for communication, online transactions, data storage, accessing social networks, etc. Despite their importance, these web applications give hackers and crackers a way to attack the databases behind them. Securing web data must therefore be of the utmost importance for the developers of these web applications.
Almost 98% of web applications are prone to various attacks, and the topmost one is the SQL injection attack, which is listed as number one in the top ten web application security risks by the Open Web Application Security Project (OWASP) [1, 2]. This attack has been listed among the OWASP top ten vulnerabilities for the last fifteen years [3]. Refined software and other tools are also used nowadays to perform machine-controlled injection attacks [4].
SQL injection is an exploitation technique that compromises security at the database layer of a web application. The vulnerability usually arises when inputs are insufficiently validated and are included directly in a SQL query. By exploiting it, an attacker can submit SQL queries to the database as if they were legitimate.
Generally, a web application is prone to SQL injection when any of the following vulnerabilities is present:
• Input data from the user is not filtered, validated, and sanitized by the web application.
• Dynamic queries or non-parameterized calls are passed directly to the interpreter.
• Hostile data is used to retrieve sensitive data from the database, or a dynamic query is concatenated from both hostile data and query structure [5].

SQL injection attacks are classified into seven categories: tautologies, illegal/logically incorrect queries, union queries, piggy-backed queries, stored procedures, inference, and alternate encodings [6]. In SQL injection, a malicious script is embedded into a less secure web application through an entry node and then passed to the back-end database. This script forces the web application to produce results from the database through queries that should not normally, or ever, be executed. Using this attack, an attacker can get all the data from the database by bypassing the authentication and authorization of the web application.
SQL injection is a code injection technique that can give the attacker unauthorized access to sensitive information in the database. It not only grants unrestricted access but can also be used to disturb data integrity by adding, deleting, or modifying records in a database. The attack primarily exploits vulnerabilities in the security of a web application, namely user input that is not correctly validated or filtered, or user input that is not strongly typed and is executed unexpectedly. It also occurs when there is a weakness in the code or the programming language. It is an attack vector for web applications but can also be used to attack any kind of SQL database. Hackers can gain unauthorized access to the underlying data, structure, and DBMS. The best-understood example of an SQL injection attack is the tautological one, "SELECT * FROM Users WHERE User-id = 1 or 1=1", where the injection succeeds because the OR makes the condition always true. Attackers nowadays also use other means to perform mass SQL injection attacks, such as refined tools or botnets for discovering vulnerable sites [3].
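The tautology described above, and the effect of passing user input as a bound parameter instead of concatenating it into the query, can be sketched as follows. This is only an illustrative sketch: the in-memory SQLite table `users`, its rows, and the variable names are assumptions made for the example and do not come from the paper.

```python
import sqlite3

# Hypothetical in-memory table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)

user_input = "1 OR 1=1"  # attacker-supplied value for the user_id field

# Vulnerable: the input is concatenated into the query text, so "OR 1=1"
# becomes part of the SQL structure and the WHERE clause is always true.
unsafe_sql = "SELECT name FROM users WHERE user_id = " + user_input
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the input is bound as a parameter and treated purely as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE user_id = ?", (user_input,)
).fetchall()

print(len(unsafe_rows))  # number of rows leaked by the concatenated query
print(len(safe_rows))    # number of rows matched by the parameterized query
```

With the concatenated query every row is returned, whereas the parameterized query compares `user_id` with the literal text and matches nothing.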

112 TECHNICAL JOURNAL 15, 1(2021), 112-120


Umar Farooq: Ensemble Machine Learning Approaches for Detection of SQL Injection Attack

Figure 1 Typical SQL Injection Attack

2 BACKGROUND

In this section, we briefly describe all ten types of SQL injection attack.

2.1 Tautologies

The attacker uses a conditional query in which the 'WHERE' clause is injected so that the condition becomes a tautology, i.e., it is always true. In the example "SELECT * FROM Users WHERE User-id = 1 or 1=1", the query returns all the data in the table because the condition of the WHERE clause is always true. This can be secured by preventing users from entering special characters such as single quotes, double quotes, equality signs, and other symbols that are used to build malicious queries [7].
Example: SELECT * FROM accountTable WHERE user login= or 1=1

2.2 Piggy-Backed Query

This type is used to retrieve data, modify the database, execute commands, and perform Denial of Service (DoS) attacks. In this attack, the attacker tries to inject additional malicious queries along with the normal/original query. The original query is valid and executes normally, while the additional malicious queries are injected without being checked. This can be secured by avoiding the execution of multiple statements and checking for delimiters in all queries [7].
Example: SELECT * FROM accountTable WHERE user login=umar AND passwd=; drop accountTable user -- AND pin=221

2.3 Union Query

This type is used to bypass authentication and extract all data from the database. In this attack, the attacker inserts a UNION query into a parameter that is weak and hence vulnerable. This can be secured by strictly verifying user inputs and avoiding the execution of multiple queries on the database side [7].
Example: SELECT * FROM accountTable WHERE user login= UNION SELECT * FROM accountTable WHERE No=10232 -- AND passwd = AND pin=

2.4 Stored Procedures

This type is used to execute remote commands, perform DoS, and escalate privileges. In this attack, the attacker uses the delimiter ";" and stored procedure keywords such as "EXEC", "SHUTDOWN", etc. This can be secured by verifying the user input, using a low-privileged account for execution, and executing stored procedures within a safe interface with appropriate roles [7].
Example: SELECT * FROM accountTable WHERE user login= ‘umar’ AND passwd = ‘farooq’; SHUTDOWN;--;

2.5 Illegal/Logically Incorrect Queries

This type is used to detect parameters that are vulnerable to injection and then extract data from the identified database. In this attack, the attacker tries to extract all information about the database and its structure. This can be secured by verifying user inputs and avoiding the generation of error messages from the database [7].
Example: SELECT * FROM accountTable WHERE user login= ’umar”’ AND passwd =

2.6 Inference

This type is used to detect parameters that are vulnerable to injection and then extract data from the database once its schema is identified. This attack is launched on secured databases and is of two types: inference blind SQL injection and inference time SQL injection [7].
Example: 1; IF SYSTEM_USER='sa' SELECT 1/0 ELSE SELECT 5

2.7 Alternate Coding

This type is used to escape detection. In this attack, the attacker injects encoded text to bypass detection techniques with the help of signatures like EXEC (), Char (), ASCII (), BIN (), HEX (), UNHEX (), BASE64 (), DEC (), ROT13 (), etc. This can be secured by verifying user inputs and prohibiting meta-characters [7].
Example: SELECT * FROM accountTable WHERE user login= ’umar’;exec(char(0x59842 352646f776e)) AND passwd =’farooq’ AND pin =; SHUTDOWN;--;

2.8 End of Line Comment

SELECT * FROM Accounts WHERE accountName = 'admin'--'AND password = ''
This statement logs the attacker in as the admin user, because everything after the comment marker (--) is ignored [8].
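The alternate-encoding attack of Section 2.7 can be illustrated with a short sketch that shows why a literal keyword filter misses an encoded payload. The blacklist `BLOCKED`, the helper names, and the sample payload are hypothetical and chosen only for this illustration; they are not the paper's detection method.

```python
import re

# Illustrative blacklist of dangerous keywords; not the paper's list.
BLOCKED = ("shutdown", "drop", "exec")


def naive_filter(text):
    """Return True if any blocked keyword appears literally in text."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED)


def decode_hex_literals(text):
    """Replace each 0x... literal with the ASCII text it encodes."""
    def repl(match):
        digits = match.group(1)
        if len(digits) % 2:  # odd number of hex digits: leave untouched
            return match.group(0)
        return bytes.fromhex(digits).decode("ascii", errors="replace")

    return re.sub(r"0x([0-9a-fA-F]+)", repl, text)


# Hypothetical payload: the hex literal encodes the text SHUTDOWN.
payload = "SELECT 0x53485554444f574e"

print(naive_filter(payload))                       # filter applied to raw input
print(naive_filter(decode_hex_literals(payload)))  # filter applied after decoding
```

The raw payload passes the literal filter, while the decoded form is caught, which is why input should be normalized before any signature or feature extraction step.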

TEHNIČKI GLASNIK 15, 1(2021), 112-120 113



2.9 Blind Injection

This type is used to ask Boolean (true/false) questions, and information is extracted depending on the behavior of the web page. The web page functions normally if the injected condition is true; otherwise it functions differently [8].

2.10 Timing Attacks

This type is used to derive information with the help of If-Then statements, where the attacker notes the timing delays of responses from the database [8].
Generally, SQL injection attacks are divided into three types depending on the mode of transfer of incoming and outgoing data: in-band, out-of-band, and inferential [9]. In an in-band SQL injection attack, the attacker extracts the information over the same channel that is used for sending the query or performing the attack. In an out-of-band SQL injection attack, the attacker extracts the information with the help of another channel, such as email. In an inferential SQL injection attack, the attacker does not extract the information over any channel but instead launches other attacks to analyze the behavior of the web application.

3 RELATED WORK

Multiple studies have been carried out so far on SQL injection and its detection using various approaches such as static and dynamic analysis, combined techniques, machine learning, hash techniques, black-box testing, etc. [10].
Static analysis checks whether every flow from a source to a sink is subject to an input validation and/or input sanitization routine [11], whereas dynamic analysis dynamically mines the query structure intended by the developer for any input and detects attacks by comparing it against the structure of the actual query issued [12].
AMNESIA is a model-based method that combines static and dynamic analysis for the detection and prevention of SQL injection attacks. It uses static analysis to build models of the SQL queries at the points where the database is accessed. It then uses dynamic analysis to check queries, before they are sent to the database, against the statically built models [10]. However, some query and code-snippet generation approaches make this model less efficient, with a higher error rate [13].
A Hidden Markov Model (HMM) has been presented to detect malicious queries with the help of machine learning in two phases: a training phase and a running phase. The first phase focuses on collecting known malicious and benign queries, and the second phase focuses on detecting injection attacks. The authors themselves acknowledged that WHERE-clause and piggy-backed queries cannot be detected by this model [4].
Detection of SQL injection attacks based on the Naïve Bayes machine learning algorithm, combined with a role-based access mechanism, was proposed in [14]. The detection rate with this model is 93%; however, future attacks cannot be detected with this data, and the classifier relies on labelled data.

4 METHODOLOGY

The main aim of the proposed model is to detect SQL injection attacks. The whole procedure is performed in four stages:
1) The first stage focuses on collecting a dataset that contains proper SQL injection attack queries. For this purpose, we created a dataset that contains SQL queries, SQL injection attack queries, and plain text. The labelling of the dataset is done in this stage.
2) The second stage deals with extracting all the features from all the queries and selecting the best of them (a.k.a. feature extraction and feature selection). Tokenization is used in this stage to divide the queries into tokens.
3) The third stage deals with training the model. The model is trained in this phase with 70% of the dataset (a.k.a. the training part).
4) The fourth stage uses the remaining 30% of the dataset, which we separated from the collected dataset, for testing and evaluating the proposed model with the selected best feature set (a.k.a. the testing part).

4.1 Dataset

The most important part of detecting an SQL injection attack is collecting a meaningful dataset that contains SQL injection attack queries. The main contribution of this paper is a labelled dataset that we manually collected for this problem. The dataset contains not only SQL injection attack queries but also normal SQL queries and plain text, so that the proposed model can properly comprehend and differentiate between normal and attacking SQL queries. The dataset is collected in three phases: 1) the normal SQL queries are collected in the first phase, 2) the SQL injection attack queries are collected in the second phase, and 3) the plain text is collected in the third phase. We collected these queries in text format, applied labelling and preprocessing methods, and then converted the result to a csv file. We applied tokenization to the dataset and formed a new tokenized dataset. The dataset contains a total of 35198 queries with 21 features. The dataset has the following three categories:

4.1.1 Non-Malicious or Normal SQL Queries

These queries, non-malicious in nature, are used to create, maintain, and retrieve data in a relational database (in the form of tables). The tokens (keywords) used in this type are: (rename, drop, delete, insert, create, exec, update, union, set, Alter, database, and, or, information_schema, load_file, select, shutdown, cmdshell, hex, ascii). Also the dangerous characters used in this type are: --, #, /*, ', '', ||, \\, =, /**/,@@.


4.1.2 SQL Injection Attack Queries/Malicious SQL Queries

These queries are used to execute malicious SQL statements in a web application and bypass its security measures. They are also used to add, modify, and delete records in a database in an unrestricted way. The tokens (keywords) used in this type are: , *, ; , _, -, (, ), =, {, }, @, ., , &, [, ], +, -, ?, %, !, :, \, /. Also the SQL tokens used are: where, table, like, select, update, and, or, set, like, in, having, values, into, alter, as, create, revoke, deny, convert, exec, concat, char, tuncat, ASCII, any, asc, desc, check, group by, order by, delete from, insert into, drop table, union, join.

4.1.3 Plain Text

These are simply in the form of plain text. The tokens (keywords) used in this type are alphabets and digits. Plain text is included in this dataset to make sure that the proposed model properly comprehends and differentiates between an SQL query, an SQL injection query, and the plain text that a user enters in the login node of any web app.
The detailed description of the collected dataset (features) is given below in Tabs. 1 and 2.

Table 1 Description of features of dataset


S. No. Feature Description
1 data It contains all the full queries
2 no_sngle_quts Total number of single quotations in a query
3 no_dble_quts Total number of double quotations in a query
4 no_punctn Total number of punctuations in a query
5 no_sgle_cmnt Total number of single line comments in a query
6 no_mlt_cmnt Total number of multi-line comments in a query
7 no_whte_spce Total number of white spaces in a query
8 no_nrml_kywrds Total number of normal keywords in a query
9 no_hmfl_kywrds Total number of harmful keywords in a query
10 no_prctge Total number of percentage (%) symbols in a query
11 no_log_oprtr Total number of logical operators in a query
12 no_oprtr Total number of operators in a query
13 no_null_valus Total number of null values in a query
14 no_hexdcml_valus Total number of hexadecimal values in a query
15 no_db_info_cmnds Total number of database information commands in a query
16 no_roles Total number of roles (e.g., Admin, user, etc.) in a query
17 no_ntwr_cmnds Total number of network commands in a query
18 no_lanage-cmnds Total number of language commands in a query
19 no_alphabet Total number of alphabets in a query
20 no_digits Total number of digits in a query
21 no_spl_chrtr Total number of special characters in a query
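A few of the count-based features described in Table 1 can be sketched as below. The paper does not state exactly how each count is computed, so the counting rules, the `LOGICAL_OPERATORS` set, the function name `extract_features`, and the sample input string are all assumptions made for illustration; only the feature names are taken from Table 1.

```python
import re

# Assumed token list; the paper does not publish its exact list per feature.
LOGICAL_OPERATORS = {"and", "or", "not"}


def extract_features(query):
    """Count a few Table 1 style features for one input string."""
    words = re.findall(r"[A-Za-z_]+", query.lower())
    return {
        "no_sngle_quts": query.count("'"),
        "no_dble_quts": query.count('"'),
        "no_sgle_cmnt": query.count("--") + query.count("#"),
        "no_mlt_cmnt": query.count("/*"),
        "no_whte_spce": sum(ch.isspace() for ch in query),
        "no_prctge": query.count("%"),
        "no_log_oprtr": sum(w in LOGICAL_OPERATORS for w in words),
        "no_alphabet": sum(ch.isalpha() for ch in query),
        "no_digits": sum(ch.isdigit() for ch in query),
    }


# Hypothetical input in the style of a tautology payload.
features = extract_features("' or 1=1 --")
print(features)
```

Each input string is thereby mapped to a fixed-length numeric vector, which is the form a boosting classifier expects; a full implementation would cover all of the features listed in Table 1.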

Table 2 Description of labels
S. No.  Label  Description                                      Count  Ratio
1       0      It represents the normal SQL queries             6888   19.57%
2       1      It represents the SQL injection attack queries   18369  52.19%
3       2      It represents the plain text                     9941   28.24%

4.2 Tokenization

The keywords used in SQL injection attacks are used to launch operations on the database tables. These keywords play an important role in launching an SQL injection attack because they perform the unexpected tasks. So, there is a need to differentiate these keywords in normal and malicious queries. Tokenization is used for this purpose, i.e., to extract the tokens from the actual queries. In simple terms, tokenization is the process of dividing a query into a list of tokens (keywords). From these extracted tokens, the proposed model extracts features. Each query is represented by a sequence of numbers, where each number represents one of the features listed in Tab. 1.
The suitable choice of these features plays an essential role in the detection of SQL injection attacks. The reason for picking these kinds of features is their capacity to recognize most SQL injection attack types, such as tautologies, union, piggy-backed, illegal/logically incorrect, alternate encodings, and stored procedures, which are handled in the same way as SQL queries.
Let us take the example of or 1=1 to understand the concept of tokenization. By applying tokenization to the above query, the output is given below and is in accordance with the features listed in Tab. 1:

or 1=1
0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 2 2 0 1

4.3 Training Ensemble Models

The main phase is to train the machine learning algorithms for the detection of SQL injection attacks with the manually collected dataset. The ensemble learning algorithms that we used in our proposed model are Gradient Boosting Machine (GBM), Adaptive Boosting (AdaBoost), Extended Gradient Boosting Machine (XGBM), and Light Gradient Boosting Machine (LGBM). To get a better understanding of how the machine learning models would perform on the testing data, we applied three- and five-fold cross-validation, where we split the dataset into 3 and 5 parts, respectively. The advantage of cross-validation is that all the observations are utilized for both training and testing the models, and each observation is used for testing exactly once.

5 RESULTS AND DISCUSSION

As per the experiments that we conducted, we conclude that our proposed system with 21 features is sufficient to detect SQL injection attack queries among normal and plain text queries. We focused on including as many features as possible in order to make the proposed model robust and able to detect all types of SQL injection attack queries efficiently.
To evaluate the performance of our proposed model, we applied the algorithms, which are ensemble boosting in nature, to the testing data (30% of the original dataset). The classification results produced by the proposed model are near perfect and are depicted in the tables and figures below.
We separated the results into different tables, where every table represents a different classification metric, namely accuracy (Acc.), precision (Pr.), recall (Re.), F1 score (f1), false positive rate (FPR), root mean squared error (RMSE), mean absolute error (MAE), and mean squared error (MSE), to analyze the behavior of our system properly. The results are depicted in Tabs. 3-14.

5.1 Classification Report

In Tabs. 3-10, the partition strategy for every classifier is training set = 70% and testing set = 30%.

Table 3 Accuracy report of our proposed model
Classifier   3-CV      5-CV
GBM          0.991856  0.990909
AdaBoost     0.991098  0.991098
XGBoost      0.992233  0.992233
Light GBM    0.993371  0.993371

Table 4 Precision report of our proposed model
Classifier   3-CV      5-CV
GBM          0.991791  0.990660
AdaBoost     0.990733  0.990733
XGBoost      0.991400  0.991400
Light GBM    0.993373  0.993373

Table 5 Recall report of our proposed model
Classifier   3-CV      5-CV
GBM          0.990388  0.989341
AdaBoost     0.989175  0.989175
XGBoost      0.990596  0.990596
Light GBM    0.993371  0.993371

Table 6 F1 score report of our proposed model
Classifier   3-CV      5-CV
GBM          0.991084  0.989997
AdaBoost     0.989942  0.989942
XGBoost      0.992234  0.992234
Light GBM    0.993370  0.993370

Table 7 MAE report of our proposed model
Classifier   3-CV      5-CV
GBM          0.010321  0.011590
AdaBoost     0.011553  0.011553
XGBoost      0.011742  0.011742
Light GBM    0.009280  0.009280

Table 8 MSE report of our proposed model
Classifier   3-CV      5-CV
GBM          0.014678  0.016590
AdaBoost     0.016856  0.016856
XGBoost      0.017992  0.017992
Light GBM    0.014583  0.014583

Table 9 RMSE report of our proposed model
Classifier   3-CV      5-CV
GBM          0.121152  0.128805
AdaBoost     0.129830  0.129830
XGBoost      0.134135  0.134135
Light GBM    0.120761  0.120761

Table 10 FPR report of our proposed model
Classifier   3-CV   5-CV
GBM          0.008  0.009
AdaBoost     0.009  0.010
XGBoost      0.008  0.008
Light GBM    0.007  0.007

The classification report of our proposed system is given in graphical form in Fig. 2.

5.2 Confusion Matrix

A confusion matrix is a performance measurement for machine learning classifiers that tabulates the combinations of actual and predicted values. The above results are calculated with the help of the confusion matrix, which is used to evaluate the overall performance of our proposed classification system. As the problem we chose is multi-class classification with three classes (normal SQL query, SQL injection attack query, and plain text), the confusion matrix is 3×3. The following classification metrics are evaluated:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \tag{5}$$

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{6}$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{7}$$

$$\text{FPR} = \frac{FP}{FP + TN} \quad \text{or} \quad 1 - \text{Recall} \tag{8}$$

Here $y_i$ is the actual label and $\hat{y}_i$ the predicted label of the $i$-th test sample, and $n$ is the number of test samples.
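As a cross-check of Eqs. (1) and (5)-(7), the headline Light GBM figures can be recomputed from the Light GBM confusion matrix reported in this paper (Table 14). The sketch below assumes that the error measures treat the class labels 0, 1, and 2 as plain numbers, which is an inference from the reported values rather than something the paper states; the variable names are illustrative.

```python
import math

# Light GBM confusion matrix copied from Table 14 (3 classes: 0, 1, 2).
# Accuracy and the error measures below do not depend on whether rows are
# the predicted or the actual labels, because |i - j| is symmetric.
cm = [
    [2095, 6, 17],
    [1, 5418, 17],
    [11, 18, 2977],
]

n = sum(sum(row) for row in cm)            # number of test samples
correct = sum(cm[k][k] for k in range(3))  # diagonal = correct predictions

accuracy = correct / n
# Assumption: labels are treated as the numbers 0, 1, 2, so a cell (i, j)
# contributes |i - j| to the absolute error and (i - j)^2 to the squared error.
mae = sum(abs(i - j) * cm[i][j] for i in range(3) for j in range(3)) / n
mse = sum((i - j) ** 2 * cm[i][j] for i in range(3) for j in range(3)) / n
rmse = math.sqrt(mse)

print(f"accuracy={accuracy:.6f} MAE={mae:.6f} MSE={mse:.6f} RMSE={rmse:.6f}")
```

Under this assumption the recomputed accuracy, MAE, MSE, and RMSE agree to six decimals with the Light GBM rows of Tabs. 3, 7, 8, and 9. In addition, 1 − 0.993371 (the recall in Tab. 5) is about 0.0066, which rounds to the 0.007 reported in Tab. 10 and is consistent with the "1 − Recall" reading of Eq. (8).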

The confusion matrix of our algorithms is given below where 0, 1, and 2 represent normal SQL queries, SQL injection attack queries, and plain text, respectively.

Table 11 Confusion matrix of AdaBoost
Predicted \ Actual   0     1     2
0                    1966  12    21
1                    7     5473  37
2                    6     20    3018

Table 12 Confusion matrix of GBM
Predicted \ Actual   0     1     2
0                    2078  12    15
1                    3     5461  22
2                    8     26    2935

Table 13 Confusion matrix of XGBoost
Predicted \ Actual   0     1     2
0                    2060  21    37
1                    7     5388  41
2                    4     28    2974

Table 14 Confusion matrix of LGBM
Predicted \ Actual   0     1     2
0                    2095  6     17
1                    1     5418  17
2                    11    18    2977

Figure 2 Classification report

The error report, in graphical form, of our proposed system is given in Fig. 3.

Figure 3 Error report

The classification reports evaluated by our four models are given in Fig. 4.

Figure 4 Classification report from GBM, AdaBoost, XGBM, and LGBM, respectively


Figure 5 Classification report from GBM, AdaBoost, XGBM, and LGBM, respectively (continuation)

Figure 6 ROC results from GBM, AdaBoost, XGBM, and LGBM, respectively

5.3 ROC Curves

The ROC values evaluated by our algorithms are given in Tab. 15.

Table 15 ROC values of our proposed models
Algorithms   GBM       AdaBoost  XGBoost   Light GBM
ROC Value    0.995449  0.997657  0.999548  0.999845

5.4 Comparative Analysis

Previous research on SQL injection attack detection is summarized in Tab. 16, where it is compared with the proposed model in terms of accuracy. Our proposed model outperforms the other existing models in terms of accuracy, with a lower error rate.

Table 16 Comparative analysis
Classifiers/Models                                   Accuracy
SVM, Naïve Bayes, GBM, REGEX [15]                    97%
Neural Network system [16]                           96.8%
Genetic-fuzzy rule-based system [17]                 98.4%
SVM [18]                                             98%
K-means [19]                                         98.36%
Our Proposed model (GBM, AdaBoost, XGBM, LGBM)       99.34%


6 CONCLUSION

In this research work, we proposed an SQL injection attack detection model based on 21 features in order to increase the efficiency of our classifiers. The main target of our system was the SQL injection attack, which is increasing day by day and is used with malicious content to gain unrestricted access to databases and extract sensitive information. These malicious queries can bypass authentication and authorization and can ultimately alter, modify, and delete the database. Keeping this as our objective, we proposed a robust model for distinguishing SQL injection attack queries from normal queries and plain text. In this work, the foremost step we carried out was to create a balanced dataset that contains normal and malicious SQL queries. We also introduced plain text into this dataset so that the proposed model performs well and differentiates malicious queries from normal queries and plain text.
The proposed model, when applied to the dataset, achieves an average accuracy of more than 99% with an almost negligible error rate, which indicates that the selected feature set is quite effective at discriminating SQL injection attack queries from normal SQL queries and plain text. The analysis indicates that our proposed system, which is based on ensemble machine learning with the selected features, can be applied in real-world SQL injection attack detection systems. The best test accuracy is 99.34% with an FPR of 0.007, while the lowest is 99.11% with an FPR of 0.009, yielded by LGBM and AdaBoost, respectively. The other two algorithms that we used, GBM and XGBM, yielded accuracies of 99.19% and 99.22%, respectively.

Notice

This paper was presented at IC2ST-2021 – International Conference on Convergence of Smart Technologies. This conference was organized in Pune, India by Aspire Research Foundation, January 9-10, 2021. The paper will not be published anywhere else.

7 REFERENCES

[1] OWASP. https://owasp.org/www-project-top-ten/. (Accessed on 18.11.2020).
[2] Farooq, U. (2020). Real Time Password Strength Analysis on a Web Application Using Multiple Machine Learning Approaches. International Journal of Engineering Research & Technology (IJERT), 9(12), 359-364.
[3] Moh, M., Pininti, S., Doddapaneni, S., & Moh, T. (2016). Detecting Web Attacks Using Multi-stage Log Analysis. 2016 IEEE 6th International Conference on Advanced Computing (IACC), Bhimavaram, 733-738. https://doi.org/10.1109/IACC.2016.141
[4] Kar, D., Agarwal, K., Sahoo, A., & Panigrahi, S. (2016). Detection of SQL injection attacks using Hidden Markov Model. 2016 IEEE International Conference on Engineering and Technology (ICETECH), Coimbatore, India. https://doi.org/10.1109/ICETECH.2016.7569180
[5] OWASP. https://owasp.org/www-project-top-ten/2017/A1_2017-Injection. (Accessed on 19.11.2020).
[6] Moosa, A. (2010). Artificial Neural Network based Web Application Firewall for SQL Injection. World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, 4(4), 610-619. https://panel.waset.org/publications/1001/pdf
[7] Sheykhkanloo, N. M. (2015). SQL-IDS: Evaluation of SQLi Attack Detection and Classification Based on Machine Learning Techniques. The 8th International Conference on Security of Information and Networks (SIN15), Sochi, Russia. https://doi.org/10.1145/2799979.2800011
[8] Kaur, M. & Agrawal, A. P. (2012). Token Sequencing Approach to Prevent SQL Injection Attacks. IOSR Journal of Computer Engineering (IOSRJCE), 1(1), 31-37. https://doi.org/10.9790/0661-0113137
[9] Sadeghian, A., Zamani, M., & Ibrahim, S. (2013). SQL injection is still alive: a study on SQL injection signature evasion techniques. In International Conference on Informatics and Creative Multimedia, Kuala Lumpur, Malaysia, 265-268. https://doi.org/10.1109/ICICM.2013.52
[10] Halfond, W. G. & Orso, A. (2005). AMNESIA: analysis and monitoring for neutralizing SQL-injection attacks. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, 174-183. https://doi.org/10.1145/1101908.1101935
[11] Shar, L. K. & Tan, H. B. K. (2013). Defeating SQL injection. Computer, 46, 69-77. https://doi.org/10.1109/MC.2012.283
[12] Tajpour, A. & Shooshtar, M. J. Z. (2010). Evaluation of SQL injection detection and prevention techniques. In Second IEEE International Conference on Computational Intelligence, Communication Systems and Networks, Liverpool, UK, 216-221. https://doi.org/10.1109/CICSyN.2010.55
[13] Dharam, R. & Shiva, S. G. (2013). Runtime monitors to detect and prevent union query based SQL injection attacks. In Tenth International Conference on Information Technology: New Generations, Las Vegas, USA, 357-362. https://doi.org/10.1109/ITNG.2013.57
[14] Joshi, A. & Geetha, V. (2014). SQL Injection detection using machine learning. In 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), Kanyakumari, IEEE, 1111-1115. https://doi.org/10.1109/ICCICCT.2014.6993127
[15] Kranthikumar, B. & Velusamy, R. L. (2020). SQL injection detection using REGEX classifier. Journal of Xi'an University of Architecture & Technology, 12(6), 800-809.
[16] Sheykhkanloo, N. M. (2015). SQL-IDS: evaluation of SQLi attack detection and classification based on machine learning techniques. In Proceedings of the 8th International Conference on Security of Information and Networks, USA, 258-266. https://doi.org/10.1145/2799979.2800011
[17] Basta, C., Elfatatry, A., & Darwish, S. (2016). Detection of SQL Injection Using a Genetic Fuzzy Classifier System. International Journal of Advanced Computer Science and Applications (IJACSA), 7(6), 129-137. https://doi.org/10.14569/IJACSA.2016.070616
[18] Jagadessan, J., Shrivastava, A., Ansari, A., Kar, L. K., & Kumar, M. (2019). Detection and Prevention Approach to SQLi and Phishing Attack using Machine Learning. International Journal of Engineering and Advanced Technology (IJEAT), 8(4), 791-799.
[19] Patel, M. P. & Sivaraman, D. B. (2017). SQL injection Detection for Secure Atomic and Molecular Database node for India. International Journal of Advance Research and Innovative Ideas in Education (IJARIIE), 3(2), 3867-3879.


Author’s contact:

Umar Farooq,
Department of Computer Science & Technology (Cyber Security),
Central University of Punjab,
City Campus, Mansa Road, Bathinda 151001, Punjab, India
[email protected]
