
ISSN 1846-6168 (Print), ISSN 1848-5588 (Online) Preliminary communication

https://doi.org/10.31803/tg-20210205101347

Ensemble Machine Learning Approaches for Detection of SQL Injection Attack

Umar Farooq

Abstract: SQL injection is a serious threat to the security of web applications that reside on the internet. Many web pages accept sensitive information (e.g., usernames, passwords, bank details) from users and store it in a database that also resides on the internet. Although these online databases are important for remote access to information for various business purposes, attackers can use SQL injection to gain unrestricted access to them or to bypass authentication procedures. The attack can cause great damage to, and alteration of, the database and has been ranked as the topmost security risk in the OWASP Top 10. Considering the difficulty of distinguishing unknown attacks with current rule-matching techniques, a strategy for SQL injection detection based on machine learning is proposed. Our aim is to detect the attack by splitting queries into their corresponding tokens with the help of tokenization and then applying our algorithms to the tokenized dataset. We used four ensemble machine learning algorithms: Gradient Boosting Machine (GBM), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting Machine (XGBM), and Light Gradient Boosting Machine (LGBM). The results yielded by our models are close to perfect, with an almost negligible error rate. The best results are yielded by LGBM, with an accuracy of 0.993371 and precision, recall, and F1 score of 0.993373, 0.993371, and 0.993370, respectively. LGBM also yielded the lowest error, with a False Positive Rate (FPR) and Root Mean Squared Error (RMSE) of 0.007 and 0.120761, respectively. The worst results are yielded by AdaBoost, with an accuracy of 0.991098 and precision, recall, and F1 score of 0.990733, 0.989175, and 0.989942, respectively. AdaBoost also yielded a high FPR of 0.009.

Keywords: Boosting; ensemble learning; Light GBM; SQL injection; web security

1 INTRODUCTION

A web application is software that runs in internet-connected web browsers and has gained high importance for performing different tasks on social, commercial, academic, and other platforms. These web applications are connected to back-end relational databases operated through Structured Query Language (SQL) that hold a huge amount of information such as usernames, passwords, and bank details, and are used for communication, online transactions, data storage, accessing social networks, etc. Despite their importance, these web applications give hackers and crackers a way to attack the databases. Securing web data must therefore be of the utmost importance for developers of these web applications.

Almost 98% of web applications are prone to various attacks, but the foremost one is the SQL injection attack, which is listed as number one in the top ten web application security risks by the Open Web Application Security Project (OWASP) [1, 2]. This attack has been listed in the top ten vulnerabilities by OWASP for the last fifteen years [3]. Refined software and other tools are also used nowadays to perform machine-controlled injection attacks [4].

SQL injection is an exploitation technique that compromises security at the database layer of a web application. The vulnerability usually arises from insufficient validation of inputs that are included directly in a SQL query. By exploiting it, an attacker can submit SQL queries to the database as if they were legitimate. Generally, a web application is prone to SQL injection when any of the following vulnerabilities are present:
• When filtration, validation, and sanitization of input data from the user are not applied by the web application.
• When dynamic queries or non-parameterized calls are given directly to the interpreter.
• When hostile data is used to retrieve sensitive data from the database, or a dynamic query is concatenated with both hostile data and structure [5].

SQL injection attacks are classified into seven categories: tautologies, illegal/logically incorrect queries, union queries, piggy-backed queries, stored procedures, inference, and alternate encodings [6]. In SQL injection, a malicious script is embedded into a less secure web application through an entry node and then passed to the back-end database. This script forces the web application to produce results from the database through queries that should never be executed. Using this attack, an attacker can obtain all the data in the database by bypassing the authentication and authorization of the web application.

SQL injection is a code injection technique that can give the attacker unauthorized access to sensitive information in the database. Besides unrestricted access, it can also be used to disturb data integrity by adding, deleting, or modifying records in a database. The attack primarily exploits vulnerabilities in the security of a web application, that is, when user input is not correctly validated or filtered, or when user input is not strongly typed and is executed unexpectedly. It also occurs when there is a weakness in the code or the programming language. It is an attack vector for web applications but can also be used to attack any kind of SQL database. Hackers can gain unauthorized access to the underlying data, structure, and DBMS. The best-understood example of a SQL injection attack is the tautological one, “SELECT * FROM Users WHERE User-id = 1 or 1=1”, where the injection succeeds because the OR makes the condition always true. Attackers nowadays also perform mass SQL injection attacks using refined tools or botnets to discover vulnerable sites [3].
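The tautology mentioned in this section can be reproduced on a throwaway in-memory database. The following is a minimal sketch, not part of the proposed detection model: it assumes Python's standard sqlite3 module, a hypothetical three-row Users table, and the column name user_id in place of User-id (which is not a valid unquoted identifier). It contrasts a query built by string concatenation with a parameterized one.

```python
import sqlite3

def make_demo_db():
    """Build a small in-memory table; the rows are illustrative only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (user_id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO Users VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol")],
    )
    return conn

def lookup_vulnerable(conn, user_input):
    # UNSAFE: the input is concatenated into the SQL text, so it can
    # change the structure of the query.
    query = "SELECT * FROM Users WHERE user_id = " + user_input
    return conn.execute(query).fetchall()

def lookup_parameterized(conn, user_input):
    # The input is bound as a value and is never parsed as SQL.
    query = "SELECT * FROM Users WHERE user_id = ?"
    return conn.execute(query, (user_input,)).fetchall()

conn = make_demo_db()
payload = "1 or 1=1"

leaked = lookup_vulnerable(conn, payload)      # WHERE user_id = 1 or 1=1
blocked = lookup_parameterized(conn, payload)  # payload compared as a plain string

print(len(leaked), len(blocked))
```

With the concatenated query the condition is always true and every row is returned, whereas the bound parameter is compared as an ordinary value and matches nothing.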

112 TECHNICAL JOURNAL 15, 1(2021), 112-120

Umar Farooq: Ensemble Machine Learning Approaches for Detection of SQL Injection Attack

Figure 1 Typical SQL Injection Attack

2 BACKGROUND

In this section, we briefly describe all ten types of SQL injection attack.

2.1 Tautologies

The attacker uses a conditional query in which the ‘WHERE’ clause is injected so that the condition becomes a tautology, i.e., it is always true. In the example “SELECT * FROM Users WHERE User-id = 1 or 1=1”, the query returns all the data in the table because the condition of the WHERE clause is always true. This can be prevented by restricting users from entering special characters such as single quotes, double quotes, equality signs, and other symbols that are used to build malicious queries [7].
Example: SELECT * FROM accountTable WHERE user login= or 1=1

2.2 Piggy-Backed Query

This type is used to retrieve data, modify the database, execute commands, and perform Denial of Service (DoS) attacks. The attacker injects additional malicious queries alongside the original query. The original query is valid and executes normally, while the additional malicious queries are executed without being checked. This can be prevented by avoiding the execution of multiple statements and checking for delimiters in all queries [7].
Example: SELECT * FROM accountTable WHERE user login=umar AND passwd=; drop accountTable user -- AND pin=221

2.3 Union Query

This type is used to bypass authentication and extract all data from the database. The attacker inserts a UNION query into a weak, and hence vulnerable, parameter. This can be prevented by strictly verifying user inputs and avoiding the execution of multiple queries on the database side [7].
Example: SELECT * FROM accountTable WHERE user login= UNION SELECT * FROM accountTable WHERE No=10232 -- AND passwd = AND pin=

2.4 Stored Procedures

This type is used to execute remote commands, perform DoS, and escalate privileges. The attacker uses the delimiter “;” and stored procedure keywords such as “EXEC”, “SHUTDOWN”, etc. This can be prevented by verifying the user input, using a low-privileged account for execution, and executing stored procedures within a safe interface with appropriate roles [7].
Example: SELECT * FROM accountTable WHERE user login= 'umar' AND passwd = 'farooq'; SHUTDOWN;--;

2.5 Illegal/Logically Incorrect Queries

This type is used to identify parameters that are vulnerable to injection and then extract data from the identified database. The attacker tries to extract all information about the database and its structure. This can be prevented by verifying user inputs and suppressing error messages from the database [7].
Example: SELECT * FROM accountTable WHERE user login= 'umar"' AND passwd =

2.6 Inference

This type is used to identify parameters that are vulnerable to injection and then extract data from the database once the schema is identified. The attack is launched on secured databases and is of two types: inference blind SQL injection and inference timing SQL injection [7].
Example: 1; IF SYSTEM_USER='sa' SELECT 1/0 ELSE SELECT 5

2.7 Alternate Coding

This type is used to escape detection. The attacker injects encoded text to bypass detection techniques with the help of signatures such as EXEC(), CHAR(), ASCII(), BIN(), HEX(), UNHEX(), BASE64(), DEC(), ROT13(), etc. This can be prevented by verifying user inputs and prohibiting meta-characters [7].
Example: SELECT * FROM accountTable WHERE user login= 'umar';exec(char(0x59842 352646f776e)) AND passwd ='farooq' AND pin =; SHUTDOWN;--;

2.8 End of Line Comment

SELECT * FROM Accounts WHERE accountName = 'admin'--' AND password = ''
This statement logs the attacker in as the admin user [8].
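The end-of-line comment bypass in Section 2.8 can be illustrated in the same spirit. This is a hedged sketch rather than a description of any real login system: the Accounts table, the stored password, and the two helper functions are hypothetical, and SQLite is assumed only because it also treats "--" as a comment that runs to the end of the line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountName TEXT, password TEXT)")
conn.execute("INSERT INTO Accounts VALUES ('admin', 's3cret')")  # made-up credentials

def login_vulnerable(name, password):
    # UNSAFE: both inputs are pasted into the SQL text.
    query = (
        "SELECT * FROM Accounts WHERE accountName = '" + name
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchone() is not None

def login_parameterized(name, password):
    # Inputs are bound as values, so "--" stays part of the name string.
    query = "SELECT * FROM Accounts WHERE accountName = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone() is not None

# The payload closes the string literal and comments out the password check,
# producing: ... WHERE accountName = 'admin'--' AND password = ''
bypassed = login_vulnerable("admin'--", "")
rejected = login_parameterized("admin'--", "")

print(bypassed, rejected)
```

In the concatenated version everything after "--" is ignored, so the password is never checked; with bound parameters the same input is simply an account name that does not exist.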


2.9 Blind Injection

This type is used to ask Boolean (true/false) questions, and information is extracted depending on the behavior of the web page. The web page functions normally if the injected condition is true; otherwise it functions differently [8].

2.10 Timing Attacks

This type is used to derive information with the help of If-Then statements, where the attacker notes the timing delays of responses from the database [8].

Generally, SQL injection attacks are divided into three types depending on the mode of transfer of incoming and outgoing data: in-band, out-of-band, and inferential [9]. In an in-band SQL injection attack, the attacker extracts the information through the same channel that is used for sending the query or performing the attack. In an out-of-band SQL injection attack, the attacker extracts the information with the help of another channel, such as email. In an inferential SQL injection attack, the attacker does not extract the information through any channel but instead launches other attacks to analyze the behavior of the web application.

3 RELATED WORK

Multiple studies have been carried out so far in the field of SQL injection and its detection using various approaches such as static and dynamic analysis, combined techniques, machine learning, hash techniques, black-box testing, etc. [10].

Static analysis checks whether each flow from a source to a sink is subject to an input validation and/or input sanitizing routine [11], whereas dynamic analysis dynamically mines the programmer's intended query structure on any input and detects attacks by comparing it against the structure of the actual query issued [12].

AMNESIA, as a combined approach, is a model-based method that combines static and dynamic analysis for the detection and prevention of SQL injection attacks. It uses static analysis to build models of the SQL queries at the points where the database is accessed. It then uses dynamic analysis before the queries are sent to the database and compares them with the statically built models [10]. However, some query and code-snippet generation approaches make this model less efficient, with a higher error rate [13].

A Hidden Markov Model (HMM) has been presented to detect malicious queries with the help of machine learning in two phases: a training phase and a running phase. The first phase focuses on collecting known malicious and benign queries, and the second phase focuses on detecting injection attacks. The authors themselves make clear that WHERE-clause and piggy-backed queries cannot be detected by this model [4].

Detection of SQL injection attacks based on the Naïve Bayes machine learning algorithm, combined with a role-based access control mechanism, has also been proposed [14]. The detection rate of this model is 93%; however, future attacks cannot be detected with this data, and the classifier relies on labelled data.

4 METHODOLOGY

The main aim of the proposed model is to detect SQL injection attacks. The whole procedure is performed in four stages:
1) The first stage focuses on collecting a dataset that contains proper SQL injection attack queries. For this purpose, we created a dataset that contains SQL queries, SQL injection attack queries, and plain text. The labelling of the dataset is done in this stage.
2) The second stage deals with extracting all the features from all the queries and selecting the best of them (a.k.a. feature extraction and feature selection). Tokenization is used in this stage to divide the queries into tokens.
3) The third stage deals with training the model. The model is trained in this phase with 70% of the dataset (a.k.a. the training part).
4) The fourth stage uses the remaining 30% of the dataset, which we separated from the collected dataset, for testing and evaluating the proposed model with the selected best feature set (a.k.a. the testing part).

4.1 Dataset

The most important part of detecting a SQL injection attack is collecting a meaningful dataset that contains SQL injection attack queries. The main contribution of this paper is a labelled dataset that we manually collected for this problem. The dataset contains not only SQL injection attack queries but also normal SQL queries and plain text, so that the proposed model can properly comprehend and differentiate between normal and attacking SQL queries. The dataset is collected in three phases: 1) the normal SQL queries are collected in the first phase, 2) the SQL injection attack queries are collected in the second phase, and 3) the plain text is collected in the third phase. We collected these queries in text format, applied labelling and preprocessing methods, and then converted the result to a CSV file. We applied tokenization to the dataset and formed a new tokenized dataset. The dataset contains a total of 35198 queries with 21 features. The dataset has the following three categories:

4.1.1 Non-Malicious or Normal SQL Queries

These queries, non-malicious in nature, are used to create, maintain, and retrieve data from a database in the form of tables (relational database). The tokens (keywords) used in this type are: (rename, drop, delete, insert, create, exec, update, union, set, Alter, database, and, or, information_schema, load_file, select, shutdown, cmdshell, hex, ascii). Also the dangerous characters used in this type are: --, #, /*, ', '', ||, \\, =, /**/,@@.
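The 70%/30% partition of stages 3 and 4 in Section 4, and the k-fold partition used for cross-validation in Section 4.3, can be sketched with index lists alone. This is an illustrative sketch under stated assumptions: the function names, the fixed seed, and the choice to round the test size up are ours and are not taken from the paper's implementation, which is not described at this level of detail.

```python
import math
import random

def holdout_split(n_samples, test_fraction=0.30, seed=0):
    """Shuffle the indices once and return (train_idx, test_idx)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_fraction)  # assumption: round the test size up
    return indices[n_test:], indices[:n_test]

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for held_out in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train_idx, folds[held_out]

# 35198 is the dataset size reported in Section 4.1.
train_idx, test_idx = holdout_split(35198)
print(len(train_idx), len(test_idx))

for train_part, test_part in k_fold_indices(35198, k=3):
    print(len(train_part), len(test_part))
```

With the test size rounded up, the hold-out set contains 10560 queries, which agrees with the totals of the confusion matrices in Tabs. 11-14.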


4.1.2 SQL Injection Attack Queries/Malicious SQL Queries

These queries are used to execute malicious SQL statements in a web application and bypass its security measures. They are also used to add, modify, and delete records in a database in an unrestricted way. The tokens (keywords) used in this type are: , *, ; , _, -, (, ), =, {, }, @, ., , &, [, ], +, -, ?, %, !, :, \, /. Also the SQL tokens used are: where, table, like, select, update, and, or, set, like, in, having, values, into, alter, as, create, revoke, deny, convert, exec, concat, char, tuncat, ASCII, any, asc, desc, check, group by, order by, delete from, insert into, drop table, union, join.

4.1.3 Plain Text

These are simply in the form of plain text. The tokens (keywords) used in this type are alphabets and digits. Plain text is included in the dataset to make sure that the proposed model properly comprehends and differentiates between a SQL query, a SQL injection query, and the plain text that a user enters in the login node of any web app.

The detailed description of the collected dataset (features) is given below in Tabs. 1 and 2.

Table 1 Description of features of dataset

S. No. Feature Description
1 data It contains all the full queries
2 no_sngle_quts Total number of single quotations in a query
3 no_dble_quts Total number of double quotations in a query
4 no_punctn Total number of punctuations in a query
5 no_sgle_cmnt Total number of single line comments in a query
6 no_mlt_cmnt Total number of multi-line comments in a query
7 no_whte_spce Total number of white spaces in a query
8 no_nrml_kywrds Total number of normal keywords in a query
9 no_hmfl_kywrds Total number of harmful keywords in a query
10 no_prctge Total number of percentage (%) symbols in a query
11 no_log_oprtr Total number of logical operators in a query
12 no_oprtr Total number of operators in a query
13 no_null_valus Total number of null values in a query
14 no_hexdcml_valus Total number of hexadecimal values in a query
15 no_db_info_cmnds Total number of database information commands in a query
16 no_roles Total number of roles (e.g., Admin, user, etc.) in a query
17 no_ntwr_cmnds Total number of network commands in a query
18 no_lanage-cmnds Total number of language commands in a query
19 no_alphabet Total number of alphabets in a query
20 no_digits Total number of digits in a query
21 no_spl_chrtr Total number of special characters in a query
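Several of the count-style features in Tab. 1 can be computed with plain string operations. The sketch below is only an approximation for illustration: the feature names come from Tab. 1, but the counting rules (for example, which words count as logical operators, or that "--" and "#" mark single-line comments) are our assumptions and may differ from the definitions used to build the actual dataset.

```python
import re

# Assumed list of logical operators; the paper does not enumerate them.
LOGICAL_OPERATORS = {"and", "or", "not"}

def extract_features(query):
    """Return a subset of the Tab. 1 features for one query string."""
    words = re.findall(r"[A-Za-z_]+", query.lower())
    return {
        "no_sngle_quts": query.count("'"),
        "no_dble_quts": query.count('"'),
        "no_sgle_cmnt": query.count("--") + query.count("#"),
        "no_whte_spce": sum(ch.isspace() for ch in query),
        "no_prctge": query.count("%"),
        "no_log_oprtr": sum(word in LOGICAL_OPERATORS for word in words),
        "no_hexdcml_valus": len(re.findall(r"0x[0-9a-fA-F]+", query)),
        "no_alphabet": sum(ch.isalpha() for ch in query),
        "no_digits": sum(ch.isdigit() for ch in query),
    }

# Hypothetical input resembling a tautology payload.
features = extract_features("' or 1=1 --")
for name, value in features.items():
    print(name, value)
```

Each query is thereby turned into a fixed-length vector of counts, which is the form of input a tree-based boosting classifier expects.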

Table 2 Description of labels
S. No.  Label  Description                                      Count   Ratio
1       0      It represents the normal SQL queries             6888    19.57%
2       1      It represents the SQL injection attack queries   18369   52.19%
3       2      It represents the plain text                     9941    28.24%

4.2 Tokenization

The keywords used in SQL injection attacks are used to launch operations on the database tables. These keywords play an important role in launching a SQL injection attack, as they perform the unexpected tasks. So, there is a need to differentiate these keywords in normal and malicious queries. Tokenization is used to perform this operation, i.e., to extract the tokens from the actual queries. In simple terms, tokenization is the process of dividing a query into a list of tokens (keywords). From these extracted tokens, the proposed model extracts features. Each query is represented by a sequence of numbers, where each number represents one of the features listed in Tab. 1.

The suitable choice of these features plays an essential role in the detection of SQL injection attacks. The reason for picking these kinds of features is their capacity to recognize most SQL injection attack (SQLIA) types, such as tautologies, union, piggy-backed, illegal/logically incorrect, alternate encodings, and stored procedures, which are handled in the same way as SQL queries.

Let us take the example of or 1=1 to understand the concept of tokenization. By applying tokenization to the above query, the output is given below and is in accordance with the features listed in Tab. 1:

or 1=1
0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 2 2 0 1

4.3 Training Ensemble Models

The main phase is to train the machine learning algorithms for the detection of SQL injection attacks with the manually collected dataset. The ensemble learning algorithms used in our proposed model are Gradient Boosting Machine (GBM), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting Machine (XGBM), and Light Gradient Boosting Machine (LGBM). To get a better understanding of how the machine learning models would perform on the testing data, we applied three- and five-fold cross-validation, where we split the dataset into 3 and 5 parts, respectively. The advantage of cross-validation is that all the observations are utilized for both training and testing the models, and each observation is used for testing exactly once.

5 RESULTS AND DISCUSSION

As per the experiments that we conducted, we conclude that our proposed system is sufficient to distinguish SQL injection attack queries from normal and plain text queries with 21 features. We focused on including as many features as possible in order to make the proposed model robust and able to detect all types of SQL injection attack queries efficiently.

To evaluate the performance of our proposed model, we applied the algorithms, ensemble boosting in nature, to the testing data (30% of the original dataset). The classification results produced by the proposed model are near perfection and are depicted in the tables and figures below.

We separated the results into different tables, where each table represents a different classification metric, such as accuracy (Acc.), precision (Pr.), recall (Re.), F1 score (f1), false positive rate (FPR), root mean squared error (RMSE), mean absolute error (MAE), and mean squared error (MSE), to analyze the behavior of our system properly. The results are depicted in Tabs. 3-14 below.

5.1 Classification Report

Table 3 Accuracy report of our proposed model (partition strategy: training set = 70%, testing set = 30%)
Classifier   3-CV      5-CV
GBM          0.991856  0.990909
AdaBoost     0.991098  0.991098
XGBoost      0.992233  0.992233
Light GBM    0.993371  0.993371

Table 4 Precision report of our proposed model (partition strategy: training set = 70%, testing set = 30%)
Classifier   3-CV      5-CV
GBM          0.991791  0.990660
AdaBoost     0.990733  0.990733
XGBoost      0.991400  0.991400
Light GBM    0.993373  0.993373

Table 5 Recall report of our proposed model (partition strategy: training set = 70%, testing set = 30%)
Classifier   3-CV      5-CV
GBM          0.990388  0.989341
AdaBoost     0.989175  0.989175
XGBoost      0.990596  0.990596
Light GBM    0.993371  0.993371

Table 6 F1 score report of our proposed model (partition strategy: training set = 70%, testing set = 30%)
Classifier   3-CV      5-CV
GBM          0.991084  0.989997
AdaBoost     0.989942  0.989942
XGBoost      0.992234  0.992234
Light GBM    0.993370  0.993370

Table 7 MAE report of our proposed model (partition strategy: training set = 70%, testing set = 30%)
Classifier   3-CV      5-CV
GBM          0.010321  0.011590
AdaBoost     0.011553  0.011553
XGBoost      0.011742  0.011742
Light GBM    0.009280  0.009280

Table 8 MSE report of our proposed model (partition strategy: training set = 70%, testing set = 30%)
Classifier   3-CV      5-CV
GBM          0.014678  0.016590
AdaBoost     0.016856  0.016856
XGBoost      0.017992  0.017992
Light GBM    0.014583  0.014583

Table 9 RMSE report of our proposed model (partition strategy: training set = 70%, testing set = 30%)
Classifier   3-CV      5-CV
GBM          0.121152  0.128805
AdaBoost     0.129830  0.129830
XGBoost      0.134135  0.134135
Light GBM    0.120761  0.120761

Table 10 FPR report of our proposed model (partition strategy: training set = 70%, testing set = 30%)
Classifier   3-CV   5-CV
GBM          0.008  0.009
AdaBoost     0.009  0.010
XGBoost      0.008  0.008
Light GBM    0.007  0.007

5.2 Confusion Matrix

A confusion matrix is a performance measurement for machine learning classifiers that tabulates the different combinations of actual and predicted values. The above results are calculated with the help of the confusion matrix, which is used to evaluate the overall performance of our proposed classification system. As the problem we chose is multi-class classification with three classes (normal SQL query, SQL injection attack query, and plain text), the confusion matrix is 3×3. The following classification metrics are evaluated, where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, y_i and \hat{y}_i are the actual and predicted labels of the i-th test query, and n is the number of test queries:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \tag{1}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\mathrm{F1\ Score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{5}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{6}$$


$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{7}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \quad \text{or} \quad 1 - \mathrm{Recall} \tag{8}$$

The confusion matrices of our algorithms are given below, where 0, 1, and 2 represent normal SQL queries, SQL injection attack queries, and plain text, respectively.

Table 11 Confusion matrix of AdaBoost
              Actual 0  Actual 1  Actual 2
Predicted 0   1966      12        21
Predicted 1   7         5473      37
Predicted 2   6         20        3018

Table 12 Confusion matrix of GBM
              Actual 0  Actual 1  Actual 2
Predicted 0   2078      12        15
Predicted 1   3         5461      22
Predicted 2   8         26        2935

Table 13 Confusion matrix of XGBoost
              Actual 0  Actual 1  Actual 2
Predicted 0   2060      21        37
Predicted 1   7         5388      41
Predicted 2   4         28        2974

Table 14 Confusion matrix of LGBM
              Actual 0  Actual 1  Actual 2
Predicted 0   2095      6         17
Predicted 1   1         5418      17
Predicted 2   11        18        2977

The classification report of our proposed system is given in graphical form in Fig. 2.

Figure 2 Classification report

The error report of our proposed system is given in graphical form in Fig. 3.

Figure 3 Error report

The classification reports evaluated by our four models are given in Fig. 4.

Figure 4 Classification report from GBM, AdaBoost, XGBM, and LGBM, respectively
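As a worked check of the accuracy and error formulas, the Light GBM confusion matrix in Tab. 14 can be turned back into summary metrics. The sketch below makes two assumptions that are ours: rows are predictions and columns are actual classes, as labelled in the table, and the class labels 0, 1, and 2 are treated as numbers when computing MAE, MSE, and RMSE. The helper name summarize is hypothetical.

```python
import math

# Light GBM confusion matrix from Tab. 14 (rows = predicted, columns = actual).
LGBM = [
    [2095, 6, 17],
    [1, 5418, 17],
    [11, 18, 2977],
]

def summarize(cm):
    """Compute accuracy, MAE, MSE and RMSE from a square confusion matrix."""
    k = len(cm)
    n = sum(sum(row) for row in cm)                   # number of test queries
    correct = sum(cm[c][c] for c in range(k))         # diagonal = correct predictions
    cells = [(p, a, cm[p][a]) for p in range(k) for a in range(k)]
    mae = sum(abs(p - a) * count for p, a, count in cells) / n
    mse = sum((p - a) ** 2 * count for p, a, count in cells) / n
    return {
        "n": n,
        "accuracy": correct / n,
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }

report = summarize(LGBM)
for name, value in report.items():
    print(name, round(value, 6))
```

Under these assumptions the script gives an accuracy of 0.993371, an MAE of 0.009280, an MSE of 0.014583, and an RMSE of 0.120761 over 10560 test queries, in agreement with the Light GBM rows of Tabs. 3, 7, 8, and 9.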


Figure 5 Classification report from GBM, AdaBoost, XGBM, and LGBM, respectively (continuation)

Figure 6 ROC results from GBM, AdaBoost, XGBM, and LGBM, respectively

5.3 ROC Curves

The ROC values evaluated by our algorithms are given in Tab. 15.

Table 15 ROC values of our proposed models
Algorithms   GBM       AdaBoost  XGBoost   Light GBM
ROC Value    0.995449  0.997657  0.999548  0.999845

5.4 Comparative Analysis

Tab. 16 compares existing research on SQL injection attack detection with the proposed model in terms of accuracy. Our proposed model outperforms the other existing models in terms of accuracy, with a lower error rate.

Table 16 Comparative analysis
Classifiers/Models                                       Accuracy
SVM, Naïve Bayes, GBM, REGEX [15]                        97%
Neural Network system [16]                               96.8%
Genetic-fuzzy rule-based system [17]                     98.4%
SVM [18]                                                 98%
K-means [19]                                             98.36%
Our Proposed model (GBM, AdaBoost, XGBM, LGBM)           99.34%


6 CONCLUSION

In this research work, we proposed a SQL injection attack detection model based on 21 features in order to increase the efficiency of our classifiers. The main target of our system was the SQL injection attack, which is increasing day by day and is used with malicious content to gain unrestricted access to databases and extract sensitive information. These malicious queries can bypass authentication and authorization and can ultimately alter, modify, and delete the database. With this as our objective, we proposed a robust model for distinguishing SQL injection attack queries from normal queries and plain text. In this work, the foremost step we carried out was to create a balanced dataset that contains normal and malicious SQL queries. We also introduced plain text into this dataset so that the proposed model performs well and differentiates malicious queries from normal queries and plain text.

The proposed model, when applied to the dataset, achieves an average accuracy of more than 99% with an almost negligible error rate, which indicates that the selected feature set is quite effective at discriminating SQL injection attack queries from normal SQL queries and plain text. For real-world detection systems, the analysis indicates that our proposed system, based on ensemble machine learning with the selected features, can be applied in such SQL injection attack detection systems. The best test accuracy is 99.34% with an FPR of 0.007, while the lowest is 99.11% with an FPR of 0.009, yielded by LGBM and AdaBoost, respectively. The other two algorithms that we used, GBM and XGBM, yielded accuracies of 99.19% and 99.22%, respectively.

Notice

This paper was presented at IC2ST-2021 – International Conference on Convergence of Smart Technologies. This conference was organized in Pune, India by Aspire Research Foundation, January 9-10, 2021. The paper will not be published anywhere else.

7 REFERENCES

[1] OWASP. https://owasp.org/www-project-top-ten/. (Accessed on 18.11.2020).
[2] Farooq, U. (2020). Real Time Password Strength Analysis on a Web Application Using Multiple Machine Learning Approaches. International Journal of Engineering Research & Technology (IJERT), 9(12), 359-364.
[3] Moh, M., Pininti, S., Doddapaneni, S., & Moh, T. (2016). Detecting Web Attacks Using Multi-stage Log Analysis. 2016 IEEE 6th International Conference on Advanced Computing (IACC), Bhimavaram, 733-738. https://doi.org/10.1109/IACC.2016.141
[5] OWASP. https://owasp.org/www-project-top-ten/2017/A1_2017-Injection. (Accessed on 19.11.2020).
[6] Moosa, A. (2010). Artificial Neural Network based Web Application Firewall for SQL Injection. World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, 4(4), 610-619. https://panel.waset.org/publications/1001/pdf
[7] Sheykhkanloo, N. M. (2015). SQL-IDS: Evaluation of SQLi Attack Detection and Classification Based on Machine Learning Techniques. The 8th International Conference on Security of Information and Networks (SIN15), Sochi, Russia. https://doi.org/10.1145/2799979.2800011
[8] Kaur, M. & Agrawal, A. P. (2012). Token Sequencing Approach to Prevent SQL Injection Attacks. IOSR Journal of Computer Engineering (IOSRJCE), 1(1), 31-37. https://doi.org/10.9790/0661-0113137
[9] Sadeghian, A., Zamani, M., & Ibrahim, S. (2013). SQL injection is still alive: a study on SQL injection signature evasion techniques. In International Conference on Informatics and Creative Multimedia, Kuala Lumpur, Malaysia, 265-268. https://doi.org/10.1109/ICICM.2013.52
[10] Halfond, W. G. & Orso, A. (2005). AMNESIA: analysis and monitoring for neutralizing SQL-injection attacks. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, 174-183. https://doi.org/10.1145/1101908.1101935
[11] Shar, L. K. & Tan, H. B. K. (2013). Defeating SQL injection. Computer, 46, 69-77. https://doi.org/10.1109/MC.2012.283
[12] Tajpour, A. & Shooshtar, M. J. Z. (2010). Evaluation of SQL injection detection and prevention techniques. In Second IEEE International Conference on Computational Intelligence, Communication Systems and Networks, Liverpool, UK, 216-221. https://doi.org/10.1109/CICSyN.2010.55
[13] Dharam, R. & Shiva, S. G. (2013). Runtime monitors to detect and prevent union query based SQL injection attacks. In Tenth International Conference on Information Technology: New Generations, Las Vegas, USA, 357-362. https://doi.org/10.1109/ITNG.2013.57
[14] Joshi, A. & Geetha, V. (2014). SQL Injection detection using machine learning. In 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), Kanyakumari, IEEE, 1111-1115. https://doi.org/10.1109/ICCICCT.2014.6993127
[15] Kranthikumar, B. & Velusamy, R. L. (2020). SQL injection detection using REGEX classifier. Journal of Xi'an University of Architecture & Technology, 12(6), 800-809.
[16] Sheykhkanloo, N. M. (2015). SQL-IDS: evaluation of SQLi attack detection and classification based on machine learning techniques. In Proceedings of the 8th International Conference on Security of Information and Networks, USA, 258-266. https://doi.org/10.1145/2799979.2800011
[17] Basta, C., Elfatatry, A., & Darwish, S. (2016). Detection of SQL Injection Using a Genetic Fuzzy Classifier System. International Journal of Advanced Computer Science and Applications (IJACSA), 7(6), 129-137. https://doi.org/10.14569/IJACSA.2016.070616
[18] Jagadessan, J., Shrivastava, A., Ansari, A., Kar, L. K., & Kumar, M. (2019). Detection and Prevention Approach to SQLi and Phishing Attack using Machine Learning. International Journal of Engineering and Advanced Technology (IJEAT), 8(4), 791-799.
[4] Kar, D., Agarwal, K., Sahoo, A., & Panigrahi, S. (2016). Detection of SQL injection attacks using Hidden Markov
Model. 2016 IEEE International Conference on Engineering [19] Patel, M. P. & Sivaraman, D. B. (2017). SQL injection
and Technology (ICETECH), Coimbatore, India. Detection for Secure Atomic and Molecular Database node for
https://doi.org/10.1109/ICETECH.2016.7569180 India. International Journal of Advance Research and
Innovative Ideas in Education (IJARIIE), 3(2), 3867-3879.

TEHNIČKI GLASNIK 15, 1(2021), 112-120 119

Author’s contact:

Umar Farooq,
Department of Computer Science & Technology (Cyber Security),
Central University of Punjab,
City Campus, Mansa Road, Bathinda 151001, Punjab, India
[email protected]


