IEEETII CPS AcceptedVersion
4 authors, including:
Bo Sun
ZTE Corporation
All content following this page was uploaded by Bilal Hussain on 24 February 2020.
Abstract: With the advent of 5G, cyber-physical systems (CPSs) employed in the vertical industries and critical infrastructures will depend on the cellular network more than ever, which widens their attack surface. Hence, guarding the network against cyber-attacks is critical not only for its primary subscribers but also to prevent it from being exploited as a proxy to attack CPSs. In this study, we propose a consolidated framework, utilizing deep convolutional neural networks (CNNs) and real network data, to provide early detection of distributed denial-of-service (DDoS) attacks orchestrated by a botnet that controls malicious devices. These puppet devices individually perform silent call, signaling, SMS spamming, or a blend of these attacks targeting call, Internet, SMS, or a blend of these services, respectively, to cause a collective DDoS attack in a cell that can disrupt CPSs' operations. Our results demonstrate that our framework can achieve higher than 91% normal and under-attack cell detection accuracy.

Index Terms: Cyber-physical system, 5G, deep learning, cybersecurity, convolutional neural networks (CNNs), artificial intelligence, call detail record (CDR), DDoS attack.

I. INTRODUCTION

A cyber-physical system (CPS) is a complex, large, and networked amalgam of sensors, actuators, and computing nodes that monitor and control physical processes [1], [2]. Because of its highly intricate and heterogeneous nature, contributed by both cyber and physical aspects, a CPS has many general and application-specific vulnerabilities that can be exploited by an attacker to perform mischievous acts [1, Sec. IV, V]. Since CPSs control physical processes, the consequence of an attack can be irreversible and disastrous depending on the severity of the attack and the application domain: industrial control system (ICS), smart city, intelligent transportation, etc. CPS innovations applied in vertical industries/sectors (energy, automotive, eHealth, manufacturing, etc.) will potentially account for more than $82 trillion in economic activity by 2025 [3]; sabotaging a CPS equates to a significant blow to an economy, and hence its security must be of paramount importance.

Manuscript received June 27, 2019; revised September 24, 2019 and January 13, 2020; accepted February 2, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61941119, in part by the ZTE Industry-Academic-Research Cooperation Funds, and in part by the Fundamental Research Funds for the Central Universities, China. Paper no. TII-19-2742. (Corresponding author: Qinghe Du.)
B. Hussain and Q. Du are with the School of Information and Communications Engineering, Xi'an Jiaotong University, China, and are also with Shaanxi Smart Networks and Ubiquitous Access Research Center (e-mails: [email protected]; [email protected]).
B. Sun and Z. Han are with ZTE Corporation (e-mails: [email protected]; [email protected]).

5G is currently being devised to be a disruptive technology that will play a crucial role in providing communications for CPSs and in enabling new services in the applied verticals (see [4, Fig. 4] for an integrated 5G architecture). Consequently, this will increase their dependence on cellular networks [5], which can also be inferred from use cases pertaining to the various verticals described in 5G infrastructure public private partnership (5G PPP) white papers [6]. 5G is anticipated to meet the stringent requirements of industries: ultra-reliability, low latency, high density of connected devices, etc. [4], [7]. In fact, 5G will provide services for real-time and mission-critical applications: state estimation in smart grids [8], assisted overtaking in smart vehicles [5], etc.

Various attacks [9], [10, Table 1] that compromise availability, integrity, or confidentiality can be launched against cellular networks. A denial-of-service (DoS) attack targets the availability of network resources in a region and has the potential to raze a network, particularly when multiple such attacks are launched in a dispersed and well-synchronized manner, which is called a distributed denial-of-service (DDoS) attack [9], [10]. According to a report by Verizon [11], DDoS attacks topped the list of the most frequent cybersecurity incidents of 2017. They can be devised as a beachhead, or as a smoke screen for IT security experts, with which some other objective(s) (for example, a data breach) can be accomplished [11], [12]. In cellular networks, a DDoS attack can not only heavily affect the network and its legitimate users but can also have side effects that disrupt CPSs that heavily rely on the network. Once compromised, cellular networks can be exploited as powerful attack vectors against CPSs; hence, strong measures should be taken so that they cannot be exploited as a proxy to attack CPSs.

Apart from a zero-day vulnerability, malicious user(s) can also exploit known vulnerabilities in the cellular network to orchestrate a DDoS attack, whose mitigation in 4G networks is still an open issue [9]. Diverse individual attacks such as silent call [13], signaling [14], and SMS flooding [15] attacks (elaborated in Sec. I-A) can be staged by utilizing a network of bots known as a botnet (an overlay network comprising a large number of malware-infected mobile devices that can receive commands from a botmaster (cybercriminal)) to participate in a collective DDoS attack [9], [10], [16]. These devices can be infected by utilizing SMS, email attachments, or other means to spread and inject malware [10], and this could be accomplished in different environments such as device-to-device (D2D) networks [17]. The severity of the threat from botnets can be seen in [11, Fig. 17], which elucidates global botnet breaches in 2017.

Artificial intelligence (AI) is a superset of machine learning
ACCEPTED VERSION FOR PUBLICATION IN A FUTURE ISSUE OF IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
(ML), itself a superset of more powerful algorithms and techniques known as deep learning (DL) [18]. There is a recent surge of interest, among researchers from both academia [19] and industry/nation states [20], in utilizing AI/DL for the protection of CPSs against cyber-attacks, especially the ones utilized in critical infrastructures such as smart grids and financial networks. The proliferating success of DL is visible not just in the image recognition field [18] but in all walks of life, as is evident from the increased number of AI patent applications granted worldwide, shown in [21, Fig. 1].

Motivated by the above reasons, we propose a novel DL-based framework (in particular, one based on a convolutional neural network (CNN)) for the detection of silent call, signaling, and SMS flooding attacks that can collectively or individually cause a DDoS attack against the 4G network infrastructure, disrupting services not only for the subscribers but also for the cellular-dependent CPS devices. Compared with many computationally expensive content-based schemes that also compromise user privacy [22], [23], our solution is lightweight because: 1) it is independent of the content of any individual activity (call, SMS, or Internet); and 2) it leverages call detail record (CDR) data that is already available in the cellular network (mainly utilized for customer billing), instead of depending on a dataset that demands additional resources (observation time, computation, communication, etc.) for its acquisition. CDRs contain a measure of subscribers' interactions with the network in a spatiotemporal manner, from which a normal cell's behavior can be inferred and which can be utilized to identify an under-attack cell's behavior [22], [24].

This study makes the following prominent contributions:
1. Proposes a novel and consolidated framework for the detection of silent call, signaling, SMS flooding, and a blend of these attacks that ultimately cause a DDoS attack across the cellular infrastructure.
2. Presents a scalable and expandable solution for attack detection by utilizing a CNN, for which the input images can be expanded to include a greater number of cells without modifying the model.
3. Deploys a state-of-the-art very deep CNN model called residual network with 50 layers (ResNet-50) and also introduces a relatively simple model called the deep rudimentary CNN (DRC) model, having 6 layers, that yields better detection accuracy for most of the attack scenarios.

Next, we describe the attacks we are dealing with throughout this study in the remaining portion of this section. Then, we summarize the relevant work in Section II, followed by a discussion on preliminaries to our proposed method in Section III. Then, we explain our framework's implementation in Section IV. We subsequently elaborate the results and our framework's performance evaluation in Section V. Finally, we discuss our results and future insights, and draw the concluding remarks in Section VI.

A. Description of the Attacks

1) Silent call attack: is launched by exploiting a fundamental design flaw in the call establishment procedure of voice over LTE (VoLTE, a voice solution proposed for the 4G LTE network) [13, Fig. 4]. During the procedure, initiated after a VoLTE-supported device calls its victim, a number of messages are exchanged involving the caller, callee, VoLTE server, and gateways. In between, the network reserves resources for the call before the caller device sends a "session initiation protocol (SIP) Update" message that eventually enables the callee's device to ring; at this instant, the caller avoids sending the message to bypass the ringing. As a consequence, and since the network has already reserved the resources to carry out the call, the victim's device is compelled to remain in a high-powered radio resource control (RRC) state without the callee's knowledge [13]. An Android application called VoLTECaller has also been developed for the demonstration of this attack [25].

2) Signaling attack: aims to overload a core network element (e.g., a gateway) that processes RRC-based signaling messages, which are exchanged among different network entities for the purpose of efficient resource management [10]. During the attack, a malicious user requests a bearer setup (known as random access) to send data and enter the "Connected" state; after the (resource) allocation, it just waits until the timeout and continuously repeats this process. This generates a huge number of signaling messages for the network entities to process: a total of 24 messages are required for a bearer activation and deactivation [26]. Since an LTE/LTE-A user can initiate multiple bearers (max. 8), this amplifies the number of generated messages.

3) SMS flooding attack / SMS spamming (towards IP multimedia subsystem (IMS)) attack: relies on security vulnerabilities stemming from technology migration: from a circuit-switched (CS)-based network (3G) carrying SMS via the control plane to an IMS and packet-switched (PS)-based network (4G) carrying SMS via the data plane. In this attack, a huge number of forged SIP/SMS messages are injected during a SIP session (initiated during an SMS exchange) between the device's SMS client and the IMS server [10], [27], aiming to computationally overload the IMS server.

II. RELEVANT WORK

In the literature, detection of the silent call, signaling, and SMS flooding attacks has mostly been considered individually and, to our knowledge, no consolidated framework has been proposed for their detection. Therefore, we study each of them separately. For their detection, many past studies have utilized content-based approaches in which the actual contents of the user activities (IP packets, SMS messages, etc.) are analyzed [22], [23]. However, such techniques have a high computation cost and might be infeasible in practical settings.

The severity of the silent call attack in 4G LTE networks has been thoroughly discussed by Tu et al. in [13], and in their recent extended work [25]. Ruan et al. [23] utilized game theory to detect the silent call attack by monitoring the peak value and variation trend of traffic data volume. They claimed to have a lightweight solution in contrast to the past studies that proposed computationally exhaustive content-based solutions.

For signaling attacks, Bang et al. [28] proposed a detection scheme based on a hidden semi-Markov model by utilizing the bearer wakeup packet generation rate in wireless sensor and
actuator network (WSAN). The scheme requires training examples constructed from historical network data, which may take several hours or days of observation to acquire. Bassil et al. [26] utilized the number of bearer requests per user per minute as a criterion to determine a signaling attack: attack
Papadopoulos et al. [22], [24]. In [22], they simulated SMS flooding and also signaling attacks, and generated the synthetic
[Figure residue (block diagram, caption not recovered). Recovered labels: Raw CDRs, Pre-processing, Training examples, Test image, CNN engine, Post-processing, Labels, Under-attack cell ID(s), Database, Botmaster, (i) Silent call attack.]
flooding and also signaling attacks, and generated the synthetic Botmaster
Activity
File / day 1
Date Square ID Timestamp Country code Internet
SMS in SMS out Call in Call out
traffic
… … … … … … … … …
5053 5060 5061
01-11-13 4259 1383260400000 0 3.4719
01-11-13 4259 1383260400000 385 0.0510 4961
4953
01-11-13 4259 1383260400000 39 8.2158 6.8014 3.7738 6.4455 261.6293
01-11-13 4259 1383260400000 44 0.0510 0.0510 4853 4861
01-11-13 4259 1383260400000 48 Activity 0.0510 File / day 2
… Date … Square ID … Timestamp … Country code
… … … … … Internet 4753 4761
SMS in SMS out Call in Call out
traffic
… … … … … … … … … 4653 4661
02-11-13 4259 1383346800000 0 0.1233
4553 4561
02-11-13 4259 1383346800000 39 5.5328 3.8728 1.1337 2.3449 117.5505
02-11-13 4259 1383346800000 421 0.0180
4453 4456 4461
02-11-13 4259 1383346800000 49 0.0510
02-11-13 4259 1383347400000 0 0.4433 Activity
0.3923 4361
File / day 62 4353
… Date … Square ID … Timestamp … Country code
… … … … … Internet
SMS in SMS out Call in Call out
traffic 4253 4259 4261
… … … … … … … … …
01-01-14 4259 1388530800000 20 0.3312
01-01-14 4259 1388530800000 221 0.0180
01-01-14 4259 1388530800000 351 0.3923 0.3923
01-01-14 4259 1388530800000 385 0.0510
01-01-14 4259 1388530800000 39 25.0201 26.8948 17.5864 15.7316 100.1597
… … … … … … … … …
(ii)
(i)
Day 62 Total 1,116 Images & Labels
Square User activity values:
ID: 5053 70% 30%
SMS out
18 time slots
Internet 781 Normal 335 Abnormal
6 time slots Images & Labels Images & Labels
Call out 24-hr
timeline Day 1
12 1 11 12 1 2 11
am am am pm pm pm pm
614 167 167 168
Normal (0) or Square
ID: 4254 18 x 62 = 1,116 time slots* TRAIN SET TEST SET
Under-attack (1)
781 Images & Labels 335 Images & Labels
9x9x3 81 x 1 Dimension * Each timeslot yields one image
Dimension Image Label
(iii) (iv) (v)
Fig. 2. Preliminaries: (i) Samples of raw CDRs, distributed among 62 files (shown as different tables), each having a single day's data. The red-highlighted values are irrelevant to our research and hence discarded. (ii) Overlay of the chosen 9 × 9 sub-grid on Milan's map, using the GPS coordinates. It represents 81 cell IDs whose CDRs are extracted to form a labeled image, as shown in (iii). Each pixel of the image corresponds to the three user activity values of the corresponding cell ID. (iv) Illustration of the total number of time slots used for data aggregation to create a total of 1,116 images. These images are combined in the way depicted in (v) to create the train and test sets.
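The image construction summarized in the caption can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the key names `sms_out`, `call_out`, and `internet`, the channel order, the row-major ordering of the 81 records, and the choice to fill missing activity values with 0.0 are all assumptions made for the example.

```python
GRID = 9  # 9 x 9 sub-grid, i.e. 81 cell IDs per image
ACTIVITIES = ("sms_out", "call_out", "internet")  # hypothetical key names and channel order


def build_image(cell_records):
    """Arrange 81 per-cell activity records into a 9 x 9 x 3 nested list.

    cell_records: list of 81 dicts in row-major order over the sub-grid (assumed).
    Missing activity values are filled with 0.0 (an assumption for this sketch).
    """
    if len(cell_records) != GRID * GRID:
        raise ValueError("expected one record per cell of the 9 x 9 sub-grid")
    pixels = [[float(rec.get(name, 0.0)) for name in ACTIVITIES] for rec in cell_records]
    return [pixels[r * GRID:(r + 1) * GRID] for r in range(GRID)]


def normal_label():
    """Label vector of length 81 with every cell marked 0 (normal)."""
    return [0] * (GRID * GRID)


def count_images(slots_per_hour=6, hours=3, days=62):
    """Number of aggregated images: one image per 10-min slot."""
    return slots_per_hour * hours * days
```

With the values given in the caption and figure, `count_images()` reproduces the total of 1,116 images, and each image carries an 81-entry label.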
the test image. Finally, the identified cell ID(s) is then passed on to the CN for the necessary actions.

B. Data Synthesis and Splitting

Data-hungry DL models require a large number (hundreds or even thousands) of examples for training, and we only have 62 images for every 10-min slot in a 24-hour timeline, each belonging to a single day. To overcome this limitation, we adopt the method utilized in [32] by combining all the images created during a 3-hour range from 11 am to 2 pm over a 2-month period (6 images per hour × 3 hours × 62 days = 1,116 images, as illustrated in Fig. 2(iv)) and considering them as images related to a single 10-min slot. These images, each containing the data of 81 cells, demonstrate normal behavior; hence each cell is labeled as 0 (normal) in the labeled output of each image, $o^{(j)} \in \mathbb{R}^{81 \times 1}$ (see Fig. 2(iii) for an illustration of the label corresponding to an image).

For an image exhibiting behavior influenced by an attack, we would ideally attack an operational 4G network and observe the changes in the recorded CDRs, which could result in an economically and legally unfeasible network breakdown. We hence reserve a set of 335 randomly chosen images (30% of the total) and, for each attack scenario, we utilize the set to modify the relevant user activity values (call out, Internet, or SMS out) according to Appendix A to mimic the effect of the attack (silent call, signaling, or SMS spamming, respectively) on CDRs. This step is inspired by [22], in which the authors utilize simulation software to mimic the effect of different attacks and generate a synthetic CDR dataset for their experiments. Practical networks deal with normal scenarios more often than with abnormal ones (anomalies); this is reflected in our model, as the normal instances are chosen to be in larger quantity (70%) than the abnormal ones (30%). For the purpose of modification, we randomly choose about 50% of the cell IDs in each image and also change their labels to 1 (under-attack).

As shown in Fig. 2(v), our train set contains 781 images (70% of the total), $I_{train} \in \mathbb{R}^{781 \times n_H^{[0]} \times n_W^{[0]} \times n_C^{[0]}}$, and their corresponding labels $O_{train} \in \mathbb{R}^{781 \times 81}$, and the test set contains the remaining ones: $I_{test} \in \mathbb{R}^{335 \times n_H^{[0]} \times n_W^{[0]} \times n_C^{[0]}}$ and $O_{test} \in \mathbb{R}^{335 \times 81}$. Out of the 781 labeled images in the train set, 614 are normal and the remaining 167 are the modified images. Similarly, out of the 335 labeled images in the test set, 167 are normal and the remaining 168 are the modified images. Note that, for each attack scenario, we have separate train and test sets because of the modifications discussed previously.

C. Performance Metrics and Software Utilized

We utilize the following common metrics, which are widely used in the literature such as [33], for the performance evaluation: accuracy, error rate, precision, recall, false positive rate (FPR), and F1 (weighted harmonic mean of the precision and recall).

We use MATLAB and Keras (Python's DL library) for the preprocessing, GPS mapping, and building the CNN models. We perform experimentation using a commercial PC (i7-7700T CPU, Windows 10 64-bit operating system, and 16GB RAM) with an in-built GPU (NVIDIA GeForce 930MX).
IV. REALIZATION OF CNN MODELS

A. Generic Architecture

A CNN has the following three fundamental building blocks:

1) Convolution layer: processes images or the previous layer's activations $A^{[l-1]} \in \mathbb{R}^{m \times n_H^{[l-1]} \times n_W^{[l-1]} \times n_C^{[l-1]}}$, having $m$ as the total number of images in the (train or test) dataset and $l$ as the index of the present layer; and kernels $K^{[l]} \in \mathbb{R}^{k^{[l]} \times k^{[l]} \times n_C^{[l-1]} \times n_C^{[l]}}$, having $k^{[l]} \times k^{[l]} \times n_C^{[l-1]}$ as a single kernel's dimension with $k^{[l]}$ as the kernel size, and $n_C^{[l]}$ as the total number of kernels. To demonstrate a convolution layer's functionality, we focus on an example highlighted in the red box of Fig. 5. Here, the index of the present (output) layer $l$ is 2, while the previous (input) layer's index is $l - 1 = 1$. We also consider a single image as an example; hence, $m = 1$. The input activations are then represented as $A^{[1]} \in \mathbb{R}^{1 \times n_H^{[1]} \times n_W^{[1]} \times n_C^{[1]}}$, having $n_H^{[1]} = n_W^{[1]} = 13$ and $n_C^{[1]} = 3$. The dimension of the input activations is 13 × 13 × 3, as shown in the figure. Additionally, the kernels are represented as $K^{[2]} \in \mathbb{R}^{k^{[2]} \times k^{[2]} \times n_C^{[1]} \times n_C^{[2]}}$, having a kernel size $k^{[2]} = 2$ and total number of kernels $n_C^{[2]} = 8$. A single kernel's dimension is 2 × 2 × 3.

The convolution layer applies the convolution operation between the input activations and each kernel separately, as shown in the figure. A general convolution operation between an input and a single kernel is demonstrated in [31, Fig. 9.1]. The output from each operation is then added to a bias (a real number), and a non-linear activation function called Swish is also utilized.

Swish is a gated version of the sigmoid function that has some desirable properties which even the widely used and most successful activation functions, such as the rectified linear unit (ReLU), lack: non-monotonicity and smoothness [34]. The inventors of the Swish function claim that it matches or outperforms ReLU for deeper neural networks. Mathematically, it is defined as:

$$ g(z) = z \times \sigma(z) \tag{1} $$

where $\sigma(z) = (1 + e^{-z})^{-1}$ is the sigmoid function.

Finally, the layer piles up each result on top of one another to create an output $A^{[l]} \in \mathbb{R}^{m \times n_H^{[l]} \times n_W^{[l]} \times n_C^{[l]}}$, which is represented here as $A^{[2]} \in \mathbb{R}^{1 \times n_H^{[2]} \times n_W^{[2]} \times n_C^{[2]}}$. The height $n_H^{[2]}$ or width $n_W^{[2]}$ is computed by using:

$$ n_{H/W}^{[l]} = \left\lfloor \frac{n_{H/W}^{[l-1]} + 2p^{[l]} - k^{[l]}}{s^{[l]}} + 1 \right\rfloor \tag{2} $$

having $p^{[l]}$ as the amount of zero-padding (a technique used to insert zeros around the input image's edge to prevent shrinking of the output dimension during the convolution operation [31, Sec. 9.5]) and $s^{[l]}$ as the stride (the distance between consecutive applications of the kernel on the input). For this example, the zero-padding has already been performed (see Fig. 5 (bottom)), hence $p^{[2]} = 0$, and $s^{[2]}$ is given as 1. By utilizing Eq. 2, we can calculate $n_H^{[2]} = n_W^{[2]} = 12$. Hence, the output's dimension will be 12 × 12 × 8, which can also be observed in the figure.

In addition, the batch normalization (BN) [35] technique is utilized to boost training speed and make the model robust. It is applied between the convolution operation and the activation function.

2) Pooling layer: utilizes a max or avg function to pool maximum or average numbers, respectively, from groups of its input (and from each channel, independently) depending on the kernel size $k$, to generate the output volume. This reduces the requirement for storing parameters and improves the model's computational efficiency [31, Sec. 9.3]. If the input has dimension $n_H \times n_W \times n_C$, the output's dimension can be derived using Eq. 2 with $p = 0$: $\left\lfloor \frac{n_H - k}{s} + 1 \right\rfloor \times \left\lfloor \frac{n_W - k}{s} + 1 \right\rfloor \times n_C$.

3) Fully connected layer: has the same purpose as a feed-forward neural network's hidden layer [32], having each neuron connected with all of the previous layer's neurons.

B. Residual Network Model

The fundamental building blocks can be utilized in multiple settings (with different numbers of layers and ways of linking them together) to create various CNN models such as the residual network comprising 50 layers (ResNet-50) [36], illustrated in Fig. 3. It is one of the most advanced CNN models that we utilize in this study. Residual networks are effective in dealing with the problems encountered by a typical (very) deep neural network, namely the gradient exploding or vanishing [31] and degradation [36] problems, by adopting residual learning in which residual blocks are extensively used.

We first elaborate the functioning of a residual block using Fig. 4 (top). In the figure, the information flows from the input $a^{[l]}$ to the output activation $a^{[l+2]}$ via two different paths. In the downward path, known as the main path, there are two parts. The information first goes through the initial part, having three modules consisting of a convolution layer, batch normalization, and a non-linear activation function, respectively, governed by the following standard equations:

$$ z^{[l+1]} = W^{[l+1]} a^{[l]} + b^{[l+1]} \tag{3} $$
$$ a^{[l+1]} = g(z^{[l+1]}) \tag{4} $$

where $W^{[l+1]}$ is the weight matrix, $b^{[l+1]}$ is the bias vector, $g(\cdot)$ is the non-linear activation function, $a^{[l]}$ is the input, and $a^{[l+1]}$ is the output of the first part. The batch normalization module is added to accelerate the training.

Similarly, the modules in the second part are governed by the following equations (ignoring the other path and an addition operation):

$$ z^{[l+2]} = W^{[l+2]} a^{[l+1]} + b^{[l+2]} \tag{5} $$
$$ a^{[l+2]} = g(z^{[l+2]}) \tag{6} $$

In residual networks, $a^{[l]}$ is fast-forwarded to a deeper hidden layer in the neural network, where it is added to the output of that layer before applying a non-linear activation function. This is known as a shortcut connection, as shown in the figure. Hence, Eq. 6 will be modified as follows:

$$ a^{[l+2]} = g(z^{[l+2]} + a^{[l]}) \tag{7} $$

The addition of $a^{[l]}$ makes it a residual block. Here, we are assuming that the dimensions of both the input $a^{[l]}$ and $z^{[l+2]}$ (and therefore the output $a^{[l+2]}$) are the same in order to perform the
[Fig. 3 diagram; recovered labels: input image (9 × 9 × 3), Conv block, ID block, output.]
Fig. 3. Residual network architecture having 50 layers (ResNet-50). Red annotations denote the utilized hyperparameters, while the blue annotations illustrate the output dimensions of the layers. Note that the displayed output dimensions in the Conv and ID blocks are for Stage 2 only.
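The output dimensions annotated in Fig. 3 follow from Eq. 2. As a hedged illustration, the sketch below applies Eq. 2 to the 13 × 13 × 3 convolution example and to the ResNet-50 input stage described in the text (zero-padding of 5, a Stage 1 convolution with k = 7 and s = 2, then max pooling with k = 3 and s = 2). The function names are hypothetical, and treating the padding as a separate step followed by an unpadded convolution is an interpretation of the text.

```python
def conv_output_size(n_in, k, s=1, p=0):
    """Eq. 2: floor((n_in + 2p - k) / s + 1) for one spatial dimension."""
    return (n_in + 2 * p - k) // s + 1


def resnet_stem_shapes(n_in=9, channels=3):
    """Spatial shapes through the input stage, using the hyperparameters quoted in the text."""
    shapes = []
    padded = n_in + 2 * 5                      # zero-padding p = 5 on each side
    shapes.append((padded, padded, channels))
    conv = conv_output_size(padded, k=7, s=2)  # Stage 1 convolution, 64 kernels
    shapes.append((conv, conv, 64))
    pool = conv_output_size(conv, k=3, s=2)    # max pooling keeps the channel count
    shapes.append((pool, pool, 64))
    return shapes
```

Under these assumptions, `conv_output_size(13, k=2)` gives 12 for the convolution example, and `resnet_stem_shapes()` yields 19 × 19 × 3, 7 × 7 × 64, and 3 × 3 × 64, the last of which matches the 3 × 3 × 64 input of the Stage 2 blocks.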
addition. This kind of residual block is known as an identity (ID) block. If the dimensions of the input ($a^{[l]}$) and output activations ($a^{[l+2]}$) do not match, then a convolution layer is inserted in the shortcut connection to resize the input $a^{[l]}$ to a different dimension, so that the dimensions match up in the final addition. This type of residual block is known as a Convolutional (Conv) block, as shown in Fig. 4 (bottom). Note that we utilize residual blocks that skip 3 hidden layers in our paper instead of skipping 2 hidden layers as delineated in the figure.

In the ResNet-50 model, residual blocks are piled up on top of one another (see Stages 2-5 in Fig. 3 (left)) to allow activations of one layer to skip some layers and be directly fed to the deeper layers. During back-propagation, shortcut connections also allow a gradient to be directly back-propagated to the previous layers. As can be seen in the figure, the input image having dimension 9 × 9 × 3 is zero-padded with padding $p = 5$ to have an output volume with dimension 19 × 19 × 3 (Eq. 2 can be utilized in calculating the output dimension of various layers). The resultant volume is then passed to Stage 1, having a convolution layer with kernel size $k = 7$, total number of kernels $n_C = 64$, and stride $s = 2$, that converts the dimension to 7 × 7 × 64. Finally, the pooling layer (Max Pool) having $k = 3$ and $s = 2$ yields the output layer. The main path contains 3 parts. The first part has a convolution layer having $k = 1$, $n_C = K_1 = 64$, and $s = 1$. It outputs a volume with the same dimensions as the input. The convolution layer in the second part also produces an output with the same dimension as the input, because it utilizes "same" convolution (in which padding is set so that the output's dimension remains the same as the input's). The third part, having a convolution layer with $k = 1$, $n_C = K_3 = 256$, and $s = 1$, will transform the input's dimension from 3 × 3 × 64 to 3 × 3 × 256. Finally, the convolution layer in the shortcut connection, which has an input volume of dimension 3 × 3 × 64, scales up the input's dimension to 3 × 3 × 256 by utilizing the following parameter values: $k = 1$, $n_C = K_3 = 256$, and $s = 1$. The outputs from both convolution layers (one in the shortcut connection and the other in the third part of the main path) can be added as they are now compatible, i.e., they have the same dimensions.

The ID blocks of Stage 2 function similarly to the above-mentioned Conv block, with the exception of the shortcut connection's design, which does not have any layer in it. This is because the input of the ID blocks has the same dimension as the output of the convolution layer in its third part, 3 × 3 × 256; hence, a convolution layer is not needed in the shortcut connection.

The rest of the stages (Stages 3-5) follow a similar pattern as above and ultimately yield a resultant volume of dimension 1 × 1 × 32. It is then flattened in the form of an array and passed on to a final fully-connected layer (50th layer) to be processed as an 81 × 1 dimension output vector carrying normal and under-attack cell IDs. The hyperparameters used in our model and the above-described dimensions of various layers, from the input layer to the layers utilized in Stage 2, can be found in Fig. 3 in the form of red and blue annotations, respectively. A softmax function [31, Sec. 4.1] is typically utilized in the

[Fig. 4 diagram; recovered labels: $a^{[l]}$, $a^{[l+1]}$, $a^{[l+2]}$, Conv, BN, Swish, main path, shortcut connection.]
Fig. 4. Residual blocks. (Top) Identity (ID) block. (Bottom) Convolutional (Conv) block.
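The two block types can be illustrated with a small forward pass that follows Eqs. 1 and 3-7. This is a deliberately simplified sketch: dense weight matrices stand in for the convolution layers, batch normalization is omitted, and the block skips two layers as in Fig. 4 rather than the three layers used in the paper's models. The names `residual_block` and `Ws` (a shortcut projection standing in for the Conv block's shortcut layer) are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(z):
    """Eq. 1: g(z) = z * sigma(z)."""
    return z * sigmoid(z)


def affine(W, a, b):
    """Compute W a + b for a weight matrix W (list of rows) and vectors a, b."""
    return [sum(w * x for w, x in zip(row, a)) + bias for row, bias in zip(W, b)]


def residual_block(a, W1, b1, W2, b2, Ws=None):
    """Two-layer residual block; Ws=None gives an ID block, otherwise a Conv-style block."""
    a1 = [swish(z) for z in affine(W1, a, b1)]                        # Eqs. 3 and 4
    z2 = affine(W2, a1, b2)                                           # Eq. 5
    shortcut = a if Ws is None else affine(Ws, a, [0.0] * len(Ws))    # resize if needed
    if len(shortcut) != len(z2):
        raise ValueError("shortcut and main path must have the same dimension")
    return [swish(z + x) for z, x in zip(z2, shortcut)]               # Eq. 7
```

With all main-path weights and biases set to zero, the block reduces to `swish` applied to the shortcut, which shows how the input can bypass the main path.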
[Fig. 7 panel titles: (a) Silent call attack scenario: deep rudimentary CNN (DRC) model's accuracy distribution. (b) SMS flooding attack scenario: DRC model's accuracy distribution. (c) Signaling attack scenario: DRC model's accuracy distribution. (d) Blended attack scenario: DRC model's accuracy distribution. (e) Blended attack scenario: ResNet-50 model's accuracy distribution. (f) Blended attack scenario: difference between ResNet-50 and DRC models.]
Fig. 7. Accuracy distributions of our models under various attack scenarios. (a)-(d) Results of our DRC model. (e) ResNet-50 model's performance for the blended attack scenario, in which it outperformed our model. (f) Improvements we achieved with the ResNet-50 model under the blended attack scenario.
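Accuracy is reported per cell of the 9 × 9 sub-grid, and the text refers to cell ID 4456 as row 3, column 4. A possible mapping from square ID to sub-grid position is sketched below. It assumes that square IDs increase by 100 from one grid row to the next and that rows and columns are counted from the bottom-left square 4253; both assumptions are inferred from the IDs shown in Fig. 2(ii) and are not stated explicitly in the paper.

```python
GRID_COLUMNS = 100  # assumption: square IDs advance by 100 per grid row
ORIGIN_ID = 4253    # bottom-left square of the 9 x 9 sub-grid in Fig. 2(ii)
SUB_GRID = 9


def sub_grid_position(square_id):
    """Return the 1-based (row, column) of a square ID inside the 9 x 9 sub-grid."""
    row, col = divmod(square_id - ORIGIN_ID, GRID_COLUMNS)
    if not (0 <= row < SUB_GRID and 0 <= col < SUB_GRID):
        raise ValueError("square ID lies outside the chosen sub-grid")
    return row + 1, col + 1
```

Under these assumptions, `sub_grid_position(4456)` returns `(3, 4)`, in agreement with the row and column quoted for the nightlife cell.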
distinguish the hidden pattern and hence detected normal and under-attack cell(s) with high accuracies, in contrast to the relatively low accuracy values for the cell covering nightlife places (ID 4456: row 3, column 4), which has relatively low user activity values during the selected timings.

VI. CONCLUSION AND INSIGHTS FOR FUTURE WORK

Our framework achieved higher than 91% normal and under-attack cell detection accuracy by utilizing the deep rudimentary CNN (DRC) model for silent call, signaling, and SMS flooding attacks that target a cellular network to cause DDoS to the legitimate devices that depend on cellular connectivity, including the ones utilized in the CPSs, as described in Section I. The framework also attained higher than 97% accuracy for a more sophisticated blended attack, in which each puppet device performs all three attacks, by using the ResNet-50 model. Our results suggest that for an individual attack, where its effect is limited to the modification of a single user activity value in the CDRs, our framework employing the DRC model can detect the cell ID(s) under attack more effectively than the ResNet-50 model. For the blended attack, the ResNet-50 model can yield better accuracy due to its very deep neural network design, which can effectively learn the intricate structure in the dataset.

Upon detection, the information can then be sent from our coarse-grained analysis framework to the CPSs to trigger defensive/mitigative measures, and can also be utilized to perform further fine-grained analysis [10, Sec. VI. C.], for example, by acquiring denser and richer data from the under-attack cell, including every user equipment's data, and feeding them to a feed-forward deep neural network. This would heavily aid in identifying the bots/adversary devices within a short time, such as in minutes; it usually takes a month for most organizations to identify and clear the puppet devices [11, Fig. 18]. Our work can naturally fit to support the mobile edge computing (MEC) paradigm [40] in cellular networks having MEC servers geographically located across the network and each server, co-located with a base station, monitoring cellular
The benefit of such a setting resides in dividing computation-intensive tasks across the network (among MEC servers), easing computation and storage for the core network. By leveraging voice CDRs, our work can be extended to detect overcharging attacks that can potentially be engineered to launch DDoS attacks [25].

Our robust framework can perform simultaneous analysis on multiple cells, depending on the size of the sub-grid, due to the inherent utilization of the CNN architecture. It can be scaled up to consider a larger sub-grid; however, the computation requirements need to be investigated, keeping in view the on-line and off-line settings. As we had a limited dataset, we combined 3 hours of data from 62 days and considered it as past data belonging to a 10-min slot (explained thoroughly in Sec. III); in practice, a historical CDR dataset is maintained for record-keeping within the cellular network and may easily be acquired. Such data might also yield improved results, as the model would learn from data containing the same temporal characteristics (one 10-min slot instead of 18 slots).

Since many devices, including the ones utilized in CPSs, depend on cellular infrastructure and its services for connectivity (for example, IoT devices use voice services [25], wireless sensor and actuator network devices rely on Internet services [28], and machine-to-machine (M2M) communication network devices utilize SMS services [15]), our research is compatible, as our framework leverages each service's usage data, and has solid applications in their security and in the earlier detection of DDoS attacks against them.

In conclusion, this is a pioneering study that investigated the application of CNNs for the cellular network's security in a coarse-grained manner to detect various attacks that lead to a DDoS (voice, Internet, and SMS) and achieved more than 91% accuracy, contributing to resolving an open issue of DDoS attack mitigation in cellular networks [9]. Besides the primary subscriber devices, our study has solid implications for securing cellular-dependent CPS devices (utilized in vertical industries and critical infrastructures) against cellular DDoS attacks that could serve as a beachhead or smoke screen to attack the CPS
activity of a sub-grid and running our proposed framework. infrastructure and disrupt its services.
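The dataset pooling mentioned above (3 hours of data from each of 62 days, treated as history for one 10-min slot) can be made concrete with a small sketch. It is only an illustration of the arithmetic, not the authors' preprocessing code: the sub-grid size `ROWS` x `COLS`, the helper names, and the zero-filled placeholder values are all assumptions introduced here.

```python
from typing import Dict, List

Grid = List[List[float]]  # one sub-grid snapshot: user-activity value per cell

SLOT_MINUTES = 10   # CDR aggregation interval (from the text)
WINDOW_HOURS = 3    # daily window that was pooled (from the text)
NUM_DAYS = 62       # days in the dataset (from the text)
ROWS, COLS = 4, 4   # hypothetical sub-grid size, for illustration only


def slots_per_window(window_hours: int = WINDOW_HOURS,
                     slot_minutes: int = SLOT_MINUTES) -> int:
    """Number of aggregation slots inside the daily window."""
    return window_hours * 60 // slot_minutes


def pool_window(daily: Dict[int, List[Grid]]) -> List[Grid]:
    """Flatten {day: [one grid per slot]} into a single list of samples.

    Every snapshot in the window is treated as history for one slot,
    regardless of which slot it actually came from.
    """
    pooled: List[Grid] = []
    for day in sorted(daily):
        pooled.extend(daily[day])
    return pooled


def make_placeholder_day(n_slots: int) -> List[Grid]:
    """Zero-filled stand-in for real CDR activity values (hypothetical)."""
    return [[[0.0] * COLS for _ in range(ROWS)] for _ in range(n_slots)]


n_slots = slots_per_window()
daily = {day: make_placeholder_day(n_slots) for day in range(NUM_DAYS)}
pooled = pool_window(daily)
print(n_slots, len(pooled))  # 18 1116
```

Under these assumptions the pooled set holds 18 x 62 = 1116 sub-grid snapshots. With a longer historical CDR archive, one could instead keep only the snapshots of the same slot across many days, which is the "one 10-min slot instead of 18 slots" alternative noted in the text.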
[19] C. S. Wickramasinghe, D. L. Marino, K. Amarasinghe, and M. Manic, “Generalization of Deep Learning for Cyber-Physical System Security: A Survey,” 44th Annu. Conf. IEEE Ind. Electron. Soc. (IECON), 2018, pp. 745-751.
[20] Department of Defense, USA. (2019, Feb.). Summary of the 2018 DoD artificial intelligence strategy. [Online]. Available: https://media.defense.gov/2019/Feb/12/2002088963/-1/-1/1/SUMMARY-OF-DOD-AI-STRATEGY.PDF
[21] E. Ernst, R. Merola, and D. Samaan, “The economics of artificial intelligence: Implications for the future of work,” International Labour Organization (ILO) Research Paper Series, Oct. 2018.
[22] S. Papadopoulos, A. Drosou, and D. Tzovaras, “A Novel Graph-Based Descriptor for the Detection of Billing-Related Anomalies in Cellular Mobile Networks,” IEEE Trans. Mobile Comput., vol. 15, no. 11, pp. 2655-2668, Nov. 2016.
[23] N. Ruan et al., “A Traffic Based Lightweight Attack Detection Scheme for VoLTE,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), 2016, pp. 1-6.
[24] S. Papadopoulos, A. Drosou, I. Kalamaras, and D. Tzovaras, “Behavioural Network Traffic Analytics for Securing 5G Networks,” in Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), 2018, pp. 1-6.
[25] T. Xie, C. Li, J. Tang, and G. Tu, “How Voice Service Threatens Cellular-Connected IoT Devices in the Operational 4G LTE Networks,” in Proc. IEEE Int. Conf. Commun. (ICC), 2018, pp. 1-6.
[26] R. Bassil, A. Chehab, I. Elhajj, and A. Kayssi, “Signaling oriented denial of service on LTE networks,” in Proc. ACM Int. Symp. Mobility Manage. Wireless Access, 2012, pp. 153-158.
[27] G.-H. Tu, C.-Y. Li, C. Peng, Y. Li, and S. Lu, “New security threats caused by IMS-based SMS service in 4G LTE networks,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 1118-1130.
[28] J.-H. Bang, Y.-J. Cho, and K. Kang, “Anomaly detection of network-initiated LTE signaling traffic in wireless sensor and actuator networks based on a Hidden semi-Markov Model,” Comput. Secur., vol. 65, pp. 108-120, Mar. 2017.
[29] M. S. Parwez, D. Rawat, and M. Garuba, “Big data analytics for user-activity analysis and user-anomaly detection in mobile wireless network,” IEEE Trans. Ind. Informat., vol. 13, no. 4, pp. 2058-2065, Aug. 2017.
[30] G. Barlacchi et al., “A multi-source dataset of urban life in the city of Milan and the Province of Trentino,” Scientific Data, vol. 2, no. 150055, pp. 1-15, 2015.
[31] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, Cambridge, MA, USA: MIT Press, 2016.
[32] B. Hussain, Q. Du, and P. Ren, “Deep Learning-Based Big Data-Assisted Anomaly Detection in Cellular Networks,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), 2018, pp. 1-6.
[33] B. Hussain, Q. Du, and P. Ren, “Semi-Supervised Learning Based Big Data-Driven Anomaly Detection in Mobile Wireless Networks,” China Commun., vol. 15, no. 4, pp. 41-57, Apr. 2018.
[34] P. Ramachandran, B. Zoph, and Q. V. Le, “Swish: A Self-Gated Activation Function,” arXiv:1710.05941v1 [cs.NE], Oct. 2017.
[35] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv:1502.03167 [cs.LG], Mar. 2015.
[36] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” arXiv:1512.03385 [cs.CV], Dec. 2015.
[37] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” Proc. Int. Conf. Learning Representations (ICLR), 2015.
[38] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Advances Neural Inform. Process. Syst. (NIPS), 2012, pp. 1097-1105.
[39] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[40] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A Survey on Mobile Edge Computing: The Communication Perspective,” IEEE Commun. Surveys Tut., vol. 19, no. 4, pp. 2322-2358, 4th Quart., 2017.
[41] [Online] https://www.citypopulation.de/php/italy-lombardia.php?cityid=015146
[42] [Online] https://data.worldbank.org/indicator/IT.CEL.SETS.P2?view=map&year=2014
[43] X. An and G. Kunzmann, “Understanding mobile Internet usage behavior,” IFIP Networking Conf., 2014, pp. 1-9.

Bilal Hussain [S’10] received the B.E. degree (First-class honours) in electrical engineering from Bahria University, Pakistan in 2010 and the M.Sc. degree in information and communications engineering from the University of Leicester, U.K. in 2011. He is currently pursuing his Ph.D. degree in information and communications engineering at Xi’an Jiaotong University, China.
His broader research interests include applications of artificial intelligence and big data analytics in wireless communication systems (6G/5G mobile networks), mobile edge and fog computing, and cyber-physical systems security.

Qinghe Du [S’09-M’12] received his B.S. and M.S. degrees both from Xi’an Jiaotong University, China, and his Ph.D. degree from Texas A&M University, USA. He is currently a Professor of School of Information and Communications Engineering, Xi’an Jiaotong University, China. His research interests include mobile wireless communications and networking with emphasis on security assurance in wireless transmissions, AI-empowered networking technologies, 5G networks and its evolution, cognitive radio networks, Industrial Internet, Blockchain and its applications, Internet of Things, etc. He has published over 100 technical papers. He received the Best Paper Award in IEEE GLOBECOM 2007 and IEEE COMCOMAP 2019, and received the Best Paper Award of China Communications in 2017. He serves and has served as an Associate Editor of IEEE COMMUNICATIONS LETTERS and an Editor of KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS. He serves and has served as a Technical Program Co-Chair for IEEE ICCC Workshop on Internet of Things (IoT) 2013-2017, a Track Co-Chair for IIKI 2015-2019, and a Publicity Co-Chair for IEEE ICC 2015 Workshop on IoT/CPS-Security, IEEE GLOBECOM 2011, ICST WICON 2011, and ICST QShine 2010. He also serves and has served as a Technical Program Committee Member for many world-renowned conferences including IEEE INFOCOM, GLOBECOM, ICC, PIMRC, VTC, etc.

Bo Sun [M’17] received his B.S. and M.S. degrees both from Xi’an Jiaotong University, China. He is currently a senior specialist in wireless communications technology in ZTE Corporation. His research interests include mobile wireless communications and networking transmissions, short-range wireless communication technologies, WLAN, edge computing and AI-empowered wireless communications, 5G and future networking technologies, etc. He is very active in IEEE wireless standard development. Currently, he is the chair of IEEE 802.11 TGbd and the PHY ad hoc co-chair of IEEE 802.11 TGax.

Zhiqiang Han received his B.S. and M.S. degrees from Sichuan University, China. He is currently a technical pre-research expert of ZTE Corporation. His research interests include Wireless Local Area Network, Edge computing, Internet of Things, 5G networks, etc.