Deep Learning for Smart Contract Safety

Research on deep learning-based smart contracts in the Internet of Things


Computers and Electrical Engineering 97 (2022) 107583

Contents lists available at ScienceDirect

Computers and Electrical Engineering


journal homepage: www.elsevier.com/locate/compeleceng

Deep learning-based malicious smart contract detection scheme for internet of things environment✩
Rajesh Gupta, Mohil Maheshkumar Patel, Arpit Shukla, Sudeep Tanwar ∗
Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad, India

ARTICLE INFO

Keywords: Deep learning, Smart contract, Blockchain, Internet of things, LSTM, GRU, ANN, Malicious

ABSTRACT

Smart contracts are essential in maintaining trust between the members of a blockchain. Their verification is of utmost importance, as a smart contract is unmodifiable once deployed. Moreover, a malicious user can deploy vulnerable smart contracts to breach the blockchain data. To restrict this, we propose a deep learning-based scheme to detect vulnerabilities and rate smart contracts as safe or vulnerable based on a probability value of < 0.5 or ≥ 0.5, respectively. An open Google BigQuery dataset with 7000 samples was used to train the classifier. We train artificial neural network (ANN), long short-term memory (LSTM), and gated recurrent unit (GRU) models and compare their accuracy, precision, recall, and receiver operating characteristic (ROC) curve values. Results show that the LSTM model outperforms ANN and GRU. Then, we simulate the LSTM to classify smart contracts before their deployment in the blockchain. Also, the efficacy of the blockchain is justified with the proposed system's data storage cost and scalability.

1. Introduction

Contracts play a crucial role in business by binding stakeholders to a set of rules [1]. A contract is an agreement between the stakeholders of a system, in written or oral form, that is intended to be enforceable by law. Traditional contracts were mostly written on paper and signed by all the agreeing entities [2]. Preparing such a document takes from hours to several days and requires the physical presence of all the entities for their signatures [3]. Moreover, payment for execution and remittance is made offline, which lacks automation and does not involve all the entities.
Hence, the aforementioned flaws in traditional systems can be mitigated with digital or smart contracts. Smart contracts lie at the core of blockchain technology and contain business logic that automates transactions between the network participants [5]. Advancements in blockchain technology have improved data privacy and transparency in various real-time applications of the digital world. Smart contracts developed over the blockchain are immutable, autonomous, and self-enforcing, which enhances the system flow embedded over the blockchain [6]. Once the parties have signed a contract, it cannot be voided. Similarly, once a smart contract is deployed, it cannot be altered, and its constraints stay immutable within the decentralized network.
Fig. 1 shows the traditional process for a smart contract on the Ethereum blockchain, where contracts are written in the Solidity language [7]. Smart contracts are compiled using the Ethereum virtual machine (EVM). After successful compilation, the EVM generates the bytecode, which is

✩ This paper is for special section VSI-spiot. Reviews were processed by Guest Editor Dr. Bhaskar Mondal and recommended for publication.
∗ Corresponding author.
E-mail addresses: 18ftvphde31@nirmauni.ac.in (R. Gupta), 17bce062@nirmauni.ac.in (M.M. Patel), 18bce370@nirmauni.ac.in (A. Shukla),
sudeep.tanwar@nirmauni.ac.in, sudeep149@rediffmail.com (S. Tanwar).

https://doi.org/10.1016/j.compeleceng.2021.107583
Received 20 April 2021; Received in revised form 2 August 2021; Accepted 26 October 2021
Available online 20 November 2021
0045-7906/© 2021 Elsevier Ltd. All rights reserved.
R. Gupta et al. Computers and Electrical Engineering 97 (2022) 107583

Fig. 1. Traditional flow of smart contract execution over Ethereum [4].

further distributed and deployed over the blockchain; after that, it cannot be altered and is ready to be used. Here, no vulnerability analysis is carried out during the process [8]. Despite the aforementioned benefits of smart contracts, several vulnerabilities can be extremely harmful if not checked before deployment.
There are mainly two types of vulnerabilities, namely logical and structural. Logical vulnerabilities occur due to errors in the business logic implemented during the development phase of smart contracts. Transaction ordering, re-entrancy bugs [9], zero-division risk, timestamp expansion, and integer overflow and underflow are some of the logical vulnerabilities found in smart contracts [10]. Structural vulnerabilities arise not from logical issues but from the technological structure of smart contracts. Lost or transferred ether, tx.origin, randomness generation, and immutable bugs are the major structural vulnerabilities of smart contracts.
It is necessary to handle these vulnerabilities before smart contracts are deployed in the distributed network because, once deployed, they are immutable and cannot be altered [11]. Hence, vulnerability analysis can solve this issue of traditional frameworks. Table 1 shows a relative comparison of the proposed scheme with the traditional schemes. Several tools exist for static security analysis of smart contracts, such as the Truffle framework, MythX, and Oyente [12,13]. However, these tools are static, capture only a limited set of vulnerabilities, and must be kept updated manually by the developers [14]. To overcome this problem, artificial intelligence can be embedded into the system: models are trained and then used for prediction. Deep learning models can make systems more intelligent, so that they act as humans do, or sometimes even better than humans [15]. Hence, smart vulnerability prediction by specially trained models can protect the system from vulnerabilities.
So, in the proposed scheme, we classify whether a developed smart contract is safe from vulnerabilities or not. For that, we train three different models, namely LSTM, ANN, and GRU, compare their prediction results, and choose the best-performing model for the proposed system.

1.1. Motivations

The motivations of this study are as follows.

1. Smart contracts are of utmost importance in a blockchain network: they eliminate the role of third-party systems in maintaining trust between the participating members and in settling financial disputes. The security of smart contracts is therefore crucial for their real-time deployment.
2. The existing literature mainly focuses on static smart contract analysis, bug detection, or intrusion protection, with less emphasis on deep learning-based classification of (already affected) smart contracts as malicious or safe. The existing systems also do not discuss penalties or rewards for the users/systems who deploy smart contracts.
3. Thus, there is a need to design a system that classifies the deployed smart contract as malicious or safe and accordingly gives rewards/penalties to the users based on that classification.


Table 1
Relative comparison of the proposed scheme with the state-of-the-art schemes.

| Authors | Year | Objective | Approach | Dataset | Cons |
|---|---|---|---|---|---|
| Zhou et al. [16] | 2018 | Presented a system to find the security threats in the smart contract | Topology diagram (signifies relationships) | Self-made dataset from etherscan.io with 4744 smart contracts | Only for static analysis and not able to predict the unknown vulnerabilities |
| Tikhomirov et al. [17] | 2018 | Presented a smart contract analysis tool (static) for Ethereum platform called SmartCheck | XML-based representation to analyze XPath patterns | Self-made dataset from etherscan.io with 4600 smart contracts | Only for static analysis and not able to predict the unknown vulnerabilities |
| Wang et al. [18] | 2019 | Presented a technique to identify the vulnerable smart contracts based on irregular transactions due to security breach | Transaction irregularities | NA | Considered only the transaction irregularities |
| Kongmanee et al. [14] | 2019 | Presented a system to detect smart contract security vulnerabilities by exploring all possible execution sequences | Analyze all execution patterns | NA | Manual analysis of execution patterns is quite hard |
| Tann et al. [19] | 2019 | Presented an LSTM-based malicious smart contract detection and identification of new attacks | LSTM classification model | Google BigQuery | Not compared with any other techniques |
| The proposed scheme | 2020 | Presented a classification model for malicious smart contract detection and identification of new attacks | ANN, LSTM, GRU | Google BigQuery | NA |

1.2. Research contributions

The research contributions of this paper are as follows.

• We present a comparative analysis of the existing approaches to vulnerability assessment of smart contracts and propose a deep learning-based malicious smart contract classification scheme.
• We design a scheme for rewarding or penalizing the users based on the classification of a smart contract as safe or malicious.
• We compare the accuracy, precision, and recall of the proposed scheme by considering three classification algorithms: LSTM, ANN, and GRU.

1.3. Paper structure

The rest of the paper is organized as follows. Section 2 presents the system model and problem formulation related to malicious smart contract detection and classification. In Section 3, we discuss the proposed malicious smart contract detection scheme using a deep learning algorithm. Section 4 performs the model simulations to check whether a given smart contract is safe or vulnerable. In Section 5, we evaluate the performance of the proposed scheme by considering three different classification algorithms, i.e., ANN, LSTM, and GRU, and finally, Section 6 concludes the paper. Table 2 shows the list of abbreviations and symbols used in the paper.

2. System model and problem formulation

This section describes the system model and the problem formulation.

2.1. System model

Fig. 2 shows the system model used to decide whether a smart contract is safe to deploy in the blockchain network. It consists of user entities $\{u_1, u_2, \ldots, u_n\}$ (the group of users), who are responsible for smart contract creation and deployment (with their unique IDs). A user $u_i$ can deploy multiple smart contracts $\{s_{i1}, s_{i2}, s_{i3}, \ldots, s_{im}\}$ for the settlement and resolution of different tasks. Users can choose the language for writing the smart contract based on their specialization, such as Solidity, Java, or Kotlin. Once $u_i$ submits a smart contract for deployment, it is passed through an AI model such as LSTM, ANN, or GRU for security verification. The AI model calculates the probability (P) of the smart contract being vulnerable, with the threshold value set to 0.5 or 50%. If the probability value is < 0.5, the smart contract is considered safe; otherwise, it is considered vulnerable. If the smart contract is safe, it is deployed into the blockchain network; otherwise, its deployment is rejected.


Table 2
Nomenclature.
 Bytecode
 Opcode
 Smart contract
 Group of users
𝑈𝑠𝑐 .𝑠𝑡𝑎𝑡𝑢𝑠 Status of smart contract
𝑖𝑠𝑡𝑎𝑡𝑢𝑠 Status of user
 Unregistered entities
 Fine generated
𝜎 Sigmoid function
𝑢 Funds to transfer
𝑊𝑖 Wallet credentials of 𝑖th user
ANN Artificial Neural Network
AUC Area Under Curve
b Bias in classification
𝐶𝑜𝑢𝑛𝑡𝑚𝑎𝑙 Count of malicious smart contracts
EVM Ethereum Virtual Machine
FN False Negative
FP False Positive
GRU Gated Recurrent Unit
ID Unique identity of each user
 Feature vector
LSTM Long Short Term Memory
P Probability of being vulnerable
ROC Receiver Operating Characteristic
TN True Negative
TP True Positive
𝑉ℎ𝑜𝑡 One-hot vector
W Classification weights

Fig. 2. The proposed system model.


2.2. Problem formulation

In the proposed scheme, we consider the set of user entities $\{u_1, u_2, \ldots, u_n\}$ who create and deploy smart contracts. Each $u_i$ has a unique $ID_i \in \{ID_1, ID_2, \ldots, ID_n\}$, by which the user who deployed a smart contract can easily be tracked. Every $i$th user ($u_i$) can deploy multiple smart contracts $\{s_{i1}, s_{i2}, \ldots, s_{im}\}$ for various purposes. A counter $X$ is associated with the ID of each user, i.e., $ID_i : X$, which signifies the number of times $u_i$ has deployed a malicious smart contract $s_j$.

𝑛, 𝑖, 𝑗, 𝑚 ≥ 0 (1)
∃𝑆 ∈ {𝑢1 , 𝑢2 , … 𝑢𝑛 } ∈  (2)

Assume that $X$ (used interchangeably with $Count_{mal}$) is initially 0 and that a user $u_i$ can compile a maximum of two malicious smart contracts. If $u_i$ tries to compile a third malicious smart contract, the user is suspended from the blockchain network, s.t.

\[ ID_i : X = \begin{cases} X < 3, & u_i \text{ can deploy more smart contracts} \\ X \geq 3, & u_i \text{ suspended from blockchain} \end{cases} \tag{3} \]
A user $u_i$ creates a smart contract and passes it to the data preparation layer for further processing. The smart contract is compiled to both bytecode and opcode. The opcode is then encoded using one-hot encoding ($V_{hot}$), and the resulting encoded vectors are combined to form a feature vector ($F$) for the given $s_j$, s.t.

\[ \text{bytecode},\ \text{opcode} \neq NULL \tag{4} \]

The dimension of each $V_{hot}$ is $256 \times 1$ because at most 256 distinct opcodes are possible.

\[ \underset{256 \times 1}{V_{hot}} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \ldots \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \ldots \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \]

The one-hot vector $V_{hot}$ contains only binary values, i.e., 0 and 1, and the number of $V_{hot}$ vectors depends upon the length of the opcode sequence, s.t.

\[ V_{hot} \in \{0, 1\}^{t} \tag{5} \]

\[ t \xleftarrow{\ depends\_on\ } Length(\text{opcode}) \tag{6} \]

where $t$ is the number of $V_{hot}$ vectors, which is represented as follows.

\[ V_{hot} = \sum_{k=1}^{t} V_{hot}^{k} \tag{7} \]

where $V_{hot}^{k}$ is the $k$th one-hot vector. Then, a feature vector $F$ of size $256 \times 1$ is created using $V_{hot}$ and passed to the classification model at the prediction layer. The feature vector contains the features of the dataset, which are represented as follows.

\[ \underset{256 \times 1}{F} = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_r \end{bmatrix} \]

where $r$ is the number of features in $F$, which is 256 (the size of $V_{hot}$). The classification model is trained on 7000 randomly selected records from the publicly available Google BigQuery dataset. The model can be summarized by the following equation, which gives the probability ($P$) of the smart contract being vulnerable.

𝑃 = 𝜎(𝑊 𝐹 + 𝑏) (8)

where $W$ denotes the weights trained in the classification model, $\sigma(\cdot)$ is the sigmoid function, and $b$ is the bias. We set the threshold value to 0.5 or 50%: if $P < 0.5$, the smart contract is considered safe, otherwise vulnerable, s.t.

\[ \text{smart contract} = \begin{cases} safe, & P < 0.5 \\ vulnerable, & otherwise \end{cases} \tag{9} \]

The classification model receives the feature vector as input from the data preparation layer and classifies the smart contract as safe or vulnerable. If the smart contract is safe, it is deployed into the blockchain network, where it cannot be modified further. Otherwise, the deployment is rejected and a penalty is fined as follows.

\[ \text{smart contract} = \begin{cases} deploy, & \text{smart contract is safe} \\ reject, & \text{smart contract is vulnerable} \end{cases} \tag{10} \]

Fig. 3. The proposed malicious smart contract detection scheme.
In case of rejection, the penalty is given as follows.

\[ \text{fine}_{u} \leftarrow getFine(.sol,\ amt,\ s_i) \tag{11} \]

\[ u_i \xleftarrow{\ binded\_with\ } \text{fine}_{u} \tag{12} \]

\[ u_i.transfer(\text{fine}_{u}) \tag{13} \]

\[ u_i^{status} \leftarrow invalid \tag{14} \]

\[ ID_i : X \leftarrow ID_i : X + 1 \tag{15} \]

where $\text{fine}_{u}$ is the funds that $u_i$ has to pay as a penalty for deploying a malicious smart contract, after which $count_{mal}$ (or $X$) is incremented. Refer to Eq. (3) for the conditions on further deployment of smart contracts in the blockchain network. If the $X$ count of $u_i$ is $< 3$, the user can continue to deploy smart contracts.

3. The proposed system

Fig. 3 presents the detailed deep learning-based scheme to detect malicious smart contracts in an IoT environment. The key purpose of this scheme is to identify malicious smart contracts, penalize the users who deployed them, and suspend those users from deploying smart contracts for a specified period. The proposed scheme is logically divided into three verticals: (i) deployment layer, (ii) data preparation layer, and (iii) prediction layer. The description of each vertical is as follows.

3.1. Deployment layer

This layer consists of organizations and users (also called developers) whose task is to create and deploy smart contracts into the
Ethereum blockchain network. Each user {𝑢1 , 𝑢2 , … 𝑢𝑛 } ∈  has a unique ID, i.e., {𝐼𝐷1 , 𝐼𝐷2 , … 𝐼𝐷𝑛 }. Each 𝐼𝐷𝑗 is associated with
a count value, representing the number of times a particular user tried to deploy the malicious smart contract. It is represented as
𝐼𝐷𝑗 ∶ 𝐶𝑜𝑢𝑛𝑡𝑚𝑎𝑙 .

∀𝑢𝑖 , ∃𝐼𝐷𝑗 (16)


{𝑛, 𝑖, 𝑗} ≥ 0 𝑎𝑛𝑑 {𝑖, 𝑗} ≤ 𝑛 (17)
𝐶𝑜𝑢𝑛𝑡𝑚𝑎𝑙 ≤ 3 (18)

The output of this layer, i.e., the deployed smart contract, is forwarded to the data preparation layer, where it is compiled to bytecode and a feature vector is generated from it.
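The per-user count described in this layer, combined with the suspension rule of Eq. (3), can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name `PenaltyLedger`, its method names, and the use of plain string IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_MALICIOUS = 3  # Eq. (3): a user with X >= 3 is suspended


@dataclass
class PenaltyLedger:
    """Hypothetical helper that keeps the ID_i : X counter for every user."""
    counts: Dict[str, int] = field(default_factory=dict)

    def record_malicious(self, user_id: str) -> int:
        """Increment X for this user (Eq. (15)) and return the new value."""
        self.counts[user_id] = self.counts.get(user_id, 0) + 1
        return self.counts[user_id]

    def is_suspended(self, user_id: str) -> bool:
        """Apply Eq. (3): the user may keep deploying only while X < 3."""
        return self.counts.get(user_id, 0) >= MAX_MALICIOUS


ledger = PenaltyLedger()
ledger.record_malicious("ID_1")
ledger.record_malicious("ID_1")
still_allowed = not ledger.is_suspended("ID_1")   # two strikes: still allowed
ledger.record_malicious("ID_1")
suspended = ledger.is_suspended("ID_1")           # third strike: suspended
```

Keeping the counter separate from the classifier means the suspension threshold can be changed without retraining or touching the model.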


Fig. 4. Sample bytecode.

Fig. 5. Sample opcode.

3.2. Data preparation layer

Any smart contract that is to be deployed on a blockchain follows a series of steps. First, the smart contract is written in one of the preferred languages, e.g., Solidity, Golang, or Java. It is then compiled into the corresponding bytecode (Fig. 4 shows sample bytecode). The bytecode is then packed into a transaction, signed by the deploying account, sent to the blockchain network, and mined.
In the Ethereum environment, the bytecode runs on the EVM and is converted into opcodes, which are then interpreted by the machine. Each opcode represents a specific operation in the environment and is 1 byte long, so a total of 256 different opcodes are possible. Fig. 5 shows the sample opcode generated by the EVM after converting the smart contract bytecode.
During data preparation, the generated opcode (labelled code) is converted into a one-hot encoding. This process converts categorical (or numerical) values into a binary (0/1) form that can be provided as input to deep learning algorithms to get better prediction results. Some deep learning algorithms cannot process categorical values well, so a single format that all machine and deep learning algorithms can handle is needed, i.e., one-hot encoding. After one-hot encoding, the encoded vectors are converted into a feature vector of length 256, one entry for each of the 256 possible opcodes. Each entry in the vector corresponds to the count of the respective opcode in the smart contract. In this way, the feature vector of each contract is generated. The generated feature vectors are fed into deep learning models for classification in the prediction layer.
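The data preparation steps above (bytecode to opcodes, one-hot encoding, and a 256-entry count vector) can be sketched as follows. This is a minimal sketch under assumptions rather than the pipeline used in the paper: the function names are hypothetical, and the only EVM detail relied on is that the PUSH1 to PUSH32 opcodes (0x60 to 0x7f) are followed by 1 to 32 bytes of inline data that are not opcodes themselves.

```python
from typing import List

NUM_OPCODES = 256            # one byte per opcode, so 256 possible values
PUSH1, PUSH32 = 0x60, 0x7F   # PUSH1..PUSH32 carry 1..32 bytes of inline data


def opcode_bytes(bytecode_hex: str) -> List[int]:
    """Return the opcode bytes of a hex bytecode string, skipping PUSH data."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    raw = bytes.fromhex(bytecode_hex)
    ops = []
    i = 0
    while i < len(raw):
        op = raw[i]
        ops.append(op)
        if PUSH1 <= op <= PUSH32:
            i += op - PUSH1 + 1   # skip the inline operand bytes
        i += 1
    return ops


def one_hot(op: int) -> List[int]:
    """Encode one opcode as a 256 x 1 vector with a single 1."""
    vec = [0] * NUM_OPCODES
    vec[op] = 1
    return vec


def feature_vector(bytecode_hex: str) -> List[int]:
    """Sum the one-hot vectors so that entry k counts occurrences of opcode k."""
    features = [0] * NUM_OPCODES
    for op in opcode_bytes(bytecode_hex):
        for idx, bit in enumerate(one_hot(op)):
            features[idx] += bit
    return features


# 0x60 0x80 0x60 0x40 0x52 disassembles to PUSH1 0x80, PUSH1 0x40, MSTORE
features = feature_vector("0x6080604052")
```

For this input the vector holds a count of 2 at index 0x60 (PUSH1) and 1 at index 0x52 (MSTORE); every other entry is 0.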

3.3. Prediction layer

In this layer, the smart contracts are classified as malicious or safe using machine learning algorithms such as LSTM, ANN, and GRU. The generated feature vectors are fed as input to this layer, which labels the input opcodes as malicious or safe. MAIAN, a smart contract analysis tool, is utilized to generate the labels of smart contracts, i.e., malicious or safe [19]. It employs inter-procedural symbolic analysis and a concrete validator for exhibiting real exploits in contracts. If a smart contract is classified as safe, it is deployed into the blockchain network; otherwise, it is rejected. Algorithm 1 represents the procedure for smart contract classification as malicious or safe. It returns: (i) the smart contracts together with their labels (safe or vulnerable) and (ii) the count value (number of times the same user deployed a malicious smart contract) bound to the user IDs to track their activities. If a user deploys malicious smart contracts into the blockchain network, the user is suspended for a specified period as a penalty (only after deploying three or more malicious smart contracts). Algorithm 2 depicts the procedure for reward or penalty for the users who deployed the smart contracts [20]. Fig. 6 shows the workflow of the proposed scheme for the detection of malicious smart contracts.
The Ethereum blockchain (i.e., a public blockchain) offers security and privacy, but its transaction storage cost is quite high. The cost to store one word (i.e., 256 bits) in Ethereum is derived as follows. In Ethereum, the gas needed for an SSTORE write is 20 K gas, i.e., G = 20 K [21].

\[ 1\ word = 20000\ Gas \tag{19} \]

Since 1 kB holds $2^5 = 32$ words of 256 bits, the gas required to store 1 kB of data is

\[ 1\ kB = 2^5 \times 20000\ Gas \tag{20} \]

The existing price of gas ($G_{pr}$) and Ethereum ($Z_{pr}$) are 185.4 gwei [22] and 1672.52 USD (as on 27-Mar-2021), respectively. So, the cost to store $l$ words in Ethereum is

\[ 1\ Ethereum = 10^9\ gwei \tag{21} \]

\[ Ethereum_{l-word} = (l \times G) / 10^9 \tag{22} \]

So, the cost to store $l$ words in USD is

\[ USD_{l-word} = (G_{pr} \times Ethereum_{l-word}) \times Z_{pr} \tag{23} \]
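As a worked instance of Eqs. (19) to (23), the short sketch below plugs in the figures quoted above (20 000 gas per word, 185.4 gwei per gas, 1672.52 USD per Ether). The function and constant names are illustrative assumptions; the prices are the 27-Mar-2021 values from the text, not current ones.

```python
GAS_PER_WORD = 20_000      # G: gas for one SSTORE write of a 256-bit word
GWEI_PER_ETH = 10**9       # Eq. (21)
GAS_PRICE_GWEI = 185.4     # G_pr, as quoted in the text
ETH_PRICE_USD = 1672.52    # Z_pr, as quoted in the text
WORD_BYTES = 32            # 256 bits


def storage_cost_usd(num_words: int) -> float:
    """Cost in USD of storing num_words words, following Eqs. (22) and (23)."""
    gas = num_words * GAS_PER_WORD
    ether = gas * GAS_PRICE_GWEI / GWEI_PER_ETH
    return ether * ETH_PRICE_USD


cost_one_word = storage_cost_usd(1)
cost_one_kb = storage_cost_usd(1024 // WORD_BYTES)   # 2^5 = 32 words
```

With these inputs, one word costs about 6.20 USD and 1 kB about 198.45 USD, which is the order of magnitude that motivates keeping bulk data off-chain.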

Algorithm 1 Smart Contract Classification

Input: users, smart contracts, model
Initialization:
    users: {u_1, u_2, u_3, …, u_n} (set of users)
    contracts_i: {s_i1, s_i2, s_i3, …, s_im} (set of smart contracts from the i-th user)
    model: model for classification
Output: U, S
    U = map containing users and the count of malicious smart contracts deployed by each user
    S = map containing each smart contract and its label (0 = safe, 1 = vulnerable)

procedure CLASSIFY(users, contracts)
    for each u_i in users do
        count_mal ← 0
        for each s_ij in contracts_i do
            bytecode ← bytecode(s_ij)
            opcode ← opcode(bytecode)
            features ← encode(opcode)
            label ← model.classify(features)
            if label == 1 then
                count_mal ← count_mal + 1
            end if
            S.put(s_ij, label)
        end for
        U.put(u_i, count_mal)
    end for
    return U, S
end procedure
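The control flow of Algorithm 1 translates almost line for line into Python. The sketch below is hedged accordingly: `toy_encode` and `toy_model` are hypothetical stand-ins (a raw byte counter and a single hand-written rule) used only so the example runs, and they are in no way the trained LSTM or the paper's encoder.

```python
from typing import Callable, Dict, List, Tuple


def classify_contracts(
    users: Dict[str, List[str]],
    encode: Callable[[str], List[int]],
    model: Callable[[List[int]], int],
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (U, S): malicious count per user and label per contract."""
    malicious_counts: Dict[str, int] = {}   # U in Algorithm 1
    labels: Dict[str, int] = {}             # S in Algorithm 1
    for user_id, contracts in users.items():
        count_mal = 0
        for bytecode_hex in contracts:
            features = encode(bytecode_hex)
            label = model(features)          # 0 = safe, 1 = vulnerable
            if label == 1:
                count_mal += 1
            labels[bytecode_hex] = label
        malicious_counts[user_id] = count_mal
    return malicious_counts, labels


def toy_encode(bytecode_hex: str) -> List[int]:
    """Hypothetical encoder: count every raw byte (PUSH data is not skipped)."""
    features = [0] * 256
    for byte in bytes.fromhex(bytecode_hex):
        features[byte] += 1
    return features


def toy_model(features: List[int]) -> int:
    """Hypothetical rule, NOT a real detector: flag any use of byte 0xff."""
    return 1 if features[0xFF] > 0 else 0


users = {"u1": ["6080604052"], "u2": ["60806040ff", "00"]}
U, S = classify_contracts(users, toy_encode, toy_model)
```

Passing the encoder and model in as callables keeps the loop independent of the classifier, so an ANN, LSTM, or GRU wrapper could be swapped in without changing it.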

4. Proposed model simulations

This section shows the simulation results of testing smart contracts with the LSTM model (trained on the Google BigQuery dataset). We examined two simulation scenarios. In the first scenario, we test smart contracts that are attack- or bug-free, whereas the second scenario considers vulnerable smart contracts. The proposed model accepts a smart contract as input and outputs Safe or Vulnerable based on the calculated vulnerability probability. Fig. 7 depicts the execution scenario, in which a smart contract is analyzed and converted into bytecode and then opcode. From the opcode, the model calculates the vulnerability probability of the smart contract (with 0.5 or 50% as the threshold value). If the vulnerability probability is <0.5, the smart contract is safe. Fig. 8 shows the simulation result for a malicious smart contract, which is classified as vulnerable because its probability of being vulnerable is 0.7988303.
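The thresholding step used in these simulations follows Eqs. (8) to (10): a sigmoid output is compared with 0.5 and mapped to a deploy or reject decision. The sketch below illustrates only that final step with a single linear layer; the weights, bias, and three-entry feature vector are made-up values for illustration, and a real LSTM has many more parameters than this summary form suggests.

```python
import math
from typing import Sequence

THRESHOLD = 0.5  # P < 0.5 means safe, otherwise vulnerable (Eq. (9))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def vulnerability_probability(
    weights: Sequence[float], features: Sequence[float], bias: float
) -> float:
    """P = sigma(W F + b), the summary form of Eq. (8)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(z)


def decide(probability: float) -> str:
    """Eq. (10): deploy a safe contract, reject a vulnerable one."""
    return "deploy" if probability < THRESHOLD else "reject"


# Hypothetical weights, features, and bias chosen only for illustration
weights = [0.8, -0.5, 0.3]
features = [2, 1, 0]
bias = -0.4
p = vulnerability_probability(weights, features, bias)   # sigma(0.7)
decision = decide(p)
decision_for_fig8 = decide(0.7988303)   # the probability reported for Fig. 8
```

Here the pre-activation is 0.7, so the probability is roughly 0.668 and the contract is rejected, as is the Fig. 8 example with its reported probability of 0.7988303.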

5. Performance evaluation

This section describes the data collection, data preprocessing, and the performance of different models in classifying a smart contract as vulnerable or safe. The models are compared based on their accuracy, precision, recall, and their respective receiver operating characteristic (ROC) curves and area under the curve (AUC). Further, blockchain-related results on data storage cost and overall system scalability are also presented.

5.1. Dataset description

The smart contracts for analysis have been obtained from Google BigQuery [23]. It is a public dataset that is available for smart
contract analysis. The 7000 samples of smart contracts were selected randomly from the dataset for experimentation purposes. The
smart contracts were then processed to generate feature vectors, which will be fed into a model for training and classification.


Fig. 6. The work flow of the proposed malicious smart contract detection scheme.

Fig. 7. LSTM model simulations resulted safe for the given smart contract.


Algorithm 2 Smart Contract for Developer Fine Management

Input: entity, users, smart contracts
Initialisation:
    entity: unregistered entities
    users: {u_1, u_2, u_3, …, u_n} (set of users)
    contracts_i: {s_i1, s_i2, s_i3, …, s_im} (set of smart contracts from the i-th user)
Output: u_i^status = {1: valid, 0: invalid}

procedure ENROLLING_AND_FINE(entity)
    if entity ∈ users then
        bytecode ← compile(.sol)
        result ← CLASSIFY(bytecode)
        if result.status == valid then
            blockchain ← Deploy(.sol)
            u_i^status ← 1: valid
        else
            fine_u ← genFine(.sol, amt, s_i)
            u_i.transfer(fine_u)
            u_i^status ← 0: invalid
        end if
    else
        entity ← Enroll(ID, W_i, T_p)
        credentials ← getCredentials(entity)
    end if
end procedure

Fig. 8. LSTM model simulations resulted vulnerable for the given smart contract.

5.2. Results and discussion

Fig. 10 shows the confusion matrix for classification by ANN, LSTM, and GRU. The confusion matrix shows the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) as identified by the various models. The performance of these models can be compared using accuracy, which can be calculated as follows.

\[ Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{24} \]
Fig. 11a shows the comparison of accuracies obtained by the respective models. ANN shows an accuracy of 0.98661, LSTM shows 0.99083, and GRU shows 0.98872. All models obtain a high accuracy, with LSTM showing the highest accuracy of 99.083%. However, the dataset is imbalanced: one class is the majority while the other is the minority. A model can obtain high accuracy simply by assigning every instance to the majority class. Hence, accuracy alone is not sufficient to compare the performance of the models. To overcome this problem, the models are also compared on precision and recall.
Precision is the fraction of the contracts flagged as vulnerable that are actually vulnerable. It is calculated as shown in Eq. (25), from which it is evident that high precision means a low number of false-positive results. Recall is the fraction of the actually vulnerable contracts that are correctly classified. It is calculated as shown in Eq. (26), which suggests that the lower the number of false negatives, the higher the recall value.

\[ Precision = \frac{TP}{TP + FP} \tag{25} \]

Fig. 9. Comparative analysis of ROC curves vs. AUC.

Fig. 10. Comparative analysis of confusion matrices used for classification.

\[ Recall = \frac{TP}{TP + FN} \tag{26} \]
In this application, identifying a safe smart contract as vulnerable, i.e., a false positive, is less dangerous than identifying a malicious or vulnerable smart contract as safe, i.e., a false negative. Hence, the main focus should be on decreasing the number of false negatives; the higher the recall, the better the model for this application. Fig. 11c shows the comparison of recall for the three models. In our experimentation, the recall of all models turns out to be the same. Fig. 11b shows the precision obtained by the models. LSTM shows the highest precision of 0.91935. Combining all these metrics, LSTM outperforms ANN and GRU in our experiment. This can also be seen in the ROC curves given in Fig. 9. The larger the AUC, the better the classifier. The AUC is highest for LSTM, which also shows that it is slightly better than the others.
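Eqs. (24) to (26) are simple to evaluate from confusion-matrix counts, as the sketch below shows. The counts used here are assumptions: they are not read from Fig. 10 but were chosen so that the resulting metrics agree with the LSTM values quoted in the text (accuracy 0.99083, precision 0.91935, recall 0.87692). The last lines illustrate the imbalance argument with a trivial classifier that labels every contract as safe.

```python
from typing import Tuple


def metrics(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) as defined in Eqs. (24)-(26)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Assumed counts (positive class = vulnerable), not taken from Fig. 10
TP, FP, TN, FN = 57, 5, 1348, 8
accuracy, precision, recall = metrics(TP, FP, TN, FN)

# Trivial baseline: label everything safe, so every vulnerable contract
# becomes a false negative and every safe contract a true negative.
baseline_accuracy = (TN + FP) / (TP + FP + TN + FN)
baseline_recall = 0 / (TP + FN)
```

Under these assumed counts the all-safe baseline still reaches an accuracy above 0.95 while its recall is 0, which is why recall carries more weight than accuracy for this task.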

5.3. Blockchain-based analysis

5.3.1. Data storage cost


Fig. 12a shows the comparative analysis of transaction storage cost in the blockchain with and without the IPFS protocol in the proposed system. From Fig. 12a, we infer that the data/transaction storage cost per word in an InterPlanetary file system (IPFS)-based system is much lower than in non-IPFS-based systems. The reason behind such a large difference in storage costs is derived mathematically in Section 3.


Fig. 11. Comparative analysis of different models based on accuracy, precision and recall.

Fig. 12. Blockchain-based results in terms of data storage cost and system scalability.

5.3.2. Scalability
Fig. 12b depicts the comparison of the scalability of the proposed system with and without the IPFS protocol. The proposed system stores its data in distributed IPFS storage. Then the hash of the stored data is calculated and forwarded to the blockchain. The size of the calculated hash is much smaller than the size of the original transaction. Thus, the proposed system allows more transactions to be included in the blockchain network at the same time. This serves a larger number of smart contracts in the blockchain network, which improves the overall system scalability.
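The size argument can be made concrete with a small sketch. It uses a plain SHA-256 digest from Python's standard `hashlib` as a stand-in for the hash stored on-chain; this is an assumption for illustration, since real IPFS content identifiers wrap the digest in extra metadata and are slightly longer. The helper name `words_needed` and the 1 kB payload are likewise hypothetical.

```python
import hashlib

WORD_BYTES = 32  # one Ethereum storage word is 256 bits


def words_needed(num_bytes: int) -> int:
    """Number of 256-bit words required to hold num_bytes (rounded up)."""
    return -(-num_bytes // WORD_BYTES)


payload = b"x" * 1024                      # 1 kB of data kept off-chain
digest = hashlib.sha256(payload).digest()  # fixed 32-byte digest

words_full_payload = words_needed(len(payload))  # storing the data itself
words_hash_only = words_needed(len(digest))      # storing only the hash
```

The digest length stays fixed no matter how large the payload grows, so the on-chain footprint per transaction is constant: 1 word instead of 32 for this 1 kB example.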

6. Conclusion

This paper has introduced a deep learning-based scheme to assure blockchain data security by deploying only safe or bug-free smart contracts into the public blockchain network. The proposed scheme suggests which smart contracts are safe and can be deployed into the blockchain network, with a vulnerability detection accuracy of 99.083%, a precision of 91.935%, and a recall of 87.692%, and it also penalizes the users who try to deploy malicious smart contracts into the blockchain network. We also simulated the LSTM model for smart contract classification and tested it on a sample smart contract (.sol), which it classified as safe with a probability of 0.00149%.
In the future, we will test the performance of the proposed scheme on a wider variety of models and fine-tune it accordingly.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared
to influence the work reported in this paper.


Acknowledgment

This work is supported by Visvesvaraya Ph.D. Scheme for Electronics and IT by Department of Electronics and Information
Technology (DeiTY), Ministry of Communications and Information Technology, Government of India <MEITY-PHD-2828>.

References

[1] Contracts and agreements. 2020, https://www.smallbusiness.wa.gov.au/business-advice/legal-essentials/contracts-and-agreements. Accessed: 2020.
[2] Mehta D, Tanwar S, Bodkhe U, Shukla A, Kumar N. Blockchain-based royalty contract transactions scheme for Industry 4.0 supply-chain management. Inf
Process Manage 2021;58(4):1–17.
[3] Misra S, Saha N. Detour: Dynamic task offloading in software-defined fog for IoT applications. IEEE J Sel Areas Commun 2019;37(5):1159–66.
[4] Sayeed S, Marco-Gisbert H, Caira T. Smart contract: Attacks and protections. IEEE Access 2020;8:24416–27.
[5] Azzaoui AE, Singh SK, Pan Y, Park JH. Block5gintell: Blockchain for AI-enabled 5G networks. IEEE Access 2020;8:145918–35.
[6] Nikolić I, Kolluri A, Sergey I, Saxena P, Hobor A. Finding the greedy, prodigal, and suicidal contracts at scale. In: Proceedings of the 34th annual computer
security applications conference. ACSAC ’18, New York, NY, USA: Association for Computing Machinery; 2018, p. 653–63.
[7] Aujla GS, Jindal A. A decoupled blockchain approach for edge-envisioned IoT-based healthcare monitoring. IEEE J Sel Areas Commun 2021;39(2):491–9.
[8] Anand P, Singh Y, Selwal A, Singh PK, Felseghi RA, Raboaca MS. IoVT: Internet of vulnerable things? Threat architecture, attack surfaces, and vulnerabilities
in internet of things and its applications towards smart grids. Energies 2020;13(18):1–23.
[9] Liu C, Liu H, Cao Z, Chen Z, Chen B, Roscoe B. ReGuard: Finding reentrancy bugs in smart contracts. In: 2018 IEEE/ACM 40th international conference
on software engineering: Companion (ICSE-Companion). 2018, p. 65–8.
[10] Gupta R, Tanwar S, Al-Turjman F, Italiya P, Nauman A, Kim SW. Smart contract privacy protection using AI in cyber-physical systems: Tools, techniques
and challenges. IEEE Access 2020;8:24746–72.
[11] Jindal A, Aujla GS, Kumar N, Villari M. GUARDIAN: Blockchain-based secure demand response management in smart grid system. IEEE Trans Serv Comput
2020;13(4):613–24.
[12] Liu J, Liu Z. A survey on security verification of blockchain smart contracts. IEEE Access 2019;7:77894–904.
[13] di Angelo M, Salzer G. A survey of tools for analyzing ethereum smart contracts. In: 2019 IEEE international conference on decentralized applications and
infrastructures (DAPPCON). 2019, p. 69–78.
[14] Kongmanee J, Kijsanayothin P, Hewett R. Securing smart contracts in blockchain. In: 2019 34th IEEE/ACM international conference on automated software
engineering workshop (ASEW). 2019, p. 69–76.
[15] Shukla A, Bhattacharya P, Tanwar S, Kumar N, Guizani M. DwaRa: A deep learning-based dynamic toll pricing scheme for intelligent transportation
systems. IEEE Trans Veh Technol 2020;69(11):12510–20.
[16] Zhou E, Hua S, Pi B, Sun J, Nomura Y, Yamashita K, Kurihara H. Security assurance for smart contract. In: 2018 9th IFIP international conference on
new technologies, mobility and security (NTMS). 2018, p. 1–5.
[17] Tikhomirov S, Voskresenskaya E, Ivanitskiy I, Takhaviev R, Marchenko E, Alexandrov Y. SmartCheck: Static analysis of ethereum smart contracts. In: 2018
IEEE/ACM 1st international workshop on emerging trends in software engineering for blockchain (WETSEB). 2018, p. 9–16.
[18] Wang H, Li Y, Lin S, Ma L, Liu Y. VULTRON: Catching vulnerable smart contracts once and for all. In: 2019 IEEE/ACM 41st international conference on
software engineering: New ideas and emerging results (ICSE-NIER). 2019, p. 1–4.
[19] Tann WJ, Han XJ, Gupta SS, Ong Y. Towards safer smart contracts: A sequence learning approach to detecting vulnerabilities. 2018, CoRR, vol.
abs/1811.06632.
[20] Zheng Q, Li Y, Chen P, Dong X. An innovative IPFS-based storage model for blockchain. In: 2018 IEEE/WIC/ACM international conference on web
intelligence (WI). 2018, p. 704–8.
[21] Wood G. Ethereum: A secure decentralised generalised transaction ledger. In: Ethereum project yellow paper, Vol. 151. 2014, p. 1–32.
[22] Ethereum gas tracker. https://etherscan.io/gastracker. Accessed: 2021.
[23] Ethereum in BigQuery: a Public Dataset for smart contract analytics. 2020, https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-public-dataset-smart-contract-analytics. Accessed: 2020.

Rajesh Gupta is a full-time research scholar in the Computer Science and Engineering Department at Nirma University, India, under the supervision of Dr.
Sudeep Tanwar. He has authored/co-authored publications in SCI-indexed journals and IEEE ComSoc-sponsored international conferences. His research
interests include blockchain and D2D communication.

Mohil Maheshkumar Patel is currently pursuing a bachelor’s degree at Nirma University, Ahmedabad, Gujarat, India. His research interests are Machine Learning,
Deep Learning, and Natural Language Processing.

Arpit Shukla is currently pursuing a bachelor’s degree at Nirma University, Ahmedabad, Gujarat, India. His research interests are Machine Learning, Blockchain
Technology, and Network Security.

Sudeep Tanwar (M’15, SM’21) is a full Professor at Nirma University, India, and was a visiting professor at Jan Wyzykowski University in Polkowice,
Poland and the University of Pitesti in Pitesti, Romania. He received his Ph.D. in computer science and engineering from Mewar University, India. His research
interests include WSN, blockchain technology, fog computing, and smart grid. He has authored/co-authored more than 250 research papers in leading journals
and conferences and has edited/authored more than 24 books published by leading publishing houses such as IET, Springer, Wiley, and Taylor and Francis.
He also serves on the editorial boards of COMCOM, IJCS, and SPY, and leads the ST Research Laboratory, where group members work on
the latest cutting-edge technologies.
