Bindeep
ARTICLE INFO

Keywords: Binary code; Deep learning; Similarity comparison; Siamese neural network; LSTM; CNN

ABSTRACT

Binary code similarity detection (BCSD) plays an important role in malware analysis and vulnerability discovery. Existing methods mainly rely on expert knowledge for BCSD, which may not be reliable in some cases. More importantly, the detection accuracy (or performance) of these methods is not satisfactory. To address these issues, we propose BinDeep, a deep learning approach for binary code similarity detection. This method first extracts the instruction sequence from the binary function and then uses an instruction embedding model to vectorize the instruction features. Next, BinDeep applies a Recurrent Neural Network (RNN) deep learning model to identify the specific types of the two functions for later comparison. According to the type information, BinDeep selects the corresponding deep learning model for similarity comparison. Specifically, BinDeep uses Siamese neural networks, which combine LSTM and CNN, to measure the similarity of two target functions. Different from the traditional deep learning model, our hybrid model takes advantage of both the CNN spatial structure learning and the LSTM sequence learning. The evaluation shows that our approach can achieve good BCSD results on cross-architecture, cross-compiler, cross-optimization, and cross-version binary code.
* Corresponding author.
E-mail addresses: [email protected] (X. Jia), [email protected] (R. Ma), [email protected] (C. Hu).
https://doi.org/10.1016/j.eswa.2020.114348
Received 6 December 2019; Received in revised form 30 June 2020; Accepted 17 November 2020
Available online 3 December 2020
0957-4174/© 2020 Elsevier Ltd. All rights reserved.
D. Tian et al. Expert Systems With Applications 168 (2021) 114348
extract the instruction sequence as the features. Next, we utilize the classical natural language processing model to convert the instruction sequences into vectors. Considering the different comparison scenarios, we use different Siamese neural networks for similarity measurement. For this purpose, we apply a deep learning model to identify the specific types of the two functions to be compared. Unlike the conventional Siamese network structure, we make use of a hybrid network structure, which combines CNN and LSTM networks to measure the binary function similarity. Since CNN can extract local spatial features while LSTM is capable of extracting sequential features automatically, the hybrid neural network model could improve the similarity detection.

We have implemented our approach based on IDA Pro (Hex-Rays, 2018) and Keras (Keras Team, 2019). To evaluate the effectiveness of our method, we prepare a custom dataset, which consists of more than 4.7 million function pairs. The experiments show that our solution can identify similar and dissimilar function pairs effectively on cross-architecture, cross-compiler, cross-optimization, and cross-version binaries.

In summary, we make the following contributions:

• We propose a novel deep learning based solution for binary code similarity detection. This model utilizes the hybrid Siamese neural network to measure the binary code similarity.
• We use the instruction embedding model to vectorize the extracted instructions. To identify the types of functions to be compared, we apply a deep learning classification model.
• We conduct extensive experiments to evaluate our approach. The experimental results show that BinDeep can achieve an average precision, recall, and F1 Score of 97.07%, 98.88%, and 97.97% respectively for BCSD.

2. Approach

2.1. Problem statement

The key task of our study is to judge whether two functions $I_1$, $I_2$ from different binary code are similar or not. If the two functions are compiled from the same original source code, they are similar. Otherwise, they are dissimilar. It is worth noting that the target binaries may be compiled using different compilers with different optimization levels, and they may also come from different CPU architectures and different program versions. Due to the complications arising from these compilation differences, it is a non-trivial task to achieve accurate similarity comparison for two binary functions.

2.2. The Framework of BinDeep

As shown in Fig. 1, the framework of BinDeep can be divided into three stages. In the first stage, we exploit IDA Pro to analyze the binary code statically. After disassembling the instructions, we get an instruction sequence for each function in the binary code. To make the sequences usable by the deep learning models, we leverage an NLP model to perform the instruction embedding. By doing so, the instruction sequences are converted into vectors.

In the second stage, we make use of a deep learning model to identify the CPU architectures and optimization levels of the target binary functions. According to the identified function types, we select a proper model for the binary code similarity detection. The main advantage of this stage is that the similarity detection becomes more targeted. Thanks to this stage, different comparison scenarios will result in adopting different comparison models in the next stage.

In the third stage, we utilize the Siamese neural networks to detect the binary code similarity. There are three Siamese network models, and each one corresponds to a different comparison scenario. For simplicity, all three Siamese neural networks have the same structure, but their network parameters are not identical. Different from the traditional Siamese structure, we combine the CNN and LSTM models for the neural network construction. After all these networks are well trained, they can convert similar (or dissimilar) binary functions into similar (or dissimilar) vectors. By computing the distance between two binary functions, we can get their similarity value. If the value is smaller than the predefined threshold, we consider the two binary functions similar. Otherwise, they are dissimilar.

2.3. Feature extraction and processing

Most previous approaches rely on prior knowledge for feature extraction. In our solution, we just utilize the instruction sequence as the features. For this purpose, we exploit IDA Pro to disassemble the binary code and then get an instruction sequence for each function. For simplicity, the internal control flow of a function is not considered.

Similar to the recent studies (Massarelli, Di Luna, Petroni, Baldoni, & Querzoni, 2019; Zuo et al., 2019), we make use of an NLP (Natural Language Processing) model to build our instruction embedding. In general, an instruction can be divided into two parts: one opcode and one (or more) operand(s). The number of opcode types is limited, while the representation of the operands changes a lot across different computing scenarios. To address this problem, a straightforward method is to use only instruction opcodes as tokens for instruction embedding, omitting instruction operands. However, doing so will result in information loss. In fact, instruction operands contain important semantic information for similarity comparison.

To keep the operand information, normalization is needed. Different from the recent methods (Massarelli et al., 2019; Zuo et al., 2019), we propose a simple but effective way to normalize the instruction operands. Specifically, we classify the common operands into 8 different categories: General Register, Direct Memory Reference, Memory Ref [Base Reg + Index Reg], Memory Ref [Base Reg + Index Reg + Displacement], Immediate Value, Immediate Far Address, Immediate Near Address and Other Type. Table 1 shows the examples of instruction normalization for the x86 architecture.

After the raw instructions are normalized, the next step is to perform the instruction embedding. There are two common methods for the embedding: one-hot encoding (Wikipedia, 2018) and word2vec (Gensim, 2018). One-hot encoding is a very simple way to represent an instruction, but it cannot capture the relevance of two similar instructions. In contrast, the word2vec method is capable of converting similar instructions to similar vectors. For example, by using word2vec, the add and sub instructions will be converted into similar vectors. The word2vec method contains two different models: CBOW (Continuous Bag of Words) and skip-gram. Compared with the CBOW model, the skip-gram model can achieve better performance on a large dataset. Therefore, we make use of the skip-gram model to build our instruction embedding.

The basic idea of the skip-gram model is to utilize the context information (i.e., a sliding window) to learn word embeddings on a text stream. For each word, the model will initially set a one-hot encoding vector, and then it gets trained when going over each sliding window. The key point of the model is to figure out the probability $P$ of an arbitrary word $w_k$¹ in a sliding window $C_t$² given the embedding $\vec{w}_t$³ of the current word $w_t$. For this purpose, the softmax function is used as

¹ A sentence $W$ consists of several words $w_k$, and it can be represented as $W = (w_1, \ldots, w_n)$, $w_k \in \mathbb{R}$, $1 \leq k \leq n$, where $n$ refers to the number of words in a sentence.
² A sliding window $C$ is a series of small parts of a sentence $W$, and it can be represented as $C = (c_1, \ldots, c_m)$, $c_i = (w_j, \ldots, w_{j+T-1})$, $1 \leq i \leq m$, $1 \leq j \leq n - T + 1$, $w_j \in \mathbb{R}$, where $m$ refers to the number of fixed windows, $n$ refers to the number of words in a sentence $W$, and $T$ refers to the size of a fixed window. In practice, for each word $w_j$, we use a floating-point number for representation, and each one has the same accuracy.
³ $\vec{w}_t \in \mathbb{R}^L$, where $L$ refers to the vector dimension.
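As an illustration of the operand normalization and of the sliding windows defined in footnote 2, the following minimal Python sketch maps each instruction to one vocabulary token and then enumerates the windows over the token stream. It is a sketch under stated assumptions, not the implementation used in this work: the token spellings (REG, IMM, and so on), the helper names, and the use of IDA Pro's operand-type numbering as category keys are all hypothetical, since Table 1 is not reproduced here.

```python
from typing import List, Sequence, Tuple

# Hypothetical placeholder tokens, one per operand category of Section 2.3.
# Keys follow IDA Pro's operand-type numbering (o_reg = 1 ... o_near = 7);
# tying the categories to that numbering is an assumption of this sketch.
OPERAND_TOKENS = {
    1: "REG",     # General Register
    2: "MEM",     # Direct Memory Reference
    3: "PHRASE",  # Memory Ref [Base Reg + Index Reg]
    4: "DISPL",   # Memory Ref [Base Reg + Index Reg + Displacement]
    5: "IMM",     # Immediate Value
    6: "FAR",     # Immediate Far Address
    7: "NEAR",    # Immediate Near Address
}
OTHER_TOKEN = "OTHER"  # Other Type


def normalize_instruction(mnemonic: str, operand_types: Sequence[int]) -> str:
    """Turn one instruction into a single token, e.g. 'mov_REG_IMM'."""
    parts = [mnemonic.lower()]
    parts += [OPERAND_TOKENS.get(t, OTHER_TOKEN) for t in operand_types]
    return "_".join(parts)


def sliding_windows(tokens: Sequence[str], size: int) -> List[Tuple[str, ...]]:
    """Return every window (w_j, ..., w_{j+T-1}) for 1 <= j <= n - T + 1."""
    if size <= 0:
        raise ValueError("window size must be positive")
    return [tuple(tokens[j:j + size]) for j in range(len(tokens) - size + 1)]


# A made-up four-instruction function: (mnemonic, operand type codes).
function = [("push", [1]), ("mov", [1, 1]), ("sub", [1, 5]), ("call", [7])]
tokens = [normalize_instruction(m, ops) for m, ops in function]
windows = sliding_windows(tokens, size=3)
```

With a window size of T = 3 and n = 4 tokens, the sketch yields n - T + 1 = 2 windows. The resulting token sequences could then be passed to any skip-gram trainer.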
Fig. 1. The Similarity Detection Framework of BinDeep. The input to the framework is two binary functions. The output is the similarity value of these two functions.
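To illustrate the final step of the framework in Fig. 1, the following minimal Python sketch computes the distance between two function embeddings and applies a threshold, as described in Section 2.2. The Manhattan distance is the distance used in the loss function of the Siamese network. The helper names, the example vectors, and the threshold value of 0.5 are hypothetical placeholders, because the text does not state the threshold.

```python
from typing import Sequence


def manhattan_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between two embeddings."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return sum(abs(a - b) for a, b in zip(u, v))


def is_similar(u: Sequence[float], v: Sequence[float],
               threshold: float = 0.5) -> bool:
    """Two functions count as similar when their distance is below the
    threshold. The default of 0.5 is an assumed value for illustration."""
    return manhattan_distance(u, v) < threshold


# Made-up 3-dimensional embeddings (the real ones have dimension 300).
emb_a = [0.10, 0.20, 0.30]
emb_b = [0.10, 0.25, 0.20]
emb_c = [0.90, 0.80, 0.10]

close_pair = is_similar(emb_a, emb_b)  # distance is about 0.15
far_pair = is_similar(emb_a, emb_c)    # distance is about 1.6
```

In practice the threshold would be chosen on a validation set rather than fixed in advance.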
tion, $W_2$ and $b_2$ are the weight and bias parameters of the convolutional filters. $A = (a_1, \ldots, a_m)$, $a_i \in \mathbb{R}^L$, $1 \leq i \leq m$, $L = 996$, $m = 300$. The fourth layer is a max pooling layer. It is used to simplify the extracted features. The size of the max pooling window is set to 2, and the stride is also set to 2. For simplicity, the max pooling operation can be represented as the expression $\tilde{A} = \mathrm{Max}(A)$, where $\tilde{A} = (\tilde{a}_1, \ldots, \tilde{a}_m)$, $\tilde{a}_i \in \mathbb{R}^L$, $1 \leq i \leq m$, $L = 498$, $m = 300$. The final layer is a dense layer, which helps connect the local features. Its output can be represented as follows:

$$O = \sigma\left(W_3 \cdot \tilde{A} + b_3\right)$$

where $\cdot$ represents the dot product operation, and $W_3$ and $b_3$ are the weight and bias parameters of this layer. $O = (o_1, \ldots, o_m)$, $o_i \in \mathbb{R}$, $1 \leq i \leq m$, $m = 300$.

The inputs to the Siamese network are two binary functions, namely $I_1$ and $I_2$. These two functions may be compiled for different CPU architectures, by different compilers, at different optimization levels, and from different program versions. The input length is set to 1000. If a function contains fewer than 1000 instructions, we use nop instructions as padding. On the other hand, if a function contains more than 1000 instructions, we truncate the tail instructions. The outputs of the embedding layers of the Siamese network are two embedding vectors, namely $f(I_1, \theta)$ and $f(I_2, \theta)$, where $f$⁵ represents the hybrid network structure, and $\theta$ represents the parameters of this network structure. We assume the embedding dimension is $m$. Additionally, there is an indicator input $y$ to the Siamese network, indicating whether the two inputs are similar or not. Precisely, if $y$ is equal to 1, the two binary functions are similar; if $y$ is 0, the two binary functions are dissimilar.

To define the loss function of the network, we leverage the contrastive loss function (Hadsell, Chopra, & LeCun, 2006). The basic idea of this loss function is to maximize the distance between two dissimilar inputs, but to minimize the distance between two similar inputs. For this purpose, the loss function is defined as follows:

$$L(\theta) = \mathrm{Average}\left\{ y \cdot D(I_1, I_2) + (1 - y) \cdot \max\left(0,\ 1 - D(I_1, I_2)\right) \right\}$$

$$D(I_1, I_2) = \sum_{k=1}^{m} \left| f(i_{1k}, \theta_k) - f(i_{2k}, \theta_k) \right|$$

where $D(I_1, I_2)$ denotes the Manhattan distance between two binary functions. Training the Siamese network means finding the parameters $\theta$ that minimize the loss function. To this end, we make use of the Adam optimizer with the standard backpropagation algorithm.

After the Siamese network is well trained, two observations follow. When two binary functions are similar (i.e., $y = 1$), their Manhattan distance should be close to zero so that the loss value is minimal. When two binary functions are dissimilar (i.e., $y = 0$), their Manhattan distance should be close to one⁶, and the loss value will still be minimal.

⁵ $f : (I, \theta) \rightarrow \vec{I}$ is a parameterized function that takes a binary function $I = (i_1, \ldots, i_{1000})$, $i_j \in \mathbb{R}$, $1 \leq j \leq 1000$, as input, and outputs an embedding vector $\vec{I} = (\vec{i}_1, \ldots, \vec{i}_{300})$, $\vec{i}_j \in \mathbb{R}$, $1 \leq j \leq 300$.
⁶ The minimal margin distance between two dissimilar functions is set to one according to our experiments.

3. Evaluation

Our experiments are carried out on a Dell T360 Server equipped with two Intel Xeon E5-2603 V4 CPUs, 16 GB memory, 2 TB hard drives, and one NVIDIA Tesla P100 12 GB GPU card. We implement a plug-in for IDA Pro 7.0 to extract an instruction sequence from each binary function. The network models are implemented in TensorFlow-1.8 (Abadi et al., 2016) and Keras-2.2 (Keras Team, 2019). All these networks are well trained within 50 epochs.

3.1. Dataset

To prepare the dataset, we select 6 popular Linux packages, including coreutils, findutils, diffutils, sg3utils, and util-linux. After getting the package source code, we use three CPU architectures (x86, x86-64 and ARM) and two compilers (gcc and clang) with four optimization levels (O0, O1, O2, and O3) to compile each program. For the x86 and x86-64 architectures, the compilers are allowed to use the extended instruction sets (e.g., MMX and SSE). If two binary functions are compiled from the same source code, they are matched. Otherwise, they are not matched.

To facilitate the supervised learning, the proportions of positive and negative samples (i.e., pairs of matched and unmatched functions) are relatively balanced. To get the positive training samples, we use the symbol information in the unstripped binary code to identify the matched functions in the different compiled files. To get the negative training samples, we randomly select functions with different function names from different binary files. The labels for the positive samples and negative samples are ones and zeros, respectively.

In total, we obtain 4729140 samples. As shown in Table 2, these samples can be divided into 5 categories: cross-compiler, cross-optimization, cross-version, cross-architecture, and mixed function pairs. To evaluate the effectiveness of our method on unseen binary code, the whole dataset is split into three disjoint subsets for training, validation and testing. We set the proportion of these three subsets to 4:1:1. The debug symbol information is stripped from all these samples.

Table 2
Description of sample types.

| Sample type | Number | Remarks |
| Cross-compiler function pairs | 855136 | Only the compilers are different |
| Cross-optimization function pairs | 855136 | Only the optimization levels are different |
| Cross-version function pairs | 501028 | Only the function versions are different |
| Cross-architecture function pairs | 1282704 | Only the CPU architectures are different |
| Mixed function pairs | 1235136 | The compilers, optimization levels, versions, and architectures may all be different |

3.2. Evaluation metrics

In order to evaluate the performance of our method, we make use of the standard metrics: accuracy, precision, recall, F1, and FPR, which are defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$F1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{5}$$

In the above formulas, the True Positive (TP) represents the number of correctly identified matched function pairs. The False Positive (FP) refers to the number of unmatched function pairs that the deep learning model wrongly identifies as matched.
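Equations (1)-(5) can be computed directly from the four confusion-matrix counts. The following Python sketch is illustrative only: the function name and the example counts are assumptions and are not results from the experiments reported here, and mapping a zero denominator to 0.0 is a convenience choice of this sketch.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1 and FPR as in Eqs. (1)-(5)."""

    def ratio(num: float, den: float) -> float:
        # Guard against empty classes; returning 0.0 is an assumed convention.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),
    }


# Made-up counts for 200 function pairs (100 matched, 100 unmatched).
example = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

For these made-up counts, the accuracy is 170/200 = 0.85, the recall is 90/100 = 0.9, and the FPR is 20/100 = 0.2.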
The True Negative (TN) represents the number of correctly identified unmatched function pairs. The False Negative (FN) refers to the number of matched function pairs that are wrongly identified as unmatched. Accuracy refers to the percentage of function pairs that are identified correctly. Precision measures the percentage of function pairs labeled as matched that are truly matched. Recall represents the ability to identify matched function pairs correctly. FPR measures the percentage of unmatched function pairs that are incorrectly labeled as matched ones. F1 score refers to the harmonic mean of Precision and Recall.

3.3. Effect of the feature processing

In general, we use the instruction sequence as the features. An instruction consists of one opcode and one (or more) operand(s). Some methods (HaddadPajouh, Dehghantanha, Khayami, & Choo, 2018) use the opcode as the feature, omitting the operands, while our method uses the whole instruction as the feature. To compare the effect of the different feature processing on the similarity identification, we conduct the corresponding experiments. As shown in Table 3, our method is better than the approach that only uses the opcode as the feature on all evaluation metrics, including accuracy, precision, recall, F1 score, and FPR. The main reason is that the whole instruction contains more information than the opcode alone.

Table 3
Results under different instruction features.

| Instruction feature | Accuracy | Precision | Recall | F1 Score | FPR |
| Opcode | 0.9763 | 0.9654 | 0.9725 | 0.9689 | 0.0215 |
| Opcode + Operand | 0.9851 | 0.9705 | 0.9913 | 0.9808 | 0.0187 |

3.4. Effect of the embedding dimension

In this part, we explore the effect of the embedding dimension on the identification results. For this purpose, we analyze the performance of the LSTM and CLSTM neural network models for the mixed function pairs with different embedding dimensions. Table 4 shows the identification metrics of the LSTM model. When the embedding dimension is increased, the identification result improves. For the CLSTM model shown in Table 5, when the embedding dimension is changed from 100 to 300, the identification result is relatively stable. These experiments show that the CLSTM model is more robust than the LSTM model with respect to the embedding dimension setting.

Table 4
Results under different embedding dimensions in the LSTM model.

| Embedding dimension | Accuracy | Precision | Recall | F1 Score | FPR |
| 100 | 0.9062 | 0.8964 | 0.8526 | 0.8739 | 0.0609 |
| 200 | 0.9107 | 0.9021 | 0.8599 | 0.8804 | 0.0578 |
| 300 | 0.9248 | 0.9208 | 0.8776 | 0.8986 | 0.0466 |

Table 5
Results under different embedding dimensions in the CLSTM model.

| Embedding dimension | Accuracy | Precision | Recall | F1 Score | FPR |
| 100 | 0.9820 | 0.9692 | 0.9844 | 0.9767 | 0.0195 |
| 200 | 0.9812 | 0.9690 | 0.9825 | 0.9757 | 0.0196 |
| 300 | 0.9843 | 0.9707 | 0.9888 | 0.9797 | 0.0185 |

3.5. Effect of the number of hidden units

To examine whether the number of hidden units affects the identification results, we carry out experiments with the number of hidden units in the neural network ranging from 2 to 20. Fig. 6a and 6b show the results on accuracy, recall, F1 score, and FPR when using the LSTM and CLSTM neural network models. In general, as the number of hidden units increases, the accuracy and recall of these models increase, and the FPR decreases. In the LSTM model, the identification metrics are similar when setting the number of hidden units to 16, 18, and 20. Considering that more hidden units result in more computing cost, we think setting the number of units to 16 is a good compromise between effectiveness and efficiency. In the CLSTM model, the evaluation metrics are optimal when the number of hidden units is 18.

3.6. Effect of adding the classification model

Previous methods only use a single neural network model to measure the similarity of two functions. Different from these methods, we first utilize the LSTM based classification model to identify the function types and then select the proper neural network model for the similarity measurement. To show the effectiveness of the classification model, we conduct a set of tests. Table 6 and Table 7 show the comparison results of the LSTM and CLSTM neural network models when the classification model is enabled/disabled. With the help of the classification model, the identification accuracy, precision, recall, and F1 score are all improved, and the FPR is decreased in both the LSTM and CLSTM models. In particular, the accuracy, precision, recall, and F1 score of the CLSTM model are increased by 6.38, 6.79, 10.66 and 8.73 percentage points, and the FPR of the CLSTM model is decreased by 4.04 percentage points. The main reason for the effectiveness of adding the classification model is that we can utilize a more targeted neural network model for the similarity detection.

3.7. Effect of the neural network structure

To explore the effect of the network structure on the similarity measurement, we carry out a set of experiments in different scenarios. As mentioned previously, the dataset can be divided into 5 different categories, which correspond to different comparison scenarios. Regarding the performance on the mixed dataset, Table 8 shows the evaluation results of the CNN, LSTM and CLSTM neural network models. In these models, the embedding dimension is set to 300, and the number of hidden units is set to 18. From this table, we can see that the CLSTM model clearly outperforms the CNN and LSTM models. Table 9, Table 11, and Table 10 show the performance results on the cross-architecture, cross-compiler and cross-optimization datasets respectively. Similarly, the CLSTM model has better performance in these experiments. For the performance on the cross-version dataset, Table 12 illustrates the evaluation result. In this evaluation, the dataset consists of 6 versions of the GNU Core Utilities, including the latest version 8.31. This evaluation also demonstrates that the CLSTM model is superior to the CNN and LSTM models.

4. Discussions

Similar to the previous studies (Liu et al., 2018; Massarelli et al., 2019; Shalev & Partush, 2018; Xu et al., 2017; Zuo et al., 2019), our method has limited ability to cope with obfuscated binary code. Before applying our approach, a deobfuscation procedure is needed to first extract the internal logic from the obfuscated code. To this end, we could leverage the recent deobfuscation techniques (Yadegari, Johannesmeyer, Whitely, & Debray, 2015; Xu, Ming, Fu, & Wu, 2018). We plan to explore the combination of our current method and the deobfuscation technique as our future work.

To further improve the detection accuracy of our method, a potential solution is to use different neural network structures for different comparison scenarios. For example, we could apply the BiLSTM model for BCSD across different architectures, and use the CLSTM model for BCSD across different compilers. In addition, we may consider more comparison scenarios, which will correspond to more network models. To measure the similarity distance, we could explore a different method.
Fig. 6. Results under different numbers of hidden units in the LSTM and CLSTM models.
For example, we may utilize the Hamming distance of static binary features (Taheri et al., 2020) for BCSD. Considering the recent data poisoning attacks on machine learning models, we may leverage the existing work (Taheri, Javidan, Shojafar, Vinod, & Conti, 2020) to implement the defense.

Table 6
Results of the LSTM and classification model + LSTM.

| Model | Accuracy | Precision | Recall | F1 Score | FPR |
| LSTM | 0.8830 | 0.8788 | 0.8493 | 0.8637 | 0.0908 |
| Classification model + LSTM | 0.9263 | 0.9128 | 0.8746 | 0.8932 | 0.0515 |

Table 7
Results of the CLSTM and classification model + CLSTM.

| Model | Accuracy | Precision | Recall | F1 Score | FPR |
| CLSTM | 0.9206 | 0.9023 | 0.8829 | 0.8924 | 0.0591 |
| Classification model + CLSTM | 0.9844 | 0.9702 | 0.9895 | 0.9797 | 0.0187 |

Table 8
Comparison results of LSTM, CNN and CLSTM models on the mixed dataset.

| Model | Accuracy | Precision | Recall | F1 Score | FPR |
| LSTM | 0.9228 | 0.9110 | 0.8808 | 0.8956 | 0.0534 |
| CNN | 0.9029 | 0.8824 | 0.8699 | 0.8761 | 0.0900 |
| CLSTM | 0.9868 | 0.9700 | 0.9911 | 0.9804 | 0.0189 |

Table 9
Comparison results of LSTM, CNN and CLSTM models on the cross-architecture dataset.

| Model | Accuracy | Precision | Recall | F1 Score | FPR |
| LSTM | 0.9204 | 0.9025 | 0.8871 | 0.8947 | 0.0619 |
| CNN | 0.9093 | 0.8939 | 0.8649 | 0.8791 | 0.0897 |
| CLSTM | 0.9800 | 0.9690 | 0.9859 | 0.9804 | 0.0245 |

Table 10
Comparison results of LSTM, CNN and CLSTM models on the cross-optimization dataset.

| Model | Accuracy | Precision | Recall | F1 Score | FPR |
| LSTM | 0.9304 | 0.9179 | 0.8812 | 0.8991 | 0.0459 |
| CNN | 0.8998 | 0.9149 | 0.8679 | 0.8907 | 0.0718 |
| CLSTM | 0.9913 | 0.9860 | 0.9955 | 0.9769 | 0.0123 |

Table 11
Comparison results of LSTM, CNN and CLSTM models on the cross-compiler dataset.

| Model | Accuracy | Precision | Recall | F1 Score | FPR |
| LSTM | 0.9069 | 0.8959 | 0.8798 | 0.8878 | 0.0733 |
| CNN | 0.9023 | 0.9031 | 0.8699 | 0.8861 | 0.0721 |
| CLSTM | 0.9727 | 0.9312 | 0.9956 | 0.9623 | 0.0296 |

5. Related work

Static Analysis. The basic idea of static analysis methods is to analyze the program code statically without executing it. David and Yahav (2014) propose a tracelet matching method for computing similarity between functions. Eschweiler, Yakdan, and Gerhards-Padilla (2016) utilize numeric features to identify the potential candidate functions, and then exploit structural features for similarity computation. Chandramohan et al. (2016) present a selective inlining technique to capture function semantics and use this technique for binary search across different CPU architectures and operating systems. Shalev and Partush (2018) employ a machine learning method for BCSD. For cross-architecture vulnerability search in binary firmware, Zhao et al. (2019) propose a novel solution based on kNN-SVM and the attributed control flow graph. Wang, Shen, Lin, and Lou (2019) develop a staged firmware function similarity analysis approach, which considers the invocation relations as important features. Feng et al. (2016) present a graph-based method for bug search across different architectures. By converting the CFGs into numeric feature vectors, this approach can achieve real-time search. To improve the detection performance, Xu et al. (2017) propose a novel neural network-based approach for BCSD. Recently, Liu et al. (2018) make use of a deep neural network (DNN) to cope with the challenges of BCSD. For the final similarity measurement, this approach still relies on manually selected inter-function features. Massarelli et al. (2019) propose a solution to generate function embeddings based on a self-attentive neural network. These embeddings can easily be used for computing binary similarity. Zuo et al. (2019) make use of NLP techniques to resolve the code equivalence problem and the code containment problem. In industry, BinDiff (Zynamics, 2018) is a popular tool to identify similar functions in different binaries. The main advantages of static analysis methods are efficiency, simplicity, and scalability. Our method belongs to this category. Different from the previous methods, our approach does not require prior knowledge to extract syntactic features for BCSD.

Dynamic Analysis. Compared with the static analysis methods,
Wang, S., & Wu, D. (2017). In-memory fuzzing for binary code similarity analysis. In Proceedings of the 32nd IEEE/ACM international conference on automated software engineering (pp. 319–330).
Wikipedia (2018). One-hot. https://en.wikipedia.org/wiki/One-hot.
Xu, X., Liu, C., Feng, Q., Yin, H., Song, L., & Song, D. (2017). Neural network-based graph embedding for cross-platform binary code similarity detection. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security (pp. 363–376).
Xu, D., Ming, J., Fu, Y., & Wu, D. (2018). Vmhunt: A verifiable approach to partially-virtualized binary code simplification. In Proceedings of the 2018 ACM SIGSAC conference on computer and communications security (pp. 442–458). ACM.
Yadegari, B., Johannesmeyer, B., Whitely, B., & Debray, S. (2015). A generic approach to automatic deobfuscation of executable code. In 2015 IEEE symposium on security and privacy (pp. 674–691).
Zhao, D., Lin, H., Ran, L., Han, M., Tian, J., Lu, L., et al. (2019). Cvsksa: Cross-architecture vulnerability search in firmware based on knn-svm and attributed control flow graph. Software Quality Journal.
Zuo, F., Li, X., Young, P., Luo, L., Zeng, Q., & Zhang, Z. (2019). Neural machine translation inspired binary code similarity comparison beyond function pairs. In Proceedings of the 2019 network and distributed systems security symposium (NDSS).
Zynamics (2018). Bindiff. http://www.zynamics.com/bindiff.html.