Artificial Intelligence in Malware Detection
Cosolan Cornelia Ionela
May 22, 2018
Abstract
The paper analyzes different methods of detecting harmful software,
known as malware. The methods studied belong to the field of artificial
intelligence.
Introduction
Malware is a general term that describes a wide range of malicious
programs. It includes trojans, worms, rootkits, ransomware, other cyber
threats, and even potentially unwanted programs (PUPs). Malware is usually
installed in the system without the knowledge or approval of the user, by
exploiting security vulnerabilities. Up-to-date anti-malware programs are
the main defense against such infiltration. Security experts urge people to
install a reputable application to protect their computers and avoid
malware attacks.
Malicious software is generally used to initiate unauthorized activities on
the computer and to help its author generate revenue. It can be designed to
steal personal information, such as login and bank data, or to encrypt
important computer files and make their owner pay a ransom in exchange for
the decryption key. In contrast, some kinds of malware (adware, browser
hijackers, and the like) are only used to display promotional content on
people’s computers and generate pay-per-click revenue. Many malware threats
are able to block legitimate security software. In addition, they can
update themselves, download additional malware, or open holes in the
security of the affected system.
Depending on the actions for which it was designed, malware is classified
into several types:
• viruses - computer programs that can replicate by copying themselves
into other files or computers;
• computer virus - a computer program that can be transmitted to other
computers that are connected to the infected computer;
Detection methods
All malware detection techniques can be divided into signature-based and
behavior-based methods. Before going into these methods, it is essential
to understand the basics of two malware analysis approaches: static and
dynamic malware analysis. As the name implies, static analysis is
performed “statically”, i.e. without executing the file. In contrast,
dynamic analysis is conducted on the file while it is being executed, for
example in a virtual machine.
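As an illustration of the signature-based side of this split, the following is a minimal sketch that treats the SHA-256 digest of a file as its signature. The sample bytes, the `KNOWN_SIGNATURES` set, and the function names are hypothetical; real signature engines typically also match byte patterns rather than whole-file hashes only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical signature database: digests of samples already known to be malicious.
KNOWN_SIGNATURES = {sha256_hex(b"previously captured malicious sample")}


def matches_signature(data: bytes, signatures=KNOWN_SIGNATURES) -> bool:
    """Static check: the file is never executed, only hashed and looked up."""
    return sha256_hex(data) in signatures
```

Changing a single byte of the sample produces a different digest, so the lookup fails. This is one reason a purely signature-based check is easy to evade.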
Static analysis can be viewed as “reading” the code of the malware and
trying to infer the behavioral properties of the file from it. It can
include various techniques and often relies on dedicated tools. Beyond the
simple analysis, these tools can provide information on the protection
techniques used by the malware. The main advantage of static analysis is
the ability, in principle, to discover all possible behavioral scenarios.
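Two common static measurements can be sketched as follows: extracting printable strings from a file and computing the entropy of its bytes, both without executing it. This is an illustrative sketch only; the function names and the minimum string length are assumptions, and high entropy is merely a hint that a protection technique such as packing or encryption may be in use.

```python
import math
import re
from collections import Counter


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII characters that are at least min_len long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def shannon_entropy(data: bytes) -> float:
    """Return the entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Strings such as URLs or API names can suggest what a file may do, while an entropy value close to 8 bits per byte suggests that most of the content is compressed or encrypted.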
The other type of analysis is dynamic analysis. Unlike static analysis,
here the behavior of the file is monitored while it is executing, and the
properties and intentions of the file are inferred from that information.
Usually, the file is run in a virtual environment, for example in a
sandbox. During this kind of analysis, it is possible to observe behavioral
attributes such as opened files, created mutexes, etc.
Newer detection techniques therefore inspect the behavior of the software
rather than its signature, and an algorithm is used to learn the patterns
of malware activities. Usually, this is achieved by using supervised
machine learning, where the malware detection system, called the
classifier, is trained using already identified malware samples.
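The idea of training a classifier on already labeled samples can be illustrated with a nearest-centroid classifier, chosen here only because it is short; the paper does not state which algorithm is used. The feature vectors and labels below are invented for illustration.

```python
import math


def train_centroids(samples):
    """Compute the mean feature vector per label from (vector, label) pairs."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(vec, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Hypothetical training set: each vector counts three kinds of observed actions.
training = [
    ([9, 1, 4], "malware"),
    ([7, 0, 6], "malware"),
    ([1, 5, 0], "benign"),
    ([0, 7, 1], "benign"),
]
model = train_centroids(training)
```

With this toy data, `classify([8, 1, 5], model)` falls closest to the malware centroid and `classify([1, 6, 0], model)` falls closest to the benign one.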
When using machine learning, there is a series of standard steps,
regardless of the field in which it is applied. The first step is the
extraction of features from the analyzed domain. This preprocessing stage
is critical for the results obtained: with features that clearly highlight
the differences between the classes being distinguished, even a trivial
classification algorithm can achieve very good results. The malware
detection problem also needs the preprocessing stage, which is a
prerequisite for applying the classification algorithms.
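One possible form of feature extraction is to turn a behavior trace into a fixed-length vector of relative action frequencies. This is a sketch under assumptions: the action vocabulary in `ACTIONS` and the `(action, target)` trace format are hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical vocabulary of monitored actions; a real system would define its own.
ACTIONS = ("open_file", "write_file", "create_mutex", "connect")


def extract_features(trace, actions=ACTIONS):
    """Map a list of (action, target) events to relative frequencies per action."""
    counts = Counter(action for action, _target in trace)
    total = sum(counts[a] for a in actions) or 1  # avoid division by zero
    return [counts[a] / total for a in actions]


trace = [
    ("open_file", "a.txt"),
    ("open_file", "b.txt"),
    ("create_mutex", "m1"),
    ("connect", "198.51.100.7"),
]
features = extract_features(trace)
```

Every trace, whatever its length, becomes a vector with one entry per action, so that a classifier can compare samples directly. Actions outside the vocabulary are ignored in this sketch.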
1) Data Preprocessing: The first step of preprocessing consists of
obtaining the data. Because the domain does not have a data set that is
accessible and easy to obtain, a methodology for obtaining the data must
be established.
2) Classification: Once the distance between two instances has been fixed
in the previous stage, the data is organized in space as dense clouds,
because processes that behave the same way are close to one another, with
a small level chosen for the representation. For classification, both
unsupervised and supervised learning are used. The first category is used
for clustering, and classification with labels is an intermediate stage.
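The way a distance measure groups similar instances into dense clouds can be sketched with a simple leader-style clustering: each instance joins the first cluster whose prototype lies within a given radius, otherwise it starts a new cluster. This technique, the Euclidean distance, and the `radius` parameter are assumptions made for illustration, not the method of the paper.

```python
import math


def cluster(vectors, radius):
    """Group vectors by distance to cluster prototypes.

    Returns (prototypes, assignments), where assignments[i] is the index
    of the cluster that vectors[i] was placed in.
    """
    prototypes, assignments = [], []
    for vec in vectors:
        for idx, proto in enumerate(prototypes):
            if math.dist(vec, proto) <= radius:
                assignments.append(idx)
                break
        else:
            # No prototype is close enough: this vector starts a new cluster.
            prototypes.append(list(vec))
            assignments.append(len(prototypes) - 1)
    return prototypes, assignments


vectors = [[0, 0], [0.1, 0], [5, 5], [5, 5.2], [0, 0.2]]
prototypes, assignments = cluster(vectors, radius=1.0)
```

No labels are needed at this stage. Labels can be attached to whole clusters afterwards, which is one way unsupervised and supervised learning can be combined.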
3) Incremental analysis: An important feature of the proposed solution is
its incremental mode of operation. Reports are produced every day and new
data enters the system, so the solution improves continuously.
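One simple way to absorb new data without reprocessing the old data is a running-mean update of a cluster center. The sketch below assumes that a cluster is summarized by its centroid and a sample count; the paper does not describe how its incremental module works.

```python
def update_centroid(centroid, count, new_vec):
    """Fold one new vector into a running mean.

    centroid is the mean of `count` earlier vectors; the function returns
    the updated mean and the new count, leaving the inputs unchanged.
    """
    count += 1
    updated = [c + (v - c) / count for c, v in zip(centroid, new_vec)]
    return updated, count


# Hypothetical state after two earlier samples, then one new daily sample.
centroid, count = [2.0, 4.0], 2
centroid, count = update_centroid(centroid, count, [8.0, 1.0])
```

The update only needs the previous centroid and the number of samples seen so far, so each day's reports can be added as they arrive.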
Conclusions
The problem of security in computer networks and distributed systems is
one of the most discussed topics today. Since the emergence of the first
systems, researchers have tried to find the most effective methods to
detect, and especially to prevent, malicious programs (referred to in the
literature as malware) from causing damage to systems. Moreover, the IoT
industry has expanded in recent years, and threats at this level are also
an important subject of discussion. Connecting devices together can be
dangerous when security issues are not addressed systematically. For
example, a refrigerator that can scan the products inside it and place an
online order for the missing ones seems to be an interesting idea.
However, in 2014 a refrigerator connected to the internet was reportedly
involved in sending 750,000 spam messages. No one had thought of
installing malware detection software on a refrigerator, a device that
seemingly was not a point of interest for software attacks. The problems
in this domain may be even worse, since attacks can compromise important
user information stored on the connected devices, such as personal
identification data or bank accounts.