
Artificial Intelligence in Malware Detection
Cosolan Cornelia Ionela
May 22, 2018

Abstract
The paper focuses on the analysis of different methods of detecting harmful software, known as malware. The methods studied belong to the field of artificial intelligence.

Introduction
Malware is an umbrella term that describes a wide range of malicious programs. It includes trojans, worms, rootkits, ransomware and other cyber threats, and even potentially unwanted programs (PUPs). Malware is usually installed on the system without the user's knowledge or approval, by exploiting security vulnerabilities. Only up-to-date anti-malware programs are able to prevent such infiltration, which is why security experts urge people to install a reputable application to protect their computers and avoid malware attacks.
Malicious software is generally used to carry out unauthorized activities on the computer and to help its owner generate revenue. It can be designed to steal personal information, such as login and bank data, or to encrypt important files and make their owner pay a ransom in exchange for the decryption key. However, some types of malware (adware, browser hijackers and the like) are used only to display promotional content on people's computers and generate pay-per-click revenue. Almost every malware threat is able to block legitimate security software. In addition, such threats can update themselves, download additional malware or open holes in the security of the affected system.
Depending on the actions for which they were designed, malware programs are classified into several types:
• viruses - programs that can replicate by copying themselves into other files or computers;
• worms - programs that can spread to other computers connected to the infected computer;
• spyware - programs that spy on the computer user in order to obtain personal data such as passwords, identification data, bank data, etc.;
• adware - programs that display or download ads without the user's consent, or redirect online traffic to sites that display ads in an abusive way;
• Trojan horses - programs that pose as useful applications but, besides the application itself, steal information or execute actions without the user's knowledge.

Malware detection solutions are diverse. Initially, they relied on specialists analyzing program behavior and then building programs that stop the malicious actions; today, the aim is to automate this process. Thus, artificial intelligence solutions for automatic malware detection have emerged, based on both supervised and unsupervised learning. Some implementations can also make decisions after detection, such as killing a process detected as malicious, as in the solution that uses intelligent agents.
There are two general approaches to detection: anomaly-based and signature-based. The first uses knowledge about normal behavior to decide whether the behavior of a piece of software is abnormal. Methods in this category usually have two phases: a learning phase, in which knowledge of normal program behavior is accumulated, and a detection phase, in which abnormal behavior is identified using what was learned in the previous phase. The disadvantages of solutions in this category are the high rate of false alarms, caused by the fact that new software not previously seen by the detector is classified as malicious even when it is benign, and the choice of features used for learning, which is a critical and difficult step.
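To make the two phases concrete, the sketch below implements one classic anomaly-detection idea: a profile of short system-call sequences (n-grams) observed in benign runs. The call names, the window length n = 3 and the 0.2 threshold are illustrative assumptions, not taken from any particular product or from the solutions discussed here.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(trace: List[str], n: int = 3) -> Set[NGram]:
    """Return the set of length-n sliding windows over a system-call trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def learn_normal_profile(traces: Iterable[List[str]], n: int = 3) -> Set[NGram]:
    """Learning phase: accumulate every n-gram seen in traces of normal programs."""
    profile: Set[NGram] = set()
    for trace in traces:
        profile |= ngrams(trace, n)
    return profile


def anomaly_score(trace: List[str], profile: Set[NGram], n: int = 3) -> float:
    """Detection phase: fraction of the trace's n-grams never seen during learning."""
    grams = ngrams(trace, n)
    if not grams:
        return 0.0
    return len(grams - profile) / len(grams)


def is_anomalous(trace: List[str], profile: Set[NGram],
                 n: int = 3, threshold: float = 0.2) -> bool:
    """Flag the trace when its share of unknown n-grams exceeds the (assumed) threshold."""
    return anomaly_score(trace, profile, n) > threshold
```

For example, after learning from the benign traces ["open", "read", "close"] and ["open", "read", "read", "close"], the trace ["open", "seek", "read", "close"] scores 1.0 and is flagged, even though it may come from a harmless program that was simply never seen during learning. This is the false-alarm problem described above.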
The second category of methods, signature-based, uses a characterization of what is known to be abnormal or malicious to decide whether a piece of software belongs to that category. Although the last disadvantage listed for the category above remains, i.e. the choice of features used for classification is still a critical and difficult step, this approach eliminates the first disadvantage, the high rate of false alarms.
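A signature-based check can be sketched as a lookup of exact file hashes plus a search for known byte patterns. The signature dictionaries and the names in them are hypothetical placeholders; real anti-malware engines use much richer signature formats.

```python
import hashlib
from typing import Dict, Optional


def scan(data: bytes,
         hash_signatures: Dict[str, str],
         byte_signatures: Dict[str, bytes]) -> Optional[str]:
    """Return the name of the first matching signature, or None if nothing matches.

    hash_signatures maps SHA-256 hex digests of known samples to a detection name;
    byte_signatures maps a detection name to a byte pattern (both hypothetical).
    """
    # Exact match: the digest of the whole file is a known-bad fingerprint.
    digest = hashlib.sha256(data).hexdigest()
    if digest in hash_signatures:
        return hash_signatures[digest]
    # Pattern match: a known malicious byte sequence occurs anywhere in the file.
    for name, pattern in byte_signatures.items():
        if pattern in data:
            return name
    return None
```

A file that matches neither an exact hash nor a pattern yields None, which is why purely signature-based detection cannot recognize malware that has not been characterized before.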

Detection methods
All malware detection techniques can be divided into signature-based and
behavior-based methods. Before going into these methods, it is essential
to understand the basics of two malware analysis approaches: static and
dynamic malware analysis. As the name implies, static analysis is performed “statically”, i.e. without executing the file. In contrast, dynamic analysis is conducted on the file while it is being executed, for example in a virtual machine.
Static analysis can be viewed as “reading” the code of the malware
and trying to infer the behavioral properties of the file. Static analysis can
include various techniques:

• File Format Inspection: file metadata can provide useful information. For example, Windows PE (Portable Executable) files carry information on compile time, imported and exported functions, etc.
• String Extraction: this refers to examining the strings embedded in the file (e.g. status or error messages) and inferring information about the malware's operation.
• Fingerprinting: this includes computing cryptographic hashes and finding environmental artifacts, such as hardcoded usernames, filenames or registry strings.
• AV scanning: if the inspected file is well-known malware, most likely all anti-virus scanners will be able to detect it. Although it might seem irrelevant, this form of detection is often used by AV vendors or sandboxes to “confirm” their results.
• Disassembly: this refers to reversing the machine code into assembly language and inferring the software's logic and intentions. This is the most common and reliable method of static analysis.
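Two of the static techniques above, fingerprinting and string extraction, need nothing more than the Python standard library. The minimum string length of 4 mirrors the default of the Unix `strings` utility; the artifact heuristic (registry keys, .exe/.dll names) is an illustrative assumption, not a complete rule set.

```python
import hashlib
import re
from typing import Dict, List


def fingerprint(data: bytes) -> Dict[str, str]:
    """Cryptographic hash of the file contents, usable as a lookup key."""
    return {"sha256": hashlib.sha256(data).hexdigest()}


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Runs of at least min_len printable ASCII bytes, similar to Unix `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def find_artifacts(strings: List[str]) -> List[str]:
    """Flag strings that look like environmental artifacts (assumed heuristic)."""
    return [s for s in strings
            if s.startswith("HKEY_") or s.lower().endswith((".exe", ".dll"))]
```

Running these functions without executing the sample is exactly what makes the analysis static; the results only hint at behavior and still need interpretation by an analyst or a later classification step.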

Static analysis often relies on specialized tools. Beyond basic analysis, they can also provide information on the protection techniques used by malware.
The main advantage of static analysis is the ability to discover all possible
behavioral scenarios.
Another type of analysis is dynamic analysis. Unlike static analysis, here the behavior of the file is monitored while it is executing, and the properties and intentions of the file are inferred from that information. Usually, the file is run in a virtual environment, for example a sandbox. During this kind of analysis, it is possible to observe the behavioral attributes exhibited during execution, such as opened files, created mutexes, etc.
Thus, newer detection techniques inspect the behavior of the software instead, using an algorithm that learns the patterns of malware activity. Usually, this is achieved with supervised machine learning, where the malware detection system, called the classifier, is trained on already identified malware samples.
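The supervised setting can be sketched with a k-nearest-neighbour classifier over sets of behavioral events observed during dynamic analysis, compared with the Jaccard similarity. Both the choice of k-NN and the event names (create_mutex, connect_remote, ...) are assumptions for illustration; the text does not specify which classifier or features are used.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

# One labeled training sample: (events observed in the sandbox, label).
Sample = Tuple[FrozenSet[str], str]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Similarity of two event sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def knn_predict(train: List[Sample], events: FrozenSet[str], k: int = 3) -> str:
    """Label an unknown sample by majority vote among its k most similar samples."""
    ranked = sorted(train, key=lambda s: jaccard(s[0], events), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Here the already identified samples are simply the labeled training list; an odd k avoids voting ties when there are only two classes (malware and benign).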
When using machine learning, there is a series of standardized steps, regardless of the field in which it is applied. The first step is extracting features from the analyzed domain. This preprocessing stage is critical for the results obtained: with features that clearly highlight the differences between the classes being distinguished, very good results can be obtained even with a trivial classification algorithm. The malware detection problem also requires this preprocessing stage, which is a prerequisite for applying the classification algorithms that follow.
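Feature extraction for behavior reports can be sketched as counting short sequences (q-grams) of consecutive events and normalizing the resulting vector to unit length, so that reports of different lengths can be compared by Euclidean distance. The q-gram embedding and q = 2 are assumptions chosen for illustration; other feature sets are equally possible.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[Tuple[str, ...], float]  # sparse vector: q-gram -> weight


def embed(report: List[str], q: int = 2) -> Vector:
    """Count the q-grams of consecutive events and scale the vector to unit length."""
    counts = Counter(tuple(report[i:i + q]) for i in range(len(report) - q + 1))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {g: c / norm for g, c in counts.items()} if norm else {}


def distance(x: Vector, y: Vector) -> float:
    """Euclidean distance between two sparse vectors."""
    keys = set(x) | set(y)
    return math.sqrt(sum((x.get(k, 0.0) - y.get(k, 0.0)) ** 2 for k in keys))
```

With this normalization, identical behavior gives distance 0, and two reports that share no q-gram are at distance sqrt(2), the maximum for non-negative unit vectors, so the distance directly reflects how much behavior two processes share.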
1) Data Preprocessing: The first step of preprocessing is obtaining the data. Because the domain does not have a data set that is accessible and easy to obtain, a methodology for obtaining the data must be established.
2) Classification: Once the distance between two instances has been fixed at this stage, the data is organized in space as dense clouds, since processes that behave the same are close to one another, with a low level chosen for the representation. Both unsupervised and supervised learning are used for classification: clustering is used for the former, while classification with labels is an intermediate stage.
3) Incremental analysis: An important feature of the proposed solution is its incremental module. Reports are produced every day and new data enters the system, so the solution improves continuously.
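The classification and incremental steps can be sketched together: each day's reports (already embedded as vectors) are first classified against the prototypes of known behavior clusters, and the reports rejected by that step are grouped into new clusters that become known on the following day. The simple leader-style clustering, the fixed radius and the cluster-N naming are assumptions standing in for whatever clustering and classification the described solution actually uses.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, ...]  # a behavior report already embedded as a feature vector


def nearest_within(p: Point, prototypes: Dict[str, Point], radius: float) -> Optional[str]:
    """Name of the closest prototype no farther than radius, or None."""
    best: Optional[str] = None
    best_d = radius
    for name, proto in prototypes.items():
        d = math.dist(p, proto)
        if d <= best_d:
            best, best_d = name, d
    return best


class IncrementalAnalyzer:
    """Hypothetical daily loop: classify against known clusters, cluster the rest."""

    def __init__(self, radius: float) -> None:
        self.radius = radius
        self.prototypes: Dict[str, Point] = {}  # cluster name -> representative report

    def process_day(self, reports: List[Point]) -> List[str]:
        known = dict(self.prototypes)  # clusters known at the start of the day
        new: Dict[str, Point] = {}     # clusters discovered today
        labels: List[str] = []
        for p in reports:
            # Classification: assign the report to a known cluster if one is close enough.
            name = nearest_within(p, known, self.radius)
            if name is None:
                # Clustering: rejected reports are grouped among themselves.
                name = nearest_within(p, new, self.radius)
                if name is None:
                    name = f"cluster-{len(known) + len(new)}"
                    new[name] = p
            labels.append(name)
        # Today's new clusters become known prototypes for the next day's reports.
        self.prototypes.update(new)
        return labels
```

Keeping one prototype per cluster means each new report is compared only with the prototypes rather than with every past report, which keeps the daily cost bounded as data accumulates.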

Conclusions
The problem of security in computer networks and distributed systems is one of the most studied topics, though not the only one. Since the emergence of the first systems, efforts have been made to find the most effective methods to detect, and especially to prevent, the malicious programs described in the malware literature from damaging systems. Moreover, the IoT industry has expanded recently, and threats at this level are also an important subject of discussion. Connecting devices together can be dangerous when security issues are not addressed systematically. For example, a refrigerator that can scan the products inside it and order missing products online seems like an interesting idea. However, in 2014 an internet-connected refrigerator was found to be among the devices of a botnet responsible for sending 750,000 spam messages. No one had thought of installing malware detection software on a refrigerator, which seemingly was not a point of interest for attackers. But the problems in this domain may be worse, compromising important user information stored on connected devices, such as personal identification data or bank accounts.

References
[1] K. Rieck, P. Trinius, C. Willems, and T. Holz, “Automatic analysis of
malware behavior using machine learning,” Journal of Computer Security,
2010.

[2] N. Idika and A. Mathur, “A survey of malware detection techniques,”


Departament of Computer Science, Purdue University, West Lafayette.

[3] “Journal of computer security.” [Online]. Available: http: //www.raid-


symposium.org/

[4] “Journal of computer security.” [Online]. Available: https:


//www.iospress.nl/journal/journal-of-computer-security/

You might also like