Call Graph Agnostic Malware Indexing
Joxean Koret
EuskalHack 2017
Introduction
• The Idea
• Previous Approaches
• The Tool
• IT'S A CONSPIRACY!!!1!
• Future of the Tool
• Conclusions
The Idea
• Finding similarities between binary executable malware samples by locating small and unique (or almost unique) commonalities.
• Agnostic of the call graph: whole-program comparisons are avoided.
• Example: a set of very rare functions shared between samples of malware campaigns X and Y of, supposedly, different actors.
• Could be useful, perhaps, for attribution.
Previous Approaches
• Antivirus: create signatures of very specific artifacts in malware samples.
• E.g.: byte streams, specific strings, cryptographic hashes, etc...
• Very, very false-positive prone.
• Yes, I know. Antivirus products use a lot of other mechanisms. But they aren't any good for my purpose:
• Find unique commonalities.
Previous Approaches
• Cosa Nostra/VxClass:
• Whole-program matching, calculating differences between malware samples.
• Outputs phylogenetic trees of, supposedly, malware families.
• Uses the differences as indicators of distance from the "initial" sample(s).
• Really, really slow.
• Also, whole-program matching suffers from many other problems (explained later).
Usual Problems
• AV approach: too error prone. A single string in a binary is enough for an AV signature.
• Not useful to determine relationships.
• Cosa Nostra/VxClass: problems comparing programs with little real code but big statically compiled libraries.
• E.g.: two DLLs with one or more big static libraries built in, like SQLite and OpenSSL, where only the code at DllMain changes.
• Obvious but non-viable option to remove such false positives: build a lot of popular libraries with many compilers and optimization levels/flags & create signatures to discard them.
• Does not scale "well", at all, IMHO.
• Also, very slow.
"Other" Approaches?
• Basically, what I want to do is program diffing, but in a not-so-usual form.
• Let's go over what are, in my opinion, the most popular program diffing use cases...
• (With permission from a blond guy over here...)
Program Diffing
• Program diffing is a method used to find commonalities between 2 or more programs using many means (like graph-theory-based algorithms or matching constants, for example).
• Program/binary diffing can be used in a wide variety of different scenarios.
• I'll just discuss some of the most common ones.
Program Diffing
• Whole-program matching: how much code between 2 programs is different or common.
• New feature detection between 2 versions of a program, plagiarism detection, etc...
• Patch diffing: detect portions of code modified between 2 different versions of the same program.
• Find fixed vulnerabilities to write exploits or detection methods.
• Or, for example, to create signatures for vulnerability extrapolation.
Program Diffing Tools
• For the 2 previous methods there are various tools, for example:
• BinDiff/Diaphora: general-purpose program diffing tools.
• Cosa Nostra/VxClass: create phylogenetic trees of malware families, using the differences between the call and flow graphs as the indicator of distance between leaves.
• Such tools are useful to detect similarities and differences between versions, fixed bugs, added functionality, etc...
• But they aren't specifically created to find small unique similarities between programs that could help in, say, attribution.
• Attribution: which actor is behind some malware campaign?
Time to show the tool...
Enter... MalTindex!
Mal Tindex
ī€Š ā€œMal Tindexā€ is an Open Source Malware
Indexing set of tools.
ī€Š It aims to help malware researchers in
attributing a malware campaign by finding small
rare similarities between malware samples.
ī€Š E.g.: A function set that only exists in 2 binaries out
of a significantly big set.
ī€Š ā€œSignificantly bigā€ is very ā€œsignificantā€ here.
ī€Š The name for the project was proposed by a
friend as it tries to match ā€œcouplesā€ of things.
How does it work?
• It first exports a set of signatures for each function in a program using Diaphora.
• Then, the most significant and least false-positive-prone signature types are used as indicators and indexed.
• After a big enough number of program samples is indexed, the signatures can be used to find very specific functions that are shared only between rather small sets of programs (a minimal sketch follows).
• ...
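As a rough illustration of the indexing step (a minimal sketch; the table layout, column names and size threshold below are hypothetical, not MalTindex's actual schema), it boils down to storing one row per (signature type, signature value, sample) so that rare values can be looked up later:

# Minimal indexing sketch (hypothetical schema, not the real MalTindex code).
import hashlib
import sqlite3

def index_sample(db, sample_md5, functions):
    """functions: iterable of (func_ea, func_bytes) already exported, e.g. by Diaphora."""
    cur = db.cursor()
    cur.execute("""CREATE TABLE IF NOT EXISTS indexes
                   (sig_type TEXT, sig_value TEXT, sample_md5 TEXT, func_ea INTEGER)""")
    for func_ea, func_bytes in functions:
        if len(func_bytes) < 32:  # skip tiny functions, they match everything
            continue
        sig = hashlib.md5(func_bytes).hexdigest()  # the "bytes hash" signature type
        cur.execute("INSERT INTO indexes VALUES (?, ?, ?, ?)",
                    ("bytes_hash", sig, sample_md5, func_ea))
    db.commit()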
Signatures
• As explained before, MalTindex uses Diaphora to export binaries to a database and generate signatures.
• Diaphora exports almost everything from each function, including pseudo-code, graph related data, etc...
• I chose the signature types (table fields generated by Diaphora) that I found caused the lowest number of false positives.
• They are explained in the next slides.
Signature Types
• Bytes hash: just an MD5 over the whole function's bytes, for "big enough" functions.
• Pretty robust, almost 0 false positives found so far.
• Function hash: similar to before, but removing the bytes that are variable (i.e., the non-position-independent parts).
• Same as before, with some false positives.
• MD Index: a hash for a function's CFG based on the topological order, in-degrees and out-degrees of its nodes. Invented by Dullien et al. (MP-IST-091-26).
• More "fuzzy" than the others and thus more false-positive prone during my testing. However, unique MD Index values are pretty useful. And, actually, this is what we're looking for here: unique signatures.
• Pseudo-code primes: a small-primes-product (SPP) over the AST of the pseudo-code of a function (if a decompiler is available). See the sketch below.
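To make the first and last of these more concrete, here is a minimal, hypothetical sketch (not Diaphora's actual implementation; the size threshold and node-type encoding are made up):

# Illustrative sketches of two signature types (not Diaphora's actual code).
import hashlib

def bytes_hash(func_bytes, min_size=32):
    """MD5 over the raw bytes of a function, only for 'big enough' functions."""
    if len(func_bytes) < min_size:
        return None
    return hashlib.md5(func_bytes).hexdigest()

# Small-primes-product: map each AST node type to a small prime and multiply them.
# Two functions whose pseudo-code ASTs contain the same multiset of node types get
# the same product, regardless of the order in which the nodes appear.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61]

def small_primes_product(node_type_ids):
    """node_type_ids: list of small integers identifying each AST node's type."""
    result = 1
    for t in node_type_ids:
        result *= PRIMES[t % len(PRIMES)]
    return result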
Matching
• MalTindex doesn't do anything "magical" or very special to match:
• It just compares equal but rare enough signatures.
• That's, basically, all it does.
• Every time a malware sample is exported with Diaphora, a set of tables is updated with the set of rare signatures.
• These tables with the rare signatures are the ones used to find similarities that are rare across the dataset but shared between a few samples (see the query sketch below).
• The actual malware indexes.
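Continuing the hypothetical schema sketched earlier, finding "apparently unique" matches is then essentially a query for signature values seen in exactly two distinct samples:

# Hypothetical query sketch: signature values shared by exactly 2 distinct samples.
import sqlite3

def apparently_unique_matches(db_path):
    db = sqlite3.connect(db_path)
    rows = db.execute("""
        SELECT sig_type, sig_value, GROUP_CONCAT(DISTINCT sample_md5)
          FROM indexes
         GROUP BY sig_type, sig_value
        HAVING COUNT(DISTINCT sample_md5) = 2""").fetchall()
    for sig_type, sig_value, samples in rows:
        print(sig_type, sig_value, samples)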
Using Mal Tindex
• How to use it:
$ export DIAPHORA_DB_CONFIG=/path/to/cfg
$ diaphora_index_batch.py /ida/dir/ samples_dir
• When all the samples are indexed:
$ maltindex.py <database path>
MalTindex> match MD5
or
MalTindex> match MD5_1 MD5_2
or
MalTindex> unique MD5
• And it will print all the matches, or the "apparently unique" matches, for the whole dataset or for a specific sample.
DEMO
Time to discuss more problems with this approach.
And how it can lead to... bad attribution.
Attribution
• Attribution based on "shared" chunks of code should be taken with great care.
• Otherwise, it leads to, basically, choosing your own conspiracy.
• Especially for datasets that are not big enough.
• Let's see some examples...
Conspiranoia case #1:
The Sony Attack (2014)
Attribution
• During my tests with ~9,000 files, I found a curious match:
• D1C27EE7CE18675974EDF42D4EEA25C6
• E4AD4DF4E41240587B4FE8BBCB32DB15
• The first sample was used in the Sony attack (Wipall).
• The second sample is a "DLL" from the NSA leak.
• Looks cool, right?
• It is a false positive: a specific version of zlib1.dll appeared first, in my dataset, with one of the NSA dumps.
The Match
The Sony Attack
• After I realized that it was a code chunk in zlib1.dll (1.2.5), I asked my friends to send me more zlib1.dll files.
• After 25+ files were indexed, the match remained unique.
• After ~30 files were indexed, the unique match disappeared:
• A zlib1.dll in the popular Windows XAMPP application matched the previous 2 samples.
• End of case. A false positive. My dataset was not big enough.
• Solution: index more programs.
Conspiranoia #2:
The Lazarus Group & Amonetize case
The relationship is clear!!!1!
Lazarus & Amonetize
• Again, during my tests, I found yet another case of a match that looked interesting (or at least weird enough to research it):
• 18A451D70F96A1335623B385F0993BCC
• CBDB7E158155F7FB73602D51D2E68438
• The first sample is Ratankba, used by the Lazarus group to attack a Polish bank.
• The 2nd sample is just adware: Amonetize.
• What do they have in common?
• Let's see the shared code...
The Match
The Match (Fragments)
The Match
• Is the match good enough? Is the function big enough?
• I would say "yes".
• Is the function unique across the whole dataset?
• It was. Until I added more samples to my dataset...
• 0925FB0F4C06F8B2DF86508745DBACB1 (Dalbot)
• 9144BE00F67F555B6B39F053912FDA37 (QQDownload, not malware)
• Is this a false positive?
• Absolutely. This function seems to be from a specific MFC/ATL version.
• End of case.
• Solution: index more and more programs.
Conspiranoia #3:
Stuxnet and Duqu
Stuxnet & Duqu
• I was trying to find matches between the dumped NSA tools and other things. I found nothing too exciting.
• Then, I decided to try to match Duqu samples with other things. And Stuxnet matched.
• 546C4BBEBF02A1604EB2CAAAD4974DE0
• A driver.
• This is, probably, the first non-false-positive result I have had. I think.
• Let's see...
Match
• Stuxnet:
• Duqu:
The first match
• The DriverReinitializationRoutine matches 1 to 1. However, it isn't big enough.
• But there are other matches.
• Let's see them...
Stuxnet
Stuxnet/Duqu
• The match is perfect. Indeed, even most addresses in the driver actually match.
• This is not an isolated match that could be a false positive; the evidence is conclusive.
• It is not a single match, there are multiple matches.
• F8153747BAE8B4AE48837EE17172151E
• C9A31EA148232B201FE7CB7DB5C75F5E
• And many others.
• End of story. Both Symantec and F-Secure are right: Duqu is Stuxnet, or they share code.
Conspiranoia #4:
Dark Seoul and Bifrose
Dark Seoul & Bifrose???
• When my dataset was not "big enough", I found a curious and apparently unique match:
• 5FCD6E1DACE6B0599429D913850F0364
• 0F23C9E6C8EC38F62616D39DE5B00FFB
• The 1st sample is a Dark Seoul sample (DPRK).
• The 2nd sample is a ghetto Bifrose.
• Do they have something in common???
• Actually, no. But the dataset was not big enough.
• Let's see the match...
The Match
Dark Seoul & Bifrose?
• The function is big enough and complex enough. So, the match is good.
• But can we consider this proof conclusive?
• Not at all.
• Indeed, when I started feeding more samples into my dataset, it was no longer a unique match.
• 46661C78C6AB6904396A4282BCD420AE (Nenim)
• 67A1DB64567111D5B02437FF2B98C0DE (infected with a version of Sality)
• There is no other match so... end of case. It's a false positive.
• I still don't know which function it is. But I know the solution:
• Index more and more programs.
Conspiranoia #5:
WannaCry & Lazarus Group
WannaCry & DPRK!!!!??
• On 15th May, Neel Mehta, a Google engineer, published the following tweet:
• This is a match (independently reproduced by me based on their MD Index) between WannaCry and a malware sample from the Lazarus Group (DPRK).
• Let's see the match...
WannaCry & Lazarus Group
• MalTindex finds the same match between samples:
• Searching for the specific MD Index, it only finds the same 2 matches:
Lazarus Group (Fragment)
WannaCry & Lazarus Group
• The function is rare enough. In my dataset, there is only this match.
• Apparently, Google only has this same match.
• However, for me, it's totally inconclusive.
• Can we associate an actor with a malware family just because one function matches between them?
• Not enough evidence, in my opinion, and it can be done on purpose. Actually, in the future, I will try to automate that process.
• Also, logic says that a group stealing millions of USD is not an actor asking for a $300 ransom per box.
• End of case?
Conspiranoia #6:
Bundestrojaner, NSA, WannaCry and 2 shitty malware samples
LOL, WUT?
LOL, WUT?
• One of my favourite false positives ever. Searching for a specific MD Index (11.27212239987603972440105268), just 5 files appear:
• DB5EC5684A9FD63FCD2E62E570639D51: NSA's GROK GkDecoder.
• 930712416770A8D5E6951F3E38548691: Bundestrojaner!
• 7257D3ADECEF5876361464088EF3E26B: Some Krap?
• 0EB2E1E1FAFEBF8839FB5E3E2AC2F7A8: Microsoft calls it Nenim.
• DB349B97C37D22F5EA1D1841E3C89EB4: WannaCry!
• Naturally, it must be a false positive. Right?
Bundestrojaner vs WannaCry
Bundestrojaner vs WannaCry
Yet another false positive
• My guess, considering which malware samples appear to share this function, is that it's a false positive.
• One seems to be an installer doing RAR stuff.
• Perhaps this function "decrypts" something?
• No idea, I haven't found out what it actually does.
• This function is the only "evidence" that can be used to relate such malware samples...
• ...and groups.
End of case or... it's a conspiracy!
• One shared function is not enough evidence.
• Unless you want to believe!
• Now you have "proof" to relate WannaCry with the Bundestag, the NSA, and various crappy malware groups.
• If you love conspiranoias, it's better than chemtrails.
• End of case. Totally. For real.
Future
Future Plans
• I have a few ideas in mind for the not-so-far future, like:
• Make the dataset (the indexes) public. Probably by creating a web service to fetch/query/upload data.
• Create an IDA plugin to get symbol names from indexed sources.
• Support both Radare2 and IDA as backends.
• Perhaps, also support DynInst.
• Implement a different and better way to determine rareness. Probably based on Halvar's Bayes thing.
Web Service
• The idea is to, basically, allow others to:
• Index samples and upload their indexed data to a public server.
• Query the samples I already indexed or the ones other people upload.
• Research/find matches between malware families, different actors/groups, etc...
IDA Plugin
• Like FLIRT signatures or Zynamics BinCrowd.
• The plugin would query a remote server for symbol names for the local database, matching against a big database with both the most common and not-so-common stuff indexed.
• Open source libraries, well-known malware, etc...
• Every single component would be open sourced:
• You can deploy your own symbols server in your organization.
• Or you can use the public one I will publish some day.
Exporters
• So far, Mal Tindex is based on Diaphora and only supports IDA.
• In The Not So Far Future (TM), I plan to add support for exporting into Diaphora from the following tools:
• Radare2.
• Maybe, DynInst.
Conclusions
Conclusions
• We can find seemingly unique similarities in many cases when just looking at functions between different groups/actors.
• Indeed, we can actually "build" them.
• In my opinion, a single function matching in just 2 binaries is not enough evidence.
• During my testing, I was only able to prove a match was not a false positive when there were various unique functions matched, not just one.
• Regardless of its size.
Conclusions
• We should be skeptical when a company says that they have a unique match between 2 groups based on just a single function.
• Why?
• We don't have their dataset, thus we cannot independently reproduce the result.
• We have to trust companies that might have some interest in the specific case.
• On the other hand, if a company makes such a claim and others are able to find another match for that supposedly unique function, their credibility would suffer.
• I.e.: it's unlikely they would use such an argument unless they believe in it.
Conclusions
• The size of the dataset is highly important when doing malware indexing.
• The bigger it is, the better.
• However, considering how expensive it is in terms of both storage and indexing time, it might be prohibitive for most organizations.
• On average, a binary program takes 5 minutes to index (IDA auto-analysis + indexing tool execution).
• When my testing (SQLite) database was around 9,000 files, it was ~10 GB. And it was too small.
• Do the math to calculate the requirements for a real, production use case (a rough estimate follows).
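A rough back-of-the-envelope, assuming the figures above scale linearly to the ~1 million files mentioned on the next slide:
• Indexing time: 1,000,000 files × 5 minutes ≈ 5,000,000 minutes, roughly 9.5 CPU-years on a single machine (about a month with ~100 parallel IDA instances).
• Storage: ~10 GB / 9,000 files ≈ 1.1 MB per file, so roughly 1.1 TB of indexes for 1 million files, not counting the samples themselves.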
Conclusions
• Having a really big dataset is a must to attribute/link groups based on malware indexing. But when is it big enough?
• I don't have an answer to that. Perhaps Google's dataset is big enough. Or Microsoft's. Or neither.
• According to a friend, probably a minimum of 1 million files.
• Can datasets be biased?
• Almost always: you will only see what you feed into your system. You will only see your bubble.
• Are you filtering samples? Your dataset is biased.
• Can we reproduce what $company says?
• Hardly, unless they share their whole dataset.
• Can $company use attribution politically?
• Lol.