The Background Noise of the Internet
Andrew Morris
@Andrew___Morris
• Thank you
• Founders
• Committee
• Staff
• Attendees
About Me
Andrew Morris
Background in offensive
cyber stuff, security
research
Previously:
* Endgame R&D
* Intrepidus (NCC
Group)
* KCG (ManTech)
Twitter: @Andrew___Morris
Lots of people scan the Internet.
I built a system that collects all of the
Internet-wide scan traffic.
I analyze the data to find weird stuff.
I make that data available to researchers
for free via an API
Structure
• Background
• Previous Work
• Architecture
• Analysis
• Roadmap
• Conclusion
• Questions
Background
• Internet-wide mass scanning is easier than ever
• Open source tooling: Masscan, ZMap,
UnicornScan, etc
• Cloud computing
• Instant servers
• Large number of recyclable IP addresses
• High throughput / faster global Internet
connections
What is Internet Mass
Scanning?
• “Mass Scanning” is scanning every single
routable IP address on the Internet for
something
• The IPv4 address space is 0.0.0.0 –
255.255.255.255
• Give or take a few blocks
• That’s 4.2 billion IP addresses
• Bandwidth-wise, roughly the same as uploading a 240
GB file
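A back-of-the-envelope check of the numbers above, as a sketch. The 56 bytes per probe is an assumption (not from the slides), picked as a plausible on-the-wire size for a minimal probe; the real figure depends on the scanner and the link layer.

```python
# Rough arithmetic for one probe per IPv4 address.
# BYTES_PER_PROBE is an assumed on-the-wire size, not a measured value.
TOTAL_IPV4 = 2 ** 32          # 0.0.0.0 through 255.255.255.255
BYTES_PER_PROBE = 56          # assumption for illustration


def sweep_size_gb(addresses: int = TOTAL_IPV4, probe_bytes: int = BYTES_PER_PROBE) -> float:
    """Return the total bytes sent for one probe per address, in decimal GB."""
    return addresses * probe_bytes / 1e9


print(f"{TOTAL_IPV4:,} addresses")
print(f"{sweep_size_gb():.1f} GB for one probe per address")
```

Under that assumption the total lands in the same range as the 240 GB figure on the slide; reserved and unroutable blocks would bring it down slightly.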
What does this mean?
• Lots and lots of people scanning the Internet,
for lots of different things
• From millions of different IP addresses
• Benign: Shodan, Censys, Sonar, ShadowServer
• Malicious: SSH/Telnet worms (Mirai), IOT worms,
CONFICKER, etc
• Internet-wide scanning is busier than ever
This creates a problem
When you see an IP scanning your network,
are they scanning you specifically or the
entire Internet?
When you see an IP attacking your network,
are they attacking you specifically or the
entire Internet?
Solution
• Collect all the omnidirectional Internet-wide
IPv4 scan/attack traffic
• Subtract those IPs/activity from your SIEM
• All the remaining activity is targeting you
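The subtraction step above can be sketched as a set difference. This is a minimal illustration, not the GreyNoise implementation: the alert structure and the `src_ip` field name are assumptions, and the addresses are documentation-range placeholders.

```python
# Set subtraction: drop alerts whose source IP is known Internet-wide noise.
# The alert layout and the "src_ip" key are hypothetical.
def targeted_alerts(alerts, background_ips):
    """Return only the alerts whose source IP is not in the background-noise set."""
    noise = set(background_ips)
    return [a for a in alerts if a["src_ip"] not in noise]


alerts = [
    {"src_ip": "198.51.100.7", "event": "ssh login attempt"},
    {"src_ip": "203.0.113.9", "event": "ssh login attempt"},
]
background = {"198.51.100.7"}   # e.g. IPs seen scanning everyone
remaining = targeted_alerts(alerts, background)
print(remaining)
```

Whatever is left in `remaining` is the activity worth a closer look.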
But how?
• Stand up a large number of servers with no business value in diverse
data centers
• No business value means that ANY traffic that hits them is, by
definition, opportunistic
• Instrument these servers with extremely
aggressive logging and small microservices
• Stream the logs of the scan/attack traffic to a
central place
• Analyze the data and convert into a consumable
format
Barriers
• It is strategically cheaper to ask a question
of the Internet than it is to answer a given
question
• “How many computers are running version X of some software?” is easy
• “How many computers are scanning for version X of some software?” is hard
Byproducts
• Observe changes in Internet-scanning over time
• Opt-out of omnidirectional scanning altogether
• Collect information on malware campaigns and
botnets
History
• Like three honeypots (2014)
• Animus v1 (2015)
• Bash and glue (SHMOOCON 2015 “No Budget Threat
Intelligence”)
• Related work at a previous company (2015-2016)
• EPIPHANY (2016)
• THE DATA THAT HONEYPOTS COLLECT IS SHITTY THREAT
INTELLIGENCE
• IT’S LITERALLY THE OPPOSITE OF THREAT INTELLIGENCE
• IT’S ANTI THREAT INTELLIGENCE
• Animus GOES COMMERCIAL (2017)
• Turns out startups are hard
• Grey Noise (2018)
• I’m not going to stop until I die
• ???
• Become a monk
Which leads me to…
GreyNoise
• Read about it here: https://greynoise.io
• API docs here: https://github.com/grey-noise-intelligence/api.greynoise.io
• Visualizer here: http://viz.greynoise.io
ARCHITECTURE
Architecture
• Collection
• Orchestration
• Data Producers / Services
• Log Forwarder
• Message Bus
• Streamd
• Analysis
• Cache / Database
• Enrichments
• Analyticsd
• Consumption
• API
• Front End
• Operational Security
[Architecture diagram. Components: AWS, DigitalOcean, Azure, RabbitMQ, Long Term Storage, Database Cache, Analytics Server, Analytics Database, Web API]
Step one: Stand up lots of
servers
in different regions of different
cloud providers
Collection: Orchestration
• Terraform
• Open source tool on GitHub by Hashicorp
• Supports lots of different cloud providers
• AWS
• DigitalOcean
• Azure
• Google Cloud
• Etc
Collection: Orchestration
(Lessons)
• LESSON: Cloud-init
• LESSON: NAT or nah
• LESSON: Interface names
• Eth0
• Ath1
• whatever
Collection: Data producers /
Services
• Ridiculously aggressive iptables rules
• Log all packets
• …on all ports
• …on all protocols
• SSH
• Telnet
• HTTP
• Others
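Once every packet is logged, the log lines need to become structured events. A minimal sketch, assuming the usual KEY=value layout that the kernel LOG target writes; the sample line is illustrative, not captured data.

```python
import re

# Matches KEY=value pairs; bare flags such as SYN carry no "=" and are ignored here.
_PAIR = re.compile(r"(\w+)=(\S*)")


def parse_iptables_log(line: str) -> dict:
    """Extract KEY=value fields from one iptables LOG line into a dict of strings."""
    return dict(_PAIR.findall(line))


sample = (
    "kernel: IN=eth0 OUT= MAC=00:00:00:00:00:00 SRC=198.51.100.7 DST=203.0.113.5 "
    "LEN=40 TOS=0x00 PREC=0x00 TTL=244 ID=54321 PROTO=TCP SPT=41234 DPT=23 "
    "WINDOW=65535 RES=0x00 SYN URGP=0"
)
event = parse_iptables_log(sample)
print(event["SRC"], event["PROTO"], event["DPT"])
```

All values stay strings in this sketch; converting ports and TTL to integers is left to whatever consumes the event.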
Collection: Data producers /
Services
(Lessons)
• MISTAKE: Tune your iptables / p0f / sniffers / whatever to
ignore garbage / outbound traffic
• LESSON: Things will be spoofed (TCP, UDP, and ICMP)
• LESSON: Bang for your buck: Iptables, HTTP, Telnet, SSH, and
P0f
Step two: Stream the data to a
central place
Collection: Message Bus
• RabbitMQ
• Message Queue
• Topic routing
Collection: Message Bus
(Lessons)
• MISTAKE: Google PubSub
• LESSON: Maintain state
• LESSON: Meta message envelope
• Time
• Provider
• Region
• Node UUID
• POSSIBLE: ZeroMQ, Kafka
• Streamd
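The envelope lesson above can be sketched like this. The four metadata fields come from the slide; the JSON encoding, the key names, and the `payload` key are assumptions made for illustration.

```python
import json
import time
import uuid

# One UUID per collection node; generated here for illustration only.
NODE_UUID = str(uuid.uuid4())


def wrap(payload: dict, provider: str, region: str, node_uuid: str = NODE_UUID) -> str:
    """Wrap a raw event in an envelope carrying time, provider, region, and node UUID."""
    envelope = {
        "time": time.time(),       # seconds since the epoch, set at the node
        "provider": provider,
        "region": region,
        "node_uuid": node_uuid,
        "payload": payload,
    }
    return json.dumps(envelope)


message = wrap({"SRC": "198.51.100.7", "DPT": "23"}, provider="aws", region="us-east-1")
print(message)
```

Stamping every message at the node means the central side never has to guess where or when an event was seen.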
Collection: Log Forwarder
• I wrote my own
• Python + Pygtail / iNotify / Watchdog
• Can also use something that’s already been
written
• Logstash
• Elasticsearch Filebeat
• Rsyslog
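The core of a home-grown forwarder is remembering how far into the log file you have read. The sketch below uses only the standard library as a stand-in for Pygtail or a file-watching library; it is not the author's forwarder, and the function name is made up for this example.

```python
import os


def read_new_lines(path: str, offset: int = 0):
    """Return (lines, new_offset) for data appended to path since offset.

    If the file is smaller than offset (rotated or truncated), start over from 0.
    """
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Only consume complete lines; a partial trailing line is left for next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

A forwarder would call this in a loop, publish each returned line to the message bus, persist the offset, and sleep briefly between polls.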
Step three: Put the data in a
database
Analysis: Cache / Database
• PostgreSQL
• N days of data, rotates
• Fast-ish
• Robust
• Dumpster
• Long term storage
• You’re going to fuck something up
• Retro load is your friend
Analysis: Cache / Database
• MISTAKE: Postgres is awesome but too slow for data this big
• MISTAKE: Google BigQuery is the shit but it gets expensive if you're
doing batch queries on a very short timeline
• LESSON: Postgres + Cassandra is the truth
Step four: Enrich the IP data
Analysis: Enrichments
• We need:
• ASN
• rDNS
• Organization
• Country
• City
• Maxmind is expensive
• Neustar is expensive
• Ipinfo is CHEAP
• Harvesting it yourself is also CHEAP but requires a lot of effort
Analysis: Enrichments
(Lessons)
• MISTAKE: Collecting the data yourself is hard and inconsistent and
involves a lot of work
• LESSON: ARIN has an unauthenticated non rate-limited public API for
IP ownership
• LESSON: Enrichd
• LESSON: Cache rules everything around me
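The caching lesson can be sketched with the standard library. The lookup function here is a fake stand-in that returns placeholder values; a real one would query whichever enrichment source you settle on.

```python
from functools import lru_cache


def make_cached_enricher(lookup, maxsize: int = 100_000):
    """Wrap a slow lookup(ip) function so each IP is resolved at most once
    while it stays in the cache."""
    @lru_cache(maxsize=maxsize)
    def enrich(ip: str):
        return lookup(ip)
    return enrich


calls = []


def fake_lookup(ip: str) -> dict:
    # Stand-in for a real ASN / rDNS / geo lookup; returns placeholder values.
    calls.append(ip)
    return {"ip": ip, "asn": "AS64500", "country": "ZZ"}


enrich = make_cached_enricher(fake_lookup)
enrich("198.51.100.7")
enrich("198.51.100.7")   # served from the cache, no second lookup
print(len(calls))
```

The same scanner IPs show up again and again, so even a simple in-process cache removes most of the lookups. Note that the cached dict is shared between callers, so treat it as read-only.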
Analysis: Enrichments
Step five: Analyze and
categorize/tag the data
Analysis: Analyticsd
• Service to analyze some time window of data
• E.g. past 4 days of data
• Catalogue:
• Actors
• Shodan
• Censys
• Sonar
• Activity
• Scanning for SSH
• Scanning for Telnet
• LESSON: YOU PROBABLY DON’T NEED REAL TIME ANALYTICS
• Batch analytics with small time frames
• This is why Postgres will often do the trick
• LESSON: Only pay attention to activity that has happened on more than one of your nodes
• LESSON: You need to know how many nodes are up collecting data at any point in time to
properly do a time-series analysis
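The last two lessons can be sketched as follows. The event shape (a source IP paired with a node identifier) and the function names are assumptions for illustration.

```python
from collections import defaultdict


def ips_seen_on_multiple_nodes(events, min_nodes: int = 2):
    """Return the set of source IPs observed by at least min_nodes distinct nodes."""
    nodes_by_ip = defaultdict(set)
    for ip, node in events:
        nodes_by_ip[ip].add(node)
    return {ip for ip, nodes in nodes_by_ip.items() if len(nodes) >= min_nodes}


def events_per_node(event_count: int, nodes_up: int) -> float:
    """Normalize a raw event count by how many nodes were collecting in that window."""
    return event_count / nodes_up if nodes_up else 0.0


events = [
    ("198.51.100.7", "node-a"),
    ("198.51.100.7", "node-b"),
    ("203.0.113.9", "node-a"),
    ("203.0.113.9", "node-a"),
]
print(ips_seen_on_multiple_nodes(events))
print(events_per_node(1200, 40))
```

Without the per-node normalization, adding or losing collection nodes looks like a change in scanning activity when it is only a change in coverage.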
Step six: Make the data
available
Consumption: API
• Web API
• Tell me about this IP address
• Tell me about this analytic
• Github
• Search “Grey Noise API”
• github.com/Grey-Noise-Intelligence
Consumption: Bindings
• Bobby Filar: phyler/greynoise
• Tek: PyGreyNoise
• Bob Rudis: R bindings
• Some mystery Go bindings out there
Consumption: FRONT END
• Complete 100% credit to Casey Buto (github.com/cbuto)
• Point and click interface
• Hosted version at viz.greynoise.io
• EXPLORE THE DATA
http://viz.greynoise.io
OpSec (Operational Security)
• Hard to fingerprint (mostly custom services)
• Encrypt everything
• No names
• Ops domains
• Dockerize
• Shift infrastructure constantly
• Reduce the oracle surface
• IO is hard to opsec
• Minimum number / node thresholds
• Sleep delays
Cost
• AWS: 15 regions
• $4.75 per box
• Total: $71
• DigitalOcean: 11 regions
• $5 per box
• Total: $55
• Google: 36 regions
• $4.28 per box
• Total: $154
• Vultr: 15 regions
• $5 per box (they advertise $2.50 but they're never available)
• Total: $75
• Linode: 9 regions
• $5 per box
• Total: $45
• Total: $400 per month
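The figures on this slide can be checked with a few lines of arithmetic. The sketch assumes one box per region, which is how the per-provider totals on the slide work out.

```python
# (provider, regions, monthly price per box in USD), as listed on the slide.
FLEET = [
    ("AWS", 15, 4.75),
    ("DigitalOcean", 11, 5.00),
    ("Google", 36, 4.28),
    ("Vultr", 15, 5.00),
    ("Linode", 9, 5.00),
]


def monthly_cost(fleet):
    """Return per-provider subtotals and the grand total, one box per region."""
    subtotals = {name: regions * price for name, regions, price in fleet}
    return subtotals, sum(subtotals.values())


subtotals, total = monthly_cost(FLEET)
for name, cost in subtotals.items():
    print(f"{name}: ${cost:.2f}")
print(f"Total: ${total:.2f} across {sum(r for _, r, _ in FLEET)} boxes")
```

All five providers together come to roughly $400 per month, matching the slide's total.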
Cost (notes)
• Notes:
• No Ops boxes in here (you need these)
• This is simply not enough to have complete coverage but it'll give you a good
start
• You can save money by buying extra IPs, but it complicates engineering
ANALYSIS
Analysis
• What am I collecting?
• Volume Summary
• Data Summary
• Actor Summary
• Benign
• Malicious
• Unknown???
• Malware Summary
• Hall of Shame (Malware-iest
regions of the Internet)
• WEIRD SHIT
• Misc Lessons
What am I collecting?
• Passive
• Iptables – Packets on ports
• P0f – passive OS fingerprint
• JA3 – SSL fingerprint (stick around!)
• Active
• HTTP
• SSH
• Telnet
• Experimental
• RDP
• SIP
• SMTP
• NTP
• TFTP
• DNS
Data Summary
• Iptables:
• I don’t have a good way to quantify this yet
• HTTP:
• Lots of “/”, spoofed user agents, search engines, people looking for
JBoss/WordPress/Tomcat/phpMyAdmin
• SSH + Telnet:
• Bots. Default credential attempts. Nothing new here.
• P0f:
• Lots of OS visibility
Volume Summary
• With the aforementioned numbers ($400 worth of servers):
• 1M – 2M iptables events per day
• 700k – 1M SSH logins per day
• 1M – 10M telnet logins per day
• 10K – 100K HTTP requests per day
• 100-200 messages per second through your queue
• ~60K IPs per day
• 1GB of raw data, msgpacked + compressed per day
Actor Summary
• Benign:
• Shodan: 27 IPs
• Censys: 334 IPs
• Sonar: 56 IPs
• ShadowServer: 228 IPs
• IPIP: 63 IPs
• BinaryEdge: 253 IPs
• PDRLabs: 25 IPs
• Pingdom: 9 IPs
• ProbeTheNet: 1 IP
• NetCraft: 145 IPs
• Others
• Malicious
• Mirai: 249k IPs
• SSH Worms: 92k IPs
• Popped Routers / residential IPs attacking
people: 590k IPs
Hall of Shame
Hall of Shame (cloud)
WEIRD SHIT
Pretenders
• Machines advertising client banners that are
false
• Mismatches between user agent, p0f OS fingerprint,
and JA3
• Is the browser hitting this HTTP server really
running Safari on a Linux kernel 3.1 box? Is it?
• Why? Idk
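A minimal sketch of the mismatch check described above, comparing only the User-Agent against the passive OS fingerprint (JA3 is left out). The substring table and the assumption that the p0f label contains the OS family name are simplifications made for this example.

```python
# Map a substring of the User-Agent to the OS family it claims.
# This table is a deliberately tiny, hypothetical example.
UA_OS_HINTS = [
    ("Windows", "windows"),
    ("Macintosh", "mac"),
    ("Linux", "linux"),
]


def claimed_os(user_agent: str):
    """Return the OS family the User-Agent claims, or None if unrecognized."""
    for needle, family in UA_OS_HINTS:
        if needle in user_agent:
            return family
    return None


def is_pretender(user_agent: str, p0f_os: str) -> bool:
    """True when the User-Agent claims one OS but the passive fingerprint says another."""
    claimed = claimed_os(user_agent)
    if claimed is None or not p0f_os:
        return False
    return claimed not in p0f_os.lower()


safari_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/604.5.6 "
    "(KHTML, like Gecko) Version/11.0.3 Safari/604.5.6"
)
print(is_pretender(safari_ua, "Linux 3.1-3.10"))
```

A mismatch is a hint, not proof: proxies, NAT devices, and load balancers can all make an honest client look like a pretender.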
Dangling DNS
• When you spin a bunch of IPs up and down, it’s
not uncommon to inherit an IP address from your
cloud provider that still has a domain pointing
to it.
• CDN.whatever123.acme.com
• This traffic is dirty, you don’t want it
“WORM FINDER”
• Sometimes when Grey Noise observes an IP
address scanning for a given TCP port, I’ll
turn around and check to see if that port is
open on the source machine.
• If the answer is yes, this can be a great
indicator of a worm
• Why else would a computer search for behavior
that it also exhibits?
• Average lifespan from start to finish is 4 days
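The check described above boils down to a TCP connect back to the scanner on the port it was scanning for. A minimal sketch using the standard library; the function names and the timeout value are choices made for this example.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a TCP connect; True if the handshake completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def looks_like_worm(scanner_ip: str, scanned_port: int) -> bool:
    """Heuristic from the slide: the scanner exposes the same port it scans for."""
    return port_is_open(scanner_ip, scanned_port)
```

For example, if a source is seen scanning for TCP port 23, `looks_like_worm(source_ip, 23)` reports whether that source is itself listening on port 23. A firewalled or offline host simply reads as "not open", so a negative result proves little.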
Zmap’s hardcoded ID parameter
• ZMap hardcodes the IP ID field of every packet it creates to
“54321”, making it trivial to fingerprint
• Go to “github.com/zmap/zmap” and search / grep
the repository for “54321”
• Shoutout Oliver Gasser @ Technical University
of Munich
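A sketch of the fingerprint, assuming the "ID parameter" is the IP header identification field and that packets are logged in the iptables LOG format, which prints that field as `ID=`.

```python
import re

ZMAP_IP_ID = 54321   # the value quoted on the slide

# \b keeps this from matching inside other keys that merely end in "ID".
_ID_FIELD = re.compile(r"\bID=(\d+)\b")


def looks_like_zmap(log_line: str) -> bool:
    """True if the logged IP identification field equals ZMap's fixed value."""
    match = _ID_FIELD.search(log_line)
    return bool(match) and int(match.group(1)) == ZMAP_IP_ID


line = "IN=eth0 OUT= SRC=198.51.100.7 DST=203.0.113.5 TTL=244 ID=54321 PROTO=TCP DPT=23"
print(looks_like_zmap(line))
```

Any packet has a small chance of carrying this ID by coincidence, and a scanner can be patched to use another value, so treat a match as a strong hint and not as proof.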
Still SO MANY WINDOWS WORMS
• LOADS of people blasting SMB traffic on TCP
port 445
• More and more RDP worms as well, but these
aren’t exploiting vulns, just guessing creds
• WinRM is next, in my opinion
People do weird stuff through
proxies
• Airline price scraping data (???)
• Also testing stolen credentials
• And probably credit card numbers
• News sites??? This is a huge rabbit hole…
Lots of robo calls probably
come from popped SIP boxes
• People try to make calls to India and Russia
through open VOIP servers
• Like, LOTS of them
• Tens of thousands per day
The things people scan for through Tor are interesting
You can neuter/blow up worms by
replaying their own traffic back
to them
• A box is compromised with a Telnet worm
• The worm carries a built in wordlist
• The compromised box throws the same wordlist at
you
• You replay the wordlist back to the compromised
box
• Chances are, depending on the worm, one of
those credentials will work
ROADMAP
What does the future hold?
• Version 1.1 API coming very soon
• Integrate with everything
• Badass machine learning opportunities
• Explore identifying anti-threat intelligence in
other areas
• Intranet traffic
• DMZ traffic
• Files on a filesystem
CONCLUSION
Conclusion
• The Internet is a noisy place
• Every packet has a story
• It’s possible to collect all of this background
noise
• If you want to explore the data, hit the API.
If the API doesn’t give you what you need,
email me or hit me up on Twitter
Acknowledgements
• Phil Maddox (twitter.com/foospidy)
• Bobby Filar (twitter.com/filar)
• Rich Seymour (twitter.com/rseymour)
• Casey Buto (github.com/cbuto)
• Bob Rudis (twitter.com/hrbrmstr)
• Tek (twitter.com/tenacioustek)
• Mickey Perre (twitter.com/MickeyPerre)
• Michel Oosterhof (twitter.com/micheloosterhof)
QUESTIONS?
THANK YOU!
andrew@morris.sc
@andrew___morris
