Erlang as a cloud citizen
    Paolo Negri @hungryblank
1 day at


• 10 million players
• 2 billion game server requests (http)
• 20 devops people
Cloud
“A cloud is made of billows upon
billows upon billows that look like
clouds.
As you come closer to a cloud you
don't get something smooth, but
irregularities at a smaller scale.”
                   Benoît B. Mandelbrot

                      http://www.flickr.com/photos/nirak/644336486
AWS Cloud
This talk will answer

• Why build a system targeting the
  cloud?
• How many EC2 instances do you
  need to respond to 0.25 billion
  uncacheable game reqs/day?
15 months ago...




           http://www.flickr.com/photos/wheatfields/515068701
1st cloud hosted project,
     lessons learned



1
      Pushing live the 60th application
      server?
      not different from adding the 6th!
      push a button
1st cloud hosted project,
     lessons learned



2
      local network/local disk are
      low-performance, general-purpose
      tools
      (nothing like ad hoc data
      center solutions)
1st cloud hosted project,
     lessons learned



3
      Complete automation is cool
      Ease of adding hosts/automation
      can lead to bloated infrastructure
1st cloud hosted project,
      points of pain
 • A lot of inefficient app servers (as per
   tweets)
 • Much effort to scale up/maintain
   databases (MySQL & Redis)
 • Expensive, not crazy expensive, but
   expensive
Why try again?




          http://www.flickr.com/photos/kky/704056791/
Uncertainty
• will we reach 100K or 3 million users?
• 3 million users in 2 weeks or 12 months?
• cheat tool released 1h ago => single game
  call up 5000%
• weekly releases, new feature performance
  impact?
the cloud

• standard units (instances) of
  computing capacity
• a network connecting all instances
• an API to provision/dismiss instances
the cloud

Sounds like a good framework to
compose computing capacity


Why didn’t it work as a framework
to compose throughput?
Scaling in the cloud
            the recipe
CLOUD: composable units of computing capacity

                      +
DEVELOPER: turn a unit of computing capacity into
            a unit of throughput
                      =
composable throughput, a plan for scaling BIG
turn a unit of
computing capacity
into a unit of
throughput
http://www.flickr.com/photos/pasukaru76
Unit of throughput,
      where?

      App server


       Database
Unit of throughput,
             joke?
App server    App server    App server     App server


                        Cache


             Database           Database
Unit of throughput?


      App server


       Database



     No unit!
Unit of throughput?


          App server


           Database



Tightly coupled throughput?
Unit of throughput?


        App server


        Database



Monolithic throughput?
Monolithic throughput
Monolithic throughput
• likes monolithic infrastructure!
• scales well vertically
• wants screaming fast stack (network,
  disks...)
• any performance glitch impacts the
  whole system
Tightly coupled throughput
               +
loosely coupled hardware
         (like cloud)
               =
         frustration
Who leads the tightly
 coupled dance?

       App server


        Database
Who leads the tightly
 coupled dance?

       App server


        Database

   The stateless
 application server!
Stateless application
 servers guarantee
     one thing...

      which?
Data is never
where you need it
And another one...
If you can feed them
  data fast enough...

  they’ll choke on
 garbage collection
We measure memcache
    HIT / MISS

why do app servers need
 to be 100% MISS?
Where’s the best
knowledge about hot/
     cold data?
Even the reverse makes
      more sense

  Database
  App server

  1. pick your data up
  2. go into the stateless app server
What
Went
Wrong?
He can tell you!
        • Rich Hickey
        • Clojure author
         “...If not in Erlang which I
         think has a complete story
         for how they do state”[1]
         [1] Value Identity State @0.27

         http://goo.gl/Zdjv0


                     http://www.flickr.com/photos/ghoseb/5120173586
Most languages and runtimes
don’t have a safe solution for
 concurrent, long lived state

  Erlang stands out as an
exception in this panorama
Erlang...

 Processes are the primary
means to structure an Erlang
        application.

                     wikipedia
Erlang + OTP
 Generic Server Behaviour

 A generic server process
(gen_server) implemented
    using this module...

         otp documentation
Erlang + OTP
 Generic Server Behaviour
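%% Skeleton of a gen_server callback module.
%% The module name and init/1 are illustrative additions, not from the slide.
-module(unit_server).
-behaviour(gen_server).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

init(_Args) ->
    {ok, #{}}.

%% Synchronous request: reply to the caller, keep the state.
handle_call(_Request, _From, State) ->
    {reply, ignored, State}.

%% Asynchronous request: no reply is sent.
handle_cast(_Msg, State) ->
    {noreply, State}.

%% Any other message sent to the process.
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.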
gen_server

  • An Erlang process
  • With LOCAL state
  • responding to requests from clients

A unit of throughput!
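
To make the idea concrete, here is a minimal sketch of one such unit: a gen_server that keeps one player's state in the process itself and answers requests from it. The module name `player_server`, the functions `add_points/2` and `score/1`, and the shape of the state are hypothetical, chosen only for illustration; they are not taken from the talk.

```erlang
%% Hypothetical module: one process per player, holding that player's state.
-module(player_server).
-behaviour(gen_server).

%% Client API (illustrative names)
-export([start_link/1, add_points/2, score/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

start_link(PlayerId) ->
    gen_server:start_link(?MODULE, PlayerId, []).

add_points(Pid, Points) ->
    gen_server:call(Pid, {add_points, Points}).

score(Pid) ->
    gen_server:call(Pid, score).

%% The state lives in the process: no cache or database round trip per request.
init(PlayerId) ->
    {ok, #{player_id => PlayerId, score => 0}}.

handle_call({add_points, Points}, _From, State = #{score := Score}) ->
    NewScore = Score + Points,
    {reply, NewScore, State#{score := NewScore}};
handle_call(score, _From, State = #{score := Score}) ->
    {reply, Score, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Each call is served from the process's own memory, so the data is already where the request needs it.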
Composing throughput
    with Erlang


gen_server
             +   EC2 instance
Composing throughput
      with Erlang

                   EC2 instance




         1 EC2 instance + 1 Erlang VM
                      =
N kilo gen_servers (N kilo units of throughput)
Scale by adding units
      instances
Scale up by adding units
      instances
Erlang distribution
Throughput complexity


                       VS.



• Loosely coupled peers      • Tightly coupled roles
• Independent throughput     • Dependent throughput
Where does the state
     come from?
             Start



                     Database
gen_server

             Stop
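
The lifecycle in this diagram can be sketched as a gen_server that reads its state from storage once when it starts and writes it back once when it stops. This is only a sketch under stated assumptions: the module name `session_server` is hypothetical, and `Load` and `Save` are caller-supplied funs standing in for whatever storage client is actually used.

```erlang
%% Hypothetical module. Load and Save are funs supplied by the caller;
%% they stand in for a real storage client and are not a real API.
-module(session_server).
-behaviour(gen_server).

-export([start_link/3, update/2, stop/1]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

start_link(Key, Load, Save) ->
    gen_server:start_link(?MODULE, {Key, Load, Save}, []).

update(Pid, Fun) ->
    gen_server:call(Pid, {update, Fun}).

stop(Pid) ->
    gen_server:stop(Pid).

%% Start: fetch the state from storage once.
init({Key, Load, Save}) ->
    {ok, #{key => Key, save => Save, data => Load(Key)}}.

%% While running, requests touch only the in-memory copy.
handle_call({update, Fun}, _From, State = #{data := Data}) ->
    NewData = Fun(Data),
    {reply, NewData, State#{data := NewData}}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Stop: write the state back to storage once.
terminate(_Reason, #{key := Key, save := Save, data := Data}) ->
    Save(Key, Data),
    ok.
```

Storage is touched only at the two ends of the process's life. Note that in this sketch the state is saved only on an orderly stop, so a crash would lose whatever changed since start.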
Now with database


                AWS
                 S3
Database scalability
DB is (almost) never on
latency critical path


                             AWS
                              S3


No need for low latency DB
Database scalability

Throughput required is low


                             AWS
                              S3


 We can approximate S3
  capacity as infinite
Database scalability


Ubiquitous and uniform from the
application servers' point of
view.

                               AWS
                                S3
Remember?


            AWS
             S3
How it actually works
                 Data from
                 S3 is
                 uniformly
                 available to
                 any EC2
                 instance
And as you zoom in...
And zoom in...
And zoom in you see...


    EC2 instance
Always the same
  kind of structure

A fractal approach to
     throughput
Fractal

“A cauliflower shows how an object
can be made of many parts, each of
which is like a whole, but smaller.”
                  Benoît B. Mandelbrot



                 http://www.flickr.com/photos/paulobrabo/3588387063
Homework

The exact same solution might
     not work for you...

   but look for that unit of
         throughput
Answer time
           You need
            XXXX
       Smallish instances
         to serve 0.25
billion uncacheable reqs/day
Answer time
           You need
             0XXX
       Smallish instances
         to serve 0.25
billion uncacheable reqs/day
Answer time
           You need
             00XX
       Smallish instances
         to serve 0.25
billion uncacheable reqs/day
Answer time
           You need
              0012
       Smallish instances
         to serve 0.25
billion uncacheable reqs/day
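
As a rough check on what that figure means per instance, the arithmetic below converts the daily volume into an average request rate. It assumes traffic is spread evenly over the day, which the slides do not claim; peak rates would be higher than this average.

```latex
\frac{0.25 \times 10^{9}\ \text{reqs/day}}{86\,400\ \text{s/day}} \approx 2\,894\ \text{reqs/s}
\qquad
\frac{2\,894\ \text{reqs/s}}{12\ \text{instances}} \approx 241\ \text{reqs/s per instance}
```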
What’s
1200 ?
Thanks
Paolo Negri @hungryblank
http://www.wooga.com/jobs

Erlang as a cloud citizen, a fractal approach to throughput
