Building

Scalable Web Architectures

Aaron Bannert
[email protected] / [email protected]

http://www.codemass.com/~aaron/presentations/apachecon2005/ac2005scalablewebarch.ppt
Goal

 How do we build a massive web system out of commodity parts and open source?
Agenda

1. LAMP Overview

2. LAMP Features

3. Performance

4. Surviving your first Slashdotting

a. Growing above a single box

b. Avoiding Bottlenecks
LAMP Overview
Architecture
LAMP

Linux
Apache
MySQL
PHP (Perl?)
The Big Picture
External Caching Tier
External Caching Tier

 What is this?
 Squid

 Apache’s mod_proxy
 Commercial HTTP Accelerator
External Caching Tier

 What does it do?

 Caches outbound HTTP objects
 Images, CSS, XML, HTML, etc…

 Flushes Connections
 Useful for modem users, frees up web tier

 Denial of Service Defense

External Caching Tier

 Hardware Requirements
 Lots of Memory
 Fast Network
 Moderate to little CPU
 Moderate Disk Capacity
 Room for cache, logs, etc… (disks are cheap)
 One slow disk is OK

 Two Cheapies > One Expensive

External Caching Tier

 Other Questions
 What to cache?
 How much to cache?
 Where to cache (internal vs. external)?
Web Serving Tier
Web Serving Tier

 What is this?
 Apache

 thttpd

 Tux Web Server

 IIS

 Netscape
Web Serving Tier

 What does it do?

 HTTP, HTTPS
 Serves Static Content from disk
 Generates Dynamic Content
 CGI/PHP/Python/mod_perl/etc…
 Dispatches requests to the App Server Tier
 Tomcat, Weblogic, Websphere, JRun, etc…
Web Serving Tier

 Hardware Requirements
 Lots and lots of Memory
 Memory is main bottleneck in web serving
 Memory determines max number of users
 Fast Network
 CPU depends on usage
 Dynamic content needs CPU
 Static file serving requires very little CPU
 Cheap slow disk, enough to hold your content
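A rough sketch of why memory sets the ceiling on users, assuming a process-per-connection server (such as prefork Apache): the number of simultaneous clients is about the RAM left over for the web server divided by the footprint of one server process. The function name and the figures in the usage line are illustrative assumptions, not measurements from this talk.

```python
def max_concurrent_clients(total_ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """Estimate how many server processes fit in RAM without swapping.

    total_ram_mb   -- physical memory in the box
    reserved_mb    -- memory kept back for the OS, file cache, other daemons
    per_process_mb -- resident size of one web server process (measure it!)
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // per_process_mb


# Hypothetical box: 2 GB RAM, 256 MB reserved, 20 MB per process.
print(max_concurrent_clients(2048, 256, 20))  # prints 89
```

Heavier dynamic content means a larger per-process footprint, which lowers the ceiling on the same hardware.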
Web Serving Tier

 Choices
 How much dynamic content?
 When to offload dynamic processing?
 When to offload database operations?
 When to add more web servers?
Application Server Tier
Application Server Tier
 What does it do?
 Dynamic Page Processing
 JSP
 Servlets

 Standalone mod_perl/PHP/Python engines

 Internal Services
 E.g. Search, Shopping Cart, Credit Card Processing
Application Server Tier
• How does it work?
1. Web Tier generates the request using
 HTTP (aka “REST”, sort of)
 RPC/CORBA
 Java RMI
 XML-RPC/SOAP
 (or something homebrewed)
2. App Server processes request and responds
Application Server Tier

 Caveats
 Decoupling of services is GOOD
 Manage Complexity using well-defined APIs
 Don’t decouple for scaling, change your algorithms!
 Remote Calling overhead can be expensive
 Marshaling of data
 Sockets, net latency, throughput constraints…
 XML, SOAP, XML-RPC, yuck (don’t scale well)
 Better to use Java’s RMI, good old RPC or even CORBA
Application Server Tier

 More Caveats
 Remote Calling introduces new failure scenarios

 Classic Distributed Problems

• How to detect remote failures?
 How long to wait until deciding it’s failed?
• How to react to remote failures?
 What do we do when all app servers have failed?
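One possible way to handle these questions, sketched under assumptions: give every remote call a timeout, treat a timeout or connection error as a failed server, move on to the next one, and raise a distinct error when none are left so the web tier can show a fallback page. The names `call_with_failover` and `AllBackendsFailed`, and the callable-per-backend interface, are hypothetical.

```python
class AllBackendsFailed(Exception):
    """Raised when every app server has failed or timed out."""


def call_with_failover(backends, request, timeout_s=0.5):
    """Try each backend in order until one answers.

    `backends` is a list of callables taking (request, timeout_s). Each is
    assumed to raise TimeoutError or ConnectionError when the remote side
    does not answer in time or cannot be reached.
    """
    errors = []
    for backend in backends:
        try:
            return backend(request, timeout_s)
        except (TimeoutError, ConnectionError) as exc:
            errors.append(exc)  # remember the failure, try the next server
    # Nobody answered: let the caller decide what to do (error page, cache...)
    raise AllBackendsFailed(f"{len(errors)} backend(s) failed")
```

How long to wait is a trade-off: a short timeout fails over quickly but may give up on a server that is merely busy.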
Application Server Tier

 Hardware Requirements
 Lots and Lots and Lots of Memory
 App Servers are very memory hungry
 Java was hungry to begin with
 Consider going to 64bit for larger memory-space
 Disk depends on application, typically minimal needed
 FAST CPU required, and lots of them
 (This will be an expensive machine.)
Database Tier
Database Tier
 Available DB Products
 Free/Open Source DBs
 PostgreSQL
 MySQL
 GNU DBM
 SQLite
 Ingres
 mSQL
 Berkeley DB

 Commercial
 Oracle
 MS SQL
 IBM DB2
 Sybase
 SleepyCat
Database Tier
 What does it do?
 Data Storage and Retrieval
 Data Aggregation and Computation

 Sorting

 Filtering

 ACID properties
 (Atomic, Consistent, Isolated, Durable)
Database Tier
 Choices
 How much logic to place inside the DB?
 Use Connection Pooling?

 Data Partitioning?
 Spreading a dataset across multiple logical database “slices” in order to achieve better performance.
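A minimal sketch of one partitioning scheme, assuming the slice is chosen by hashing a record key: a stable hash (CRC32 here) modulo the number of slices. The function name is hypothetical, and this is only one of several ways to pick a slice.

```python
import zlib


def slice_for_key(key: str, num_slices: int) -> int:
    """Map a record key to a slice index in [0, num_slices).

    zlib.crc32 is used instead of the built-in hash() because it gives
    the same answer in every process and on every machine.
    """
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_slices


# The same key always lands on the same slice.
print(slice_for_key("alice", 4) == slice_for_key("alice", 4))  # prints True
```

The catch with plain modulo hashing is that changing the number of slices remaps most keys, so pick the slice count with growth in mind.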
Database Tier
 Hardware Requirements
 Entirely dependent upon application.
 Likely to be your most expensive machine(s).

 Tons of Memory
 Spindles galore
 RAID is useful (in software or hardware)
 Reliability usually trumps Speed
• RAID levels 0, 5, 1+0, and 5+0 are useful
 CPU also important
 Dual power supplies
 Dual Network
Internal Cache Tier
Internal Cache Tier
 What is this?
 Object Cache
 What Applications?
 Memcache

 Local Lookup Tables

 BDB, GDBM, SQL-based
 Application-local Caching (e.g. LRU tables)
 Homebrew Caching (disk or memory)
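An application-local LRU table can be sketched in a few lines. This version is an illustration, not code from the talk: it keeps entries in an `OrderedDict`, moves an entry to the end whenever it is read or written, and evicts from the front once the table is over capacity. The class name `LRUTable` is made up.

```python
from collections import OrderedDict


class LRUTable:
    """Fixed-size in-process cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry
```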
Internal Cache Tier
 What does it do?
 Caches objects closer to the Application or Web Tiers
 Tuned for your application

 Very Fast Access

 Scales Horizontally
Internal Cache Tier
 Hardware Requirements
 Lots of Memory
 Note that 32bit processes are typically limited to 2GB of RAM
 Little or no disk
 Moderate to low CPU

 Fast Network
Misc. Services (DNS, Mail, etc…)
Misc. Services (DNS, Mail, etc…)

 Why mention these?

 Every LAMP system has them
 Crucial but often overlooked
 Source of hidden problems
Misc. Services: DNS

 Important Points
 Always have an offsite NS slave
 Always have an onsite NS slave
 Minimize network latency
 Don’t use NAT, load balancers, etc…
Misc. Services: Time Synchronization

 Synchronize the clocks on your systems!

 Hints:
 Use ntpdate at boot time to set clock
 Use ntpd to stay in sync
 Don’t ever change the clock on a running system!
Misc. Services: Monitoring

 System Health Monitoring

 Nagios

 Big Brother
 Orcalator

 Ganglia

 Fault Notification
The Glue

•Routers
•Switches
•Firewalls
•Load Balancers
Routers and Switches
 Expensive
 Complex
 Crucial Piece of the System

 Hints
 Use GigE if you can
 Jumbo Frames are GOOD
 VLANs to manage complexity
 LACP (802.3ad) for failover/redundancy
Load Balancers
 What services to balance?
 HTTP Caches and Servers, App Servers, DB Slaves
 What NOT to balance?
 DNS
 LDAP
 NIS
 Memcache
 Spread
 Anything with its own built-in balancing
Message Busses
 What is out there?
 Spread
 JMS
 MQSeries
 Tibco Rendezvous

 What does it do?

 Various forms of distributed message delivery.
 Guaranteed Delivery, Broadcasting, etc…
 Useful for heterogeneous distributed systems
What about the OS?
Operating System Selection
Lots of OS choices
 Linux
 FreeBSD
 NetBSD
 OpenBSD
 OpenSolaris
 Commercial Unix
What’s Important?
 Maintainability
 Upgrade Path
 Security Updates
 Bug Fixes

 Usability
 Do your engineers like it?
 Cost
 Hardware Requirements
 (you don’t need a commercial Unix anymore)
Features to look for

 Multi-processor Support
 64bit Capable
 Mature Thread Support
 Vibrant User Community
 Support for your devices
The Age of LAMP
What does LAMP provide?
Scalability

 Grows in small steps

 Stays up when it counts

 Can grow with your traffic

 Room for the future

Reliability

 High Quality of Service

 Minimal Downtime
 Stability
 Redundancy
 Resilience
Low Cost

 Little or no software licensing costs

 Minimal hardware requirements

 Abundance of talent

 Reduced maintenance costs

Flexible

 Modular Components
 Public APIs
 Open Architecture
 Vendor Neutral
 Many options at all levels
Extendable
 Free/Open Source Licensing
 Right to Use
 Right to Inspect

 Right to Improve

 Plugins
 Some Free
 Some Commercial

 Can always customize

Free as in Beer?

Price
Speed
Quality

Pick any two.

Performance
What is Performance?

 For LAMP?

 Improving the User Experience

Architecture affects user experience?

 It affects it in two ways

 Speed
Fast Page Loads (Latency)
 Availability
Uptime
Problem: Concurrency

 Concurrency causes slowdowns

 Latency suffers

 Pages load more slowly

Solution: Design for Concurrency

 Build parallel systems

 Eliminate bottlenecks

 Aim for horizontal scalability

Now for some real-world examples…

Surviving your first
Slashdotting
Strategies for Scalability
What is a “Slashdotting”?
 Massive traffic spike (1000x normal)
 High bandwidth needed
 VERY high concurrency needed
 Site inaccessible to some users
 If your system crashes, nobody gets in
Approach

1. Keep the system up, no crashing

 Some users are better than none

2. Let as many users in as possible
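One way to act on both points, sketched under assumptions: cap the number of requests in flight and turn the overflow away immediately with a 503 instead of queueing until the box falls over. The class name `AdmissionGate` and the (status, body) return shape are hypothetical.

```python
import threading


class AdmissionGate:
    """Let at most `max_in_flight` requests run; shed the rest right away."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, work):
        """Run `work()` if a slot is free, otherwise refuse without waiting."""
        if not self._slots.acquire(blocking=False):
            # Overloaded: a fast, cheap refusal keeps the system alive.
            return 503, "Service Unavailable, please retry"
        try:
            return 200, work()
        finally:
            self._slots.release()  # free the slot even if work() raised
```

Set the limit from what the machine can actually sustain (memory, mostly), so the users who do get in are served at normal speed.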

Strategies

1. Load Balancers (of course)

2. Static File Servers

3. External Caching
Load Balancers

 Hardware vs. Software

 Software is complex to set up, but cheaper
 Hardware is expensive, but dedicated
 IMHO: Use SW at first, graduate to HW
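To make the software option concrete, here is a toy round-robin picker that skips servers marked as down. It only sketches the selection logic; a real software balancer also does health checks, connection handling and so on. All names here are made up for illustration.

```python
class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any that are marked down."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        self._servers = list(servers)
        self._down = set()
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        self._down.add(server)

    def mark_up(self, server):
        self._down.discard(server)

    def pick(self):
        # Look at each server at most once per call.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server not in self._down:
                return server
        raise RuntimeError("no healthy servers")
```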
Static File Servers: Zero-copy

 Separate Static from Dynamic

 Scale them independently
 Later, dedicate static content servers
 Modern web servers are very good at serving static content such as
• HTML
• CSS
• Images
• Zip/GZ/Tar files
External Caching
 Reduces internal load
 Scales horizontally
 Obeys HTTP-defined rules for caching
 Your app defines the caching headers
 Behaves no differently than other proxy servers or your browser, only it’s dedicated

 Hint: Use mod_expires to override
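Since the application defines the caching headers, a small helper can build them. The sketch below is an assumption about how an app might do it, not the talk's code: it emits `Cache-Control` with a `max-age` plus a matching `Expires` date in HTTP date format. The function name and the choice of `public` are illustrative.

```python
from email.utils import formatdate


def cache_headers(max_age_s: int, now_epoch: float) -> dict:
    """Headers telling caches they may keep the response for max_age_s seconds.

    now_epoch is passed in (rather than read from the clock) so the
    result is reproducible; in a real app use time.time().
    """
    return {
        "Cache-Control": f"public, max-age={max_age_s}",
        # usegmt=True yields the "... GMT" form that HTTP expects.
        "Expires": formatdate(now_epoch + max_age_s, usegmt=True),
    }


print(cache_headers(3600, 0)["Expires"])  # prints Thu, 01 Jan 1970 01:00:00 GMT
```

Anything personalised should not be marked `public`; the external cache will hand it to whoever asks next.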

Outgrowing your First Server
Strategies for Growth
Design for Horizontal Scalability

 Manage Complexity

 Design Stateless Systems (hard)

 Identify Bottlenecks (hard)

 Predict Growth

 Commodity Parts
Manage Complexity
 Decouple internal services
 Services scale independently
 Independent maintenance

 Well-defined APIs
 Facilitates service decoupling
 Scales your engineering efforts
What is a Stateless System?

 Each connection hits a new server

 Server remembers nothing

 Benefits?
 Allows Better Caching
 Scales Horizontally
Designing Stateless Systems
 Decouple session state from web server
 Store session state in a DB
 Careful: may just move bottleneck to DB tier
 Use a distributed internal cache
 Memcached
 Reduces pressure on session database
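The pattern above can be sketched as a session store that reads through a cache and falls back to the database. Plain dicts stand in for the memcached client and the session table here; both stand-ins and the class name `SessionStore` are assumptions made for the example.

```python
class SessionStore:
    """Session state kept outside the web server: cache first, DB behind it."""

    def __init__(self, cache, db):
        self.cache = cache  # stand-in for a distributed cache client
        self.db = db        # stand-in for the session table

    def load(self, session_id):
        data = self.cache.get(session_id)
        if data is None:
            # Cache miss: go to the DB, then repopulate the cache.
            data = self.db.get(session_id)
            if data is not None:
                self.cache[session_id] = data
        return data

    def save(self, session_id, data):
        self.db[session_id] = data     # DB is the durable copy
        self.cache[session_id] = data  # keep the cache in step
```

Because any web server can call `load`, it no longer matters which server a connection lands on.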
Example: Scaling your User DB
 Assume you have a user-centric system
 Eg. User identity info, subscriptions, etc…

1. Group data by user

2. Distribute users across multiple DBs
3. Write a directory to track user->DB location
4. Cache user data to reduce DB round trips

Disadvantage: difficult to compare two users
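Steps 2 and 3 can be sketched as a small directory that places each new user on the least-populated DB and remembers where it put them. This is an in-memory illustration with made-up names; a real directory would itself be stored durably and, per step 4, cached.

```python
class UserDirectory:
    """Tracks which database holds each user's data."""

    def __init__(self, db_names):
        self._db_names = list(db_names)
        self._location = {}  # user_id -> db name

    def assign(self, user_id):
        """Return the user's DB, placing new users on the emptiest one."""
        if user_id not in self._location:
            counts = {name: 0 for name in self._db_names}
            for name in self._location.values():
                counts[name] += 1
            # Ties go to the DB listed first.
            self._location[user_id] = min(self._db_names, key=counts.__getitem__)
        return self._location[user_id]

    def lookup(self, user_id):
        """Return the DB for a known user (raises KeyError if unknown)."""
        return self._location[user_id]
```

Unlike a pure hash scheme, a directory lets you move one user to another DB by updating a single entry.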

Identify Bottlenecks
 Monitor your system performance
 Use tools for this, there are many
 Store and plot historical data
 Used to identify abnormalities
 Check out rrdtool
 Use system tools to troubleshoot
 vmstat, iostat, sar, top, strace, etc…
Predict Growth
 Use performance metrics
 Hits/sec

 Concurrent connections
 System load

 Total number of users

 Database table rows

 Database index size (on disk/memory)

…
Machine-sized Solutions
 Design for last year’s hardware
 (it’s cheaper)
 Leaves room for your software to grow
 Hardware will get faster
 And your systems will get busier
Use Commodity Parts

 Standardize Hardware

 Use Commodity Software

(Open Source!)

 Avoid Fads
THE END
Thank You
