Building Scalable Web Architectures: Aaron Bannert
Aaron Bannert
[email protected] / [email protected]
http://www.codemass.com/~aaron/presentations/apachecon2005/ac2005scalablewebarch.ppt
Goal
1. LAMP Overview
2. LAMP Features
3. Performance
b. Avoiding Bottlenecks
LAMP Overview
Architecture
LAMP
Linux
Apache
MySQL
PHP (Perl?)
The Big Picture
External Caching Tier
What is this?
Squid
Apache’s mod_proxy
Commercial HTTP Accelerator
External Caching Tier
Flushes Connections
Useful for modem users, frees up web tier
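A rough worked example of why this frees up the web tier. The 100 KB response size, the 56 kbit/s modem, and the 1 Gbit/s internal link are illustrative assumptions, not figures from the talk; latency and protocol overhead are ignored.

```python
# Illustrative assumptions, not measurements.
RESPONSE_BYTES = 100 * 1024        # size of one generated page
MODEM_BITS_PER_SEC = 56_000        # slow client link
LAN_BITS_PER_SEC = 1_000_000_000   # link between cache tier and web tier


def transfer_seconds(size_bytes: int, bits_per_sec: int) -> float:
    """Time to push size_bytes over a link, ignoring latency and overhead."""
    return size_bytes * 8 / bits_per_sec


# Without a cache tier, the web server process waits on the modem.
direct = transfer_seconds(RESPONSE_BYTES, MODEM_BITS_PER_SEC)
# With a cache tier, the web server hands the response off over the LAN
# and the cache tier feeds the slow client.
via_cache = transfer_seconds(RESPONSE_BYTES, LAN_BITS_PER_SEC)

print(f"web process busy, direct to modem: {direct:.1f} s")
print(f"web process busy, via cache tier:  {via_cache * 1000:.2f} ms")
```

Under these assumptions the heavyweight web process is released in under a millisecond instead of being held for many seconds per request.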
Hardware Requirements
Lots of Memory
Fast Network
Moderate to little CPU
Moderate Disk Capacity
Room for cache, logs, etc… (disks are cheap)
One slow disk is OK
Other Questions
What to cache?
How much to cache?
Where to cache (internal vs. external)?
Web Serving Tier
What is this?
Apache
thttpd
Netscape
Web Serving Tier
Hardware Requirements
Lots and lots of Memory
Memory is main bottleneck in web serving
Memory determines max number of users
Fast Network
CPU depends on usage
Dynamic content needs CPU
Static file serving requires very little CPU
Cheap slow disk, enough to hold your content
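The claim that memory determines the maximum number of users can be sketched as simple arithmetic, assuming a process-per-connection server (such as Apache's prefork model). The RAM size, the reserved amount, and the per-process footprints below are hypothetical numbers chosen for illustration.

```python
def max_concurrent_clients(total_ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """Clients that fit in RAM when each one ties up one server process."""
    usable = total_ram_mb - reserved_mb
    if usable <= 0 or per_process_mb <= 0:
        return 0
    return usable // per_process_mb


# Hypothetical box: 4 GB RAM, 512 MB kept back for the OS and file cache.
static_only = max_concurrent_clients(4096, 512, per_process_mb=4)
with_dynamic = max_concurrent_clients(4096, 512, per_process_mb=25)

print("static-only processes that fit:", static_only)
print("dynamic-content processes that fit:", with_dynamic)
```

Going past this limit pushes the machine into swap, which is why the per-process footprint, not CPU, usually sets the ceiling for a web server's connection limit.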
Web Serving Tier
Choices
How much dynamic content?
When to offload dynamic processing?
When to offload database operations?
When to add more web servers?
Application Server Tier
What does it do?
Dynamic Page Processing
JSP
Servlets
Internal Services
E.g. Search, Shopping Cart, Credit Card Processing
Application Server Tier
How does it work?
1. Web Tier generates the request using
HTTP (aka “REST”, sort of)
RPC/Corba
Java RMI
XMLRPC/Soap
(or something homebrewed)
2. App Server processes request and responds
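The two steps above can be sketched with plain HTTP as the transport. This is a minimal illustration using only the Python standard library; the `CartService` handler, the `/cart/42` path, and the JSON reply are hypothetical, and a real app server would be a separate process on a separate machine.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CartService(BaseHTTPRequestHandler):
    """Hypothetical internal service living on the app server tier."""

    def do_GET(self):
        # Step 2: the app server processes the request and responds.
        body = json.dumps({"path": self.path, "items": 3}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def call_app_server(port: int, path: str) -> dict:
    # Step 1: the web tier generates the request over plain HTTP.
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
    try:
        conn.request("GET", path)
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), CartService)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    reply = call_app_server(server.server_address[1], "/cart/42")
finally:
    server.shutdown()
    server.server_close()
print(reply)
```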
Application Server Tier
Caveats
Decoupling of services is GOOD
Manage Complexity using well-defined APIs
Don’t decouple for scaling, change your algorithms!
Remote Calling overhead can be expensive
Marshaling of data
Sockets, net latency, throughput constraints…
XML, Soap, XMLRPC, yuck (don’t scale well)
Better to use Java’s RMI, good old RPC or even Corba
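A small sketch of the marshaling cost, comparing the same call encoded as XML-RPC and as a fixed binary layout. The `add_to_cart` method and its arguments are hypothetical, and the packed `struct` format is only a stand-in for a compact binary RPC encoding; it is not RMI, RPC, or CORBA, and unlike the XML form it carries no method name or type information.

```python
import struct
import xmlrpc.client

# Hypothetical call: add_to_cart(user_id, item_id, quantity)
args = (1001, 55512, 2)

# XML-RPC marshals the call into a verbose text document.
xml_text = xmlrpc.client.dumps(args, methodname="add_to_cart")
xml_payload = xml_text.encode("utf-8")

# A fixed binary layout (three unsigned 32-bit ints, network byte order)
# stands in for a compact binary wire format.
binary_payload = struct.pack("!III", *args)

print(len(xml_payload), "bytes as XML-RPC")
print(len(binary_payload), "bytes as packed binary")

# Both encodings round-trip to the same argument values.
decoded_args, method = xmlrpc.client.loads(xml_text)
assert decoded_args == args and method == "add_to_cart"
assert struct.unpack("!III", binary_payload) == args
```

The size difference is only part of the cost: the text form also has to be generated and parsed on every call.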
Application Server Tier
More Caveats
Remote Calling introduces new failure scenarios
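A sketch of what handling those failure scenarios can look like on the calling side: a bounded timeout, a bounded number of retries, and a fallback value. The function name, the raw-socket protocol, and the default values are assumptions made for illustration.

```python
import socket


def call_with_fallback(host, port, payload, timeout=0.5, retries=2, fallback=b""):
    """Try a remote call a bounded number of times, then degrade gracefully.

    A local function call cannot time out or refuse a connection; a remote
    one can, so the caller needs an explicit policy for each case.
    """
    for _ in range(retries):
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(payload)
                return sock.recv(4096)
        except OSError:
            # Covers refused connections, unreachable hosts, and timeouts.
            continue
    return fallback
```

For example, `call_with_fallback("127.0.0.1", port, b"ping", fallback=b"unavailable")` returns `b"unavailable"` when nothing is listening on `port`. Blind retries are only safe for operations that can be repeated without side effects.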
Hardware Requirements
Lots and Lots and Lots of Memory
App Servers are very memory hungry
Java was hungry to begin with
Consider going to 64bit for larger memory-space
Disk depends on application, typically minimal needed
FAST CPU required, and lots of them
(This will be an expensive machine.)
Database Tier
Available DB Products
Free/Open Source DBs
PostgreSQL
MySQL
GNU DBM
SQLite
Ingres
mSQL
Berkeley DB
Commercial
Oracle
MS SQL
IBM DB2
Sybase
SleepyCat
Database Tier
What does it do?
Data Storage and Retrieval
Data Aggregation and Computation
Sorting
Filtering
ACID properties
(Atomic, Consistent, Isolated, Durable)
Database Tier
Choices
How much logic to place inside the DB?
Use Connection Pooling?
Data Partitioning?
Spreading a dataset across multiple logical database “slices” in order to achieve better performance.
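Data partitioning as described above can be sketched as a function from a row's key to a slice. This is one simple scheme (hash of the key modulo the number of slices), not necessarily the one the talk had in mind; the slice names are hypothetical.

```python
import zlib

# Hypothetical slice names; in practice these would map to database hosts.
SLICES = ["userdb0", "userdb1", "userdb2", "userdb3"]


def slice_for(user_id: str) -> str:
    """Pick the slice holding a user's rows with a stable hash of the key."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    return SLICES[zlib.crc32(user_id.encode("utf-8")) % len(SLICES)]


for uid in ("alice", "bob", "carol"):
    print(uid, "->", slice_for(uid))
```

The trade-off with modulo hashing is that changing the number of slices remaps most keys, so the slice count is best chosen with growth in mind.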
Database Tier
Hardware Requirements
Entirely dependent upon application.
Likely to be your most expensive machine(s).
Tons of Memory
Spindles galore
RAID is useful (in software or hardware)
Reliability usually trumps Speed
RAID levels 0, 5, 1+0, and 5+0 are useful
CPU also important
Dual power supplies
Dual Network
Internal Cache Tier
What is this?
Object Cache
What Applications?
Memcache
Scales Horizontally
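One way an object cache scales horizontally is for each client to decide which cache node owns a key, so nodes never need to coordinate. The sketch below uses consistent hashing, a common client-side technique that is swapped in here for illustration and is not claimed to be what any particular Memcache client does. The node addresses and key names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring: maps keys to cache nodes on the client side."""

    def __init__(self, nodes, replicas=100):
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            for i in range(replicas):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical node addresses.
three = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
four = HashRing(["cache1:11211", "cache2:11211", "cache3:11211", "cache4:11211"])

keys = [f"session:{n}" for n in range(1000)]
moved = sum(three.node_for(k) != four.node_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding a fourth node")
```

Adding a node moves only the keys that the new node takes over; every other key stays on the node that already holds it, so most of the cache remains warm.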
Internal Cache Tier
Hardware Requirements
Lots of Memory
Note that 32bit processes are typically limited to 2GB of RAM
Little or no disk
Moderate to low CPU
Fast Network
Misc. Services (DNS, Mail, etc…)
Important Points
Always have an offsite NS slave
Always have an onsite NS slave
Minimize network latency
Don’t use NAT, load balancers, etc…
Misc. Services: Time Synchronization
Big Brother
Orcallator
Ganglia
Fault Notification
The Glue
Routers
Switches
Firewalls
Load Balancers
Routers and Switches
Expensive
Complex
Crucial Piece of the System
Hints
Use GigE if you can
Jumbo Frames are GOOD
VLANs to manage complexity
LACP (802.3ad) for failover/redundancy
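A worked calculation of why jumbo frames help on GigE: fewer, larger frames mean less per-frame overhead on the wire and fewer frames for every host and switch to handle. The sketch assumes full-size TCP segments over IPv4 with no header options, standard Ethernet framing without a VLAN tag, and a 9000-byte jumbo MTU (a common but not universal choice).

```python
# Per-frame overhead on the wire, in bytes (standard Ethernet, no VLAN tag).
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12   # preamble, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20              # IPv4 + TCP, no options
LINK_BITS_PER_SEC = 1_000_000_000     # GigE


def frame_stats(mtu: int):
    """Return (payload efficiency, frames/sec) for full-size frames at line rate."""
    payload = mtu - IP_TCP_HEADERS
    wire_bytes = mtu + ETHERNET_OVERHEAD
    efficiency = payload / wire_bytes
    frames_per_sec = LINK_BITS_PER_SEC / 8 / wire_bytes
    return efficiency, frames_per_sec


for mtu in (1500, 9000):
    eff, fps = frame_stats(mtu)
    print(f"MTU {mtu}: {eff:.1%} payload, {fps:,.0f} frames/s")
```

The efficiency gain is a few percent; the larger effect is the several-fold drop in frames per second that each machine has to process.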
Load Balancers
What services to balance?
HTTP Caches and Servers, App Servers, DB Slaves
What NOT to balance?
DNS
LDAP
NIS
Memcache
Spread
Anything with its own built-in balancing
Message Busses
What is out there?
Spread
JMS
MQSeries
Tibco Rendezvous
Usability
Do your engineers like it?
Cost
Hardware Requirements
(you don’t need a commercial Unix anymore)
Features to look for
Multi-processor Support
64bit Capable
Mature Thread Support
Vibrant User Community
Support for your devices
The Age of LAMP
What does LAMP provide?
Scalability
Abundance of talent
Modular Components
Public APIs
Open Architecture
Vendor Neutral
Many options at all levels
Extendable
Free/Open Source Licensing
Right to Use
Right to Inspect
Right to Improve
Plugins
Some Free
Some Commercial
Price
Speed
Quality
For LAMP?
Concurrency causes slowdowns
Latency suffers
Eliminate bottlenecks
3. External Caching
Load Balancers
Manage Complexity
Predict Growth
Commodity Parts
Manage Complexity
Decouple internal services
Services scale independently
Independent maintenance
Well-defined APIs
Facilitates service decoupling
Scales your engineering efforts
What is a Stateless System?
Benefits?
Allows Better Caching
Scales Horizontally
Designing Stateless Systems
Decouple session state from web server
Store session state in a DB
Careful: may just move bottleneck to DB tier
Use a distributed internal cache
Memcached
Reduces pressure on session database
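The design above can be sketched as a session store that lives outside the web server, reads from the cache first, and falls back to the database on a miss. Plain dicts stand in for a Memcached client and a session table; the class name, method names, and session data are hypothetical.

```python
class SessionStore:
    """Session state kept outside the web server: cache first, database second.

    Plain dicts stand in for a Memcached client and a session table.
    """

    def __init__(self, cache: dict, db: dict):
        self.cache = cache
        self.db = db
        self.db_reads = 0

    def save(self, session_id: str, data: dict) -> None:
        self.db[session_id] = dict(data)      # durable copy
        self.cache[session_id] = dict(data)   # hot copy

    def load(self, session_id: str):
        if session_id in self.cache:
            return self.cache[session_id]
        self.db_reads += 1                    # cache miss: fall back to the DB
        data = self.db.get(session_id)
        if data is not None:
            self.cache[session_id] = data     # repopulate for the next request
        return data


# Two web servers sharing the same cache and database.
shared_cache, shared_db = {}, {}
web1 = SessionStore(shared_cache, shared_db)
web2 = SessionStore(shared_cache, shared_db)

web1.save("abc123", {"user": "aaron"})
print(web2.load("abc123"))   # any web server can serve the next request
```

Because neither web server holds the session itself, a load balancer can send each request to any of them, and the database is only read when the cache has lost the entry.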
Example: Scaling your User DB
Assume you have a user-centric system
E.g. User identity info, subscriptions, etc…
Concurrent connections
System load
…
Machine-sized Solutions
Design for last year’s hardware
(it’s cheaper)
Leaves room for your software to grow
Hardware will get faster
And your systems will get busier
Use Commodity Parts
Standardize Hardware
(Open Source!)
Avoid Fads
THE END
Thank You