Web Scalability - Part - 2

Web servers can scale by adding machines with languages like Ruby, PHP, and Python not being the bottleneck. Serving media requires a "mini-cluster" to scale and ensure high availability. Thumbnails may represent a bottleneck due to many small file accesses stressing disks. Replicating databases allows for reads from replicas but does not help with write scaling and replicas can lag behind masters. Solutions include database sharding, splitting replicas into pools, and optimizing RAID configurations.

Uploaded by fizo
Copyright: © Attribution Non-Commercial (BY-NC)

Supercourse Scalability

Web Servers
• Linux.
• Can usually scale by adding machines.
• Ruby, Python, PHP, Groovy.
– Web code is not the bottleneck.
– Most of the time is spent waiting on RPCs.
– Development speed is critical.
The CPU is not the bottleneck. Any language should be fast enough; it just needs to be dynamic.
Serving Media
• Each piece of Media should be
hosted by a “mini-cluster”.
– Scalability.
– More than 1 HDD to serve the media.
– Online Backup.
• Apache → lighttpd (high load, context switching).
• Switch from single process to multi-process.
Serving Media
[Diagram: requests arrive from the Internet; the most popular content is served by CDNs, and moderately played content by the SuperCourse servers Serv1 to Serv4.]
Serving Thumbnails
• Thumbnails are scary: they may become a bottleneck.
– Disk seeks.
– Many small objects.
– High number of requests/sec.
Serving Thumbnails
• Limit on the number of files in a directory (ext3).
• Squid is an option as a reverse proxy, but Varnish is a better choice.
• Apache may not be sufficient for disk reads (high load, too many disk reads).
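A common way to work around a per-directory file limit is to spread files across nested subdirectories derived from a hash of the file name. The following is a minimal sketch under that assumption; the function name `thumbnail_path`, the `.jpg` suffix, and the two-level layout are illustrative choices, not something taken from the slides.

```python
import hashlib
from pathlib import PurePosixPath


def thumbnail_path(root: str, thumb_id: str, levels: int = 2) -> str:
    """Map a thumbnail id to a nested path such as <root>/ab/cd/<id>.jpg.

    Hypothetical layout: each level uses two hex characters of the MD5
    digest, so every directory holds at most 256 subdirectories and the
    files are spread roughly evenly across the leaves.
    """
    digest = hashlib.md5(thumb_id.encode("utf-8")).hexdigest()
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return str(PurePosixPath(root, *parts, f"{thumb_id}.jpg"))
```

The mapping is deterministic, so the web server can compute the path from the id alone without a lookup table.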
Thumbnails: lighttpd/aio
• Lighttpd is single threaded…
[Diagram: the main thread hands disk reads to Worker Thread 1 and Worker Thread 2, so several disk reads are in flight at once.]
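The main-thread/worker-thread split shown on this slide can be illustrated with a small thread pool that takes blocking disk reads off the main thread. This is only a sketch of the general pattern in Python, not lighttpd's actual implementation; `read_file` and `read_thumbnails` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def read_file(path: str) -> bytes:
    """Blocking disk read; runs on a worker thread."""
    with open(path, "rb") as f:
        return f.read()


def read_thumbnails(paths: List[str], workers: int = 2) -> Dict[str, bytes]:
    """Dispatch the reads to a pool of worker threads and collect the results.

    The calling (main) thread only schedules work, so one slow disk read
    does not prevent the other reads from being issued.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its data.
        return dict(zip(paths, pool.map(read_file, paths)))
```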
Thumbnails
• Accessing many small files may become a bottleneck (disk reads).
Thumbnails: BT
• Google uses a system called BTFE in YouTube, Google Video, image search, etc.
– Based on Google Bigtable.
– Avoids the small-files problem.
– Various forms of caching (multiple cache layers based on location, etc.).
Databases
• Stores metadata (users, bookmarks,
comments, etc…)
• Database performance degrades
with disk reads.
• Pay some attention to “swap” in the Linux kernel, as the OS may swap the database engine in and out.
DB Optimizations
• Query Optimizations
• Batch Jobs
• memcached
• App server caching.
• Pre-calculation of common queries.
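The caching items in this list usually follow a cache-aside pattern: look in the cache first, fall back to the database on a miss, and store the result for the next request. Below is a minimal sketch in which a plain dictionary stands in for memcached; the class name `CacheAside` and the `load_from_db` callback are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Cache-aside reads with an in-process dict standing in for memcached."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]):
        self._load = load_from_db          # hypothetical database lookup
        self._cache: Dict[str, str] = {}
        self.db_reads = 0                  # counts how often the DB is hit

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]        # cache hit: no database access
        self.db_reads += 1
        value = self._load(key)
        if value is not None:
            self._cache[key] = value       # populate the cache for later reads
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read reloads from the database."""
        self._cache.pop(key, None)
```

Repeated reads of the same key reach the database only once until the key is invalidated.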
DB Replication
[Diagram: all writes go to the master (mostly writes); SQL replication copies each write to three replicas, which also serve the reads.]

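DB Replication: Too many writes
[Diagram: the master (mostly writes) replicates every write over SQL replication to three replicas; each replica must apply all of the writes in addition to serving reads.]
Replication doesn’t help writes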


Replica Lag
• Replication is asynchronous
• Replicas can fall behind the master database and serve stale data.
• MySQL Replication.
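One way an application can cope with replica lag is to route reads only to replicas whose measured lag is below a threshold and fall back to the master otherwise. The sketch below assumes the lag figures come from some monitoring source; the `Replica` type, `pick_read_target`, and the threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    lag_seconds: float  # assumed to be reported by monitoring


def pick_read_target(replicas: List[Replica], max_lag: float,
                     master: str = "master") -> str:
    """Return the least-lagged replica within max_lag, else the master."""
    fresh = [r for r in replicas if r.lag_seconds <= max_lag]
    if not fresh:
        # Every replica is too stale: read from the master instead.
        return master
    return min(fresh, key=lambda r: r.lag_seconds).name
```

Falling back to the master keeps reads fresh but adds load to the machine that already takes all the writes, so the threshold is a trade-off.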
Replication: Master
[Diagram: three clients (Thread1, Thread2, Thread3) apply update1, update2, and update3 to the master database concurrently.]
Multiple threads = concurrency on multi-disk, multi-CPU systems
Replication: Replica
[Diagram: a single replication thread applies update1, then a long update2 (which blocks update3 until it finishes), then update3 to the replica database.]
Single thread = limited parallelism, higher likelihood of a slow query stalling updates
Replication Thread Unhealthy
• Normal replication thread:
– Update row 100 (cache miss)
– Update row 2 (cache hit)
– Update row 8 (cache miss)
– Update row 40 (cache miss)
– Update row 2 (cache hit)
Cache misses require slow disk I/Os, causing a reduction in replication speed.
DB (Abstract view)
• DB updates involve two steps:
– Reading the affected DB pages.
– Applying the changes.
• Prefetch the pages needed by step #1 (a cache primer that reads the SQL buffer for the affected rows).
• This is a difficult solution, and it will not solve all of the replicas’ problems.
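The prefetch idea can be sketched as rewriting a pending UPDATE into a SELECT over the same rows, so that running the SELECT ahead of the replication thread pulls the affected pages into the cache. The function below is a rough illustration under strong assumptions: it handles only simple single-table `UPDATE ... SET ... WHERE ...` statements, and `prefetch_query` is a hypothetical helper, not part of MySQL.

```python
import re
from typing import Optional

# Matches simple statements of the form: UPDATE <table> SET ... WHERE <cond>
_UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(?P<table>\w+)\s+SET\s+.+?\s+WHERE\s+(?P<where>.+?);?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def prefetch_query(statement: str) -> Optional[str]:
    """Turn a simple UPDATE into a SELECT that touches the same rows.

    Returns None for anything the pattern does not recognise, in which
    case no prefetch is attempted.
    """
    match = _UPDATE_RE.match(statement)
    if match is None:
        return None
    return f"SELECT * FROM {match.group('table')} WHERE {match.group('where')}"
```

A cache primer would run the returned SELECT on the replica shortly before the replication thread reaches the corresponding UPDATE.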
Replicas: Summary
• Too many read replicas
• Writes start crowding out reads
• Replication lag
• Extraordinary measures needed to stay alive…
Database Pools
• Split replica databases into two pools:
– Media watch.
• Most visited; serves the data displayed with media, etc.
– General.
• Lower priority than media.
• Less efficient queries.
• Less popular.
• Replicas still lag, but less than before.
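Splitting replicas into pools amounts to a routing decision in the application: media-watch traffic goes to one set of replicas, and everything else goes to another. A minimal sketch follows, assuming round-robin selection inside each pool; the pool names `"watch"` and `"general"` and the host names in the usage note are made up for illustration.

```python
import itertools
from typing import Dict, List


class ReplicaPools:
    """Route reads to a 'watch' pool or a 'general' pool of replicas."""

    def __init__(self, pools: Dict[str, List[str]]):
        # One independent round-robin iterator per pool.
        self._cycles = {name: itertools.cycle(hosts)
                        for name, hosts in pools.items()}

    def replica_for(self, page_type: str) -> str:
        # Only media-watch pages use the dedicated pool; all other
        # page types share the general pool.
        pool = "watch" if page_type == "watch" else "general"
        return next(self._cycles[pool])
```

For example, `ReplicaPools({"watch": ["w1", "w2"], "general": ["g1"]})` sends watch pages alternately to `w1` and `w2`, so slow general queries cannot delay them.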
DB RAID Tweaking
• Monolithic RAID 10 volume (10 disks).
• Linux sees only 1 volume, so it does not schedule many parallel disk I/Os.
DB RAID
• Split it into 5 volumes of 2 disks each.
• Linux will see 5 volumes instead of 1 logical volume, allowing it to schedule disk I/O more aggressively.
DB Partitions
• Partition the monolithic DB into multiple shards.
• We should try to balance the traffic on these shards.
• This can be done by monitoring active users and moving them across shards.
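User-based sharding with rebalancing can be sketched as a default placement rule plus a table of overrides for users that have been moved. The class below is a hypothetical illustration; it tracks only where a user lives and does not copy the user's rows, which a real migration would also have to do.

```python
from typing import Dict


class ShardDirectory:
    """Map users to shards, with per-user overrides for rebalancing."""

    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        self._overrides: Dict[int, int] = {}  # user_id -> shard after a move

    def shard_for(self, user_id: int) -> int:
        # Default placement is user_id modulo the shard count, unless the
        # user was explicitly moved to balance traffic.
        return self._overrides.get(user_id, user_id % self.num_shards)

    def move_user(self, user_id: int, target_shard: int) -> None:
        if not 0 <= target_shard < self.num_shards:
            raise ValueError("target shard out of range")
        self._overrides[user_id] = target_shard
```

Keeping the overrides separate from the default rule means only the moved users need a directory entry.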
Replace DB by MapReduce
• Consider replacing traditional DBs with MapReduce.
• MySQL does not parallelize a single query.
• MapReduce spreads computation across many machines.
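The MapReduce model can be shown in miniature with a single-process sketch: a mapper emits key/value pairs, the pairs are grouped by key, and a reducer combines each group. A real deployment would run the map and reduce phases on many machines; the view-counting job and the record layout below are hypothetical examples.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(records: Iterable[dict],
               mapper: Callable[[dict], Iterator[Tuple[str, int]]],
               reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Run map, group-by-key (shuffle), and reduce in a single process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def count_views_mapper(record: dict) -> Iterator[Tuple[str, int]]:
    # Emit one count per view event, keyed by the video it belongs to.
    yield record["video_id"], 1


def sum_reducer(key: str, values: List[int]) -> int:
    return sum(values)
```

Because each key is reduced independently, the groups could be processed on different machines without coordination.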
