System Design Handbook
System Design Basics

1) Try to break the problem into simpler modules (top-down approach).
2) Talk about the trade-offs (no solution is perfect): calculate the impact on the system based on all the constraints and the end test cases.
3) Focus on the interviewer's intentions.
4) Ask abstract questions about constraints & functionality requirements; look for bottlenecks.
System Design Basics (contd.)

1) Architectural pieces / resources available
2) How these resources work together
3) Utilization & trade-offs

Topics to know:
- Consistent hashing
- CAP theorem
- Load balancing
- Queues
- Caching
- Replication
- SQL vs NoSQL
- Indexes
- Proxies
- Data partitioning
Load Balancing (distributed systems)

Types of distribution:
- Random
- Round-robin
- Random with weights (based on memory & CPU cycles)

To utilize full scalability & redundancy, add load balancers at 3 places:
1) Between the user and the web server
2) Between the web server and the app server / cache server (internal platform)
3) Between the internal platform and the DB

Client → LB → Web server → LB → DB
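The three distribution strategies above can be sketched as follows; a minimal illustration, where the host names and weights are hypothetical stand-ins for a real backend pool:

```python
import random
from itertools import cycle

hosts = ["web1", "web2", "web3"]             # hypothetical backend pool
weights = {"web1": 5, "web2": 3, "web3": 2}  # e.g. derived from free memory / CPU cycles

def pick_random():
    # Random: simple, but can momentarily overload one host
    return random.choice(hosts)

_rr = cycle(hosts)
def pick_round_robin():
    # Round-robin: even rotation through the pool
    return next(_rr)

def pick_weighted_random():
    # Weighted random: hosts with more capacity receive proportionally more traffic
    return random.choices(list(weights), weights=list(weights.values()), k=1)[0]
```

A real load balancer layers health checks and connection tracking on top of these selection rules.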
Smart clients
Take a pool of service hosts & balance load across them:
→ detect hosts that are not responsive
→ detect recovered hosts
→ handle addition of new hosts
Add load-balancing functionality to the DB / cache / service clients.
* Attractive solution for developers (small-scale systems).
As the system grows → move to LBs as standalone servers.

Hardware load balancers:
Expensive but high performance, e.g. Citrix NetScaler.
Not trivial to configure.
Large companies tend to avoid this config, or use it only as the first point of contact to serve user requests, while the internal network uses smart clients or a hybrid solution (next page) for load-balancing traffic.
Software Load Balancers
No pain of creating a smart client; no cost of purchasing dedicated hardware (a hybrid approach).
HAProxy: open-source software load balancer.
1) Running on the client machine (locally bound port), e.g. localhost:9000, managed by HAProxy (with efficient management of requests on the port).
2) Running on an intermediate server: proxies running between the client & server-side components.
HAProxy manages health checks, removal & addition of machines, and balances requests across pools.
World of Databases: SQL vs NoSQL

SQL:
1) Structured data
2) Predefined schema
3) Data in rows & columns (row = one entity's info; column = separate data points)
Examples: MySQL, Oracle, SQLite, Postgres, MariaDB

NoSQL:
1) Unstructured data
2) Distributed
3) Dynamic schema
Examples: key-value stores, document DBs, wide-column stores, graph DBs
Reasons to use a SQL DB

1) You need to ensure ACID compliance:
ACID compliance reduces anomalies and protects the integrity of the database.
For many e-commerce & financial applications, an ACID-compliant DB is the first choice.
2) Your data is structured & unchanging:
If your business is not experiencing rapid growth or sudden changes
→ no requirement for more servers
→ data stays consistent
then there's no reason to use a system designed to support a variety of data types & high traffic.
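What ACID atomicity buys you can be sketched with SQLite's transaction support; a minimal illustration where the `accounts` table and the transfer amounts are hypothetical:

```python
import sqlite3

# A money transfer either fully commits or fully rolls back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(src, dst, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            cur = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except ValueError:
        pass  # the partial debit was rolled back automatically

transfer("alice", "bob", 30)   # succeeds
transfer("bob", "alice", 999)  # fails: the debit is undone, balances unchanged
```

Without the transaction, a crash between the debit and the credit would leave the database in an inconsistent state; this is the anomaly ACID prevents.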
Reasons to use a NoSQL DB

When all other components of the system are fast, querying & searching for data becomes the bottleneck. NoSQL prevents data from being the bottleneck. Big data has been a large success story for NoSQL.

1) To store large volumes of data with little / no structure:
No limit on the types of data. A document DB stores all of an item's data in one place, with no need to fix its type up front.
2) To use cloud computing & storage to the fullest:
Excellent cost-saving solution (easy to spread data across multiple servers to scale up), or commodity hardware on site (affordable, smaller); no headache of additional software. NoSQL DBs like Cassandra are designed to scale across multiple data centers out of the box.
3) Useful for rapid / agile development:
If you're making quick iterations on the schema, SQL will slow you down.
CAP Theorem

Consistency: all nodes see the same data at the same time.
↳ Achieved by updating several nodes before allowing further reads.
Availability: every request gets a response (success / failure).
↳ Achieved by replicating data across different servers.
Partition tolerance: the system continues to work despite message loss / partial failure.
↳ It can sustain any amount of network failure that doesn't result in a failure of the entire network; data is sufficiently replicated across combinations of nodes & networks to keep the system up.
[AP systems: Cassandra, CouchDB]
We cannot build a datastore which is simultaneously:
1) continually available,
2) sequentially consistent, and
3) partition-failure tolerant.
Because: to be consistent, all nodes should see the same set of updates in the same order. But if the network suffers a partition, an update in one partition might not make it to the other partitions
↳ a client may read data from an out-of-date partition after having read from an up-to-date partition.
Solution: stop serving requests from the out-of-date partition
↳ the service is no longer 100% available.
Redundancy & Replication

Duplication of critical data & services
↳ increases reliability of the system.
For critical services & data, ensure that multiple copies / versions are running simultaneously on different servers / databases.
Secures against single-node failures; provides backups if needed in a crisis.

Primary server (active data) → replication → Secondary server (mirrored data)

# Service redundancy: shared-nothing architecture.
Every node is independent; no central service manages state.
→ More resilient to failures
→ New servers can be added without special conditions
→ Helps in scalability: caching, load balancing, horizontal scaling
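Primary/secondary replication as described above can be sketched in a few lines; in-process dicts stand in for separate servers, so this is purely illustrative:

```python
# Writes go to the primary and are mirrored to every secondary, so a
# single-node failure does not lose data.
class ReplicatedStore:
    def __init__(self, n_replicas=2):
        self.primary = {}
        self.replicas = [{} for _ in range(n_replicas)]

    def write(self, key, value):
        # Synchronous replication: mirror every write to all secondaries
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Fall back to a replica if the primary has "failed"
        if self.primary is not None and key in self.primary:
            return self.primary[key]
        for replica in self.replicas:
            if key in replica:
                return replica[key]
        return None

store = ReplicatedStore()
store.write("user:1", "alice")
store.primary = None  # simulate a primary failure
```

Real systems must also handle replication lag and failover coordination, which this sketch deliberately omits.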
Caching

Based on the locality-of-reference principle; used in almost every layer of computing.

# Application server cache:
Placing a cache directly on a request-layer node
↳ local storage of responses.
Request → cache hit: return cached response; cache miss: fetch the data, store it, respond.
The cache can live in memory (very fast) or on the node's local disk (still faster than going to network storage).

# Bottleneck:
If the LB distributes requests randomly, the same request goes to different nodes → more cache misses.
Overcome by:
1) Global caches
2) Distributed caches
Distributed Cache

The cache is divided across nodes using a consistent hashing function, so each request-layer node can compute which cache node owns a given key.

# Easy to increase cache space by adding more nodes.
# Disadvantage: resolving a missing node.
↳ Can be handled by storing multiple copies of the data on different nodes, which makes it more complicated.
# Even if a node disappears, the request can pull the data from the origin.
Global Cache

# A single cache space for all the nodes
↳ adding a cache server / file store (faster than the original store).
# Difficult to manage as the number of clients / requests increases.
Effective if:
1) there is a fixed dataset that needs to be cached, or
2) special hardware provides very fast I/O.
# Forms of global cache:
1) The cache itself handles retrieval on a miss (pulls hot data from the database).
2) The request nodes handle retrieval on a miss: the application logic understands the eviction strategy & hot spots better than the cache does.
CDN: Content Distribution Network

# A cache store for sites that serve large amounts of static media.
Request → CDN serves the static media from local storage if available; if not available, it queries the back-end server, caches the result, and serves it.
# If the site isn't large enough to have its own CDN, ease a future transition:
serve static media from a separate subdomain (e.g. static.yourservice.com) using a lightweight web server such as Nginx
↳ cut over the DNS from your servers to a CDN later.
Cache Invalidation

# Cached data needs to be coherent with the database: if data in the DB is modified, invalidate the cached data.
# 3 schemes:

1) Write-through cache:
Data is written to both the cache & the DB at the same time.
+ Complete data consistency (cache = DB)
+ Fault tolerance in case of failure (no data loss)
- Higher latency on writes (2 write operations)

2) Write-around cache:
Data is written to the DB, bypassing the cache.
+ No cache flooding for writes
- A read request for newly written data misses → higher latency
3) Write-back cache:
Data is written to the cache only; after some interval, or under some specified conditions, the data is written to the DB from the cache.
+ Low latency & high throughput for write-intensive applications
- Risk of data loss (only one copy, in the cache)
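The write-through vs write-back trade-off above can be made concrete with a small sketch; dicts stand in for the cache and the database, so the latency difference is only hinted at by the deferred flush:

```python
# Write-through updates both stores immediately; write-back defers the
# DB write until a later flush, leaving a window where data exists only
# in the cache.
db, cache = {}, {}
dirty = set()  # keys written to the cache but not yet flushed to the DB

def write_through(key, value):
    cache[key] = value
    db[key] = value          # second write: the latency cost

def write_back(key, value):
    cache[key] = value       # fast path: cache only
    dirty.add(key)

def flush():
    # Runs later (on an interval or a threshold): persists pending writes
    for key in dirty:
        db[key] = cache[key]
    dirty.clear()

write_through("a", 1)
write_back("b", 2)
db_before_flush = dict(db)   # "b" is not yet in the DB: the data-loss window
flush()
```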
# Cache Eviction Policies
1) FIFO
2) LIFO / FILO
3) LRU
4) MRU
5) LFU
6) Random Replacement
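Of the policies listed, LRU is the most common in practice; a minimal sketch using an ordered dict (the capacity of 2 is just for illustration):

```python
from collections import OrderedDict

# LRU eviction: when capacity is exceeded, drop the least recently used key.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now most recently used
cache.put("c", 3)  # capacity exceeded: "b" is evicted
```

Swapping `popitem(last=False)` for `popitem(last=True)` would give MRU instead; LFU needs an extra frequency counter per key.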
Sharding / Data Partitioning

# Data partitioning: splitting up a DB / table across multiple machines
↳ manageability, performance, availability & load balancing.
** After a certain scale point, it is cheaper and more feasible to scale horizontally by adding more machines than to scale vertically by adding beefier servers.
# Methods of Partitioning:

1) Horizontal partitioning:
Different rows go into different tables (range-based sharding),
e.g. storing locations by ZIP code, with different ranges in different tables:
Table 1: ZIPs < 100000
Table 2: ZIPs >= 100000
and so on.
** Con: if the value of the range is not chosen carefully, it leads to unbalanced servers,
e.g. Table 1 can end up with more data than Table 2.
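Range-based routing like the ZIP example can be sketched with a sorted list of boundaries; the boundary values and shard count here are hypothetical:

```python
import bisect

# Range-based sharding: each ZIP code is routed to the shard whose
# range contains it.
boundaries = [100000, 200000]  # shard 0: < 100000; shard 1: < 200000; shard 2: the rest
shards = [{}, {}, {}]

def shard_for(zip_code):
    # bisect finds which range the value falls into
    return bisect.bisect_right(boundaries, zip_code)

def insert(zip_code, record):
    shards[shard_for(zip_code)][zip_code] = record

insert(54321, "store A")
insert(123456, "store B")
insert(999999, "store C")
```

The imbalance risk noted above shows up here directly: if most real ZIPs fall below 100000, shard 0 carries most of the data.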
2) Vertical partitioning:
# Feature-wise distribution of data across different servers,
e.g. Instagram: DB server 1: user info; DB server 2: followers; DB server 3: photos.
** Straightforward to implement; low impact on the application.
- If the application sees additional growth, it may become necessary to partition a feature-specific DB across multiple servers
(e.g. it would not be possible for a single server to handle all metadata queries for 10 billion photos by 140 million users).

3) Directory-based partitioning:
A loosely coupled approach to work around the issues in the above two schemes.
** Create a lookup service that knows the current partitioning scheme & abstracts it away from the DB access code.
Mapping: (tuple key → DB server).
Easy to add DB servers or change the partitioning scheme.
# Partitioning Criteria

1) Key- or hash-based partitioning:
Apply a hash function to a key attribute of the data → partition number.
# This effectively fixes the total number of servers / partitions: adding a new server / partition means changing the hash function, and downtime while the data is redistributed.
↳ Solution: consistent hashing.

2) List partitioning: each partition is assigned a list of values.
To store a new record, look up which partition holds its key & store the record there.

3) Round-robin partitioning: uniform data distribution.
With n partitions, the i-th tuple is assigned to partition (i mod n).

4) Composite partitioning: a combination of the above schemes,
e.g. hashing + list = consistent hashing (the hash reduces the key space to a size that can be listed).
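The redistribution problem with naive hash-mod-n partitioning (criterion 1) is easy to demonstrate; the key names here are hypothetical:

```python
# Adding one server to a hash % n scheme changes the partition of most
# keys, which is why it forces a full redistribution (and downtime).
def partition(key, n_servers):
    return hash(key) % n_servers

keys = [f"user:{i}" for i in range(1000)]
before = {k: partition(k, 4) for k in keys}
after = {k: partition(k, 5) for k in keys}  # one server added

moved = sum(1 for k in keys if before[k] != after[k])
# With mod-n, roughly 4 out of 5 keys land on a different server.
```

Consistent hashing (covered later in these notes) reduces this to roughly k/n moved keys.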
# Common Problems of Sharding:
A sharded DB puts extra constraints on the different operations: operations across multiple tables, or multiple rows in the same table, are no longer running on a single server.

1) Joins & denormalization:
Joins on tables on a single server are straightforward.
* It is often not feasible to perform joins on sharded tables
↳ less efficient (data needs to be compiled from multiple servers).
# Workaround: denormalize the DB so that queries that previously required joins can be performed from a single table.
(Perils of denormalization ↳ data inconsistency.)

2) Referential integrity:
Foreign keys on a sharded DB are difficult.
* Most RDBMSs do not support foreign keys across sharded DBs.
# If the application demands referential integrity on a sharded DB
↳ enforce it in application code (e.g. SQL jobs to clean up dangling references).

3) Rebalancing:
Reasons to change the sharding scheme:
a) non-uniform distribution (data-wise)
b) non-uniform load balancing (request-wise)
Workarounds: 1) add new DB servers, 2) rebalance
↳ change in partitioning scheme
↳ data movement
↳ downtime
We can use directory-based partitioning instead,
↳ but it is highly complex
↳ and the lookup service / table is a single point of failure.
Indexes

Well known in the context of databases.
+ Improves speed of retrieval.
- Increased storage overhead.
- Slower writes: write the data, then update the index.
Can be created using one or more columns.
* Enables rapid random lookups & efficient access of ordered records.
# Data structure: maps a column value → a pointer to the whole row.
→ Creates different views of the same data
↳ very good for filtering / sorting large data sets
↳ no need to create additional copies of the data.
# Used for large datasets (TBs in size) with small payloads (KBs), spread over several physical devices → we need some way to find the correct physical location, i.e. indexes.
Especially useful under high-load situations.
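The column → row-pointer structure above can be sketched as a plain dictionary over row positions; the table contents are hypothetical:

```python
# An index maps a column value to the positions ("pointers") of matching
# rows, so a lookup avoids scanning every row.
rows = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Delhi"},
    {"id": 3, "city": "Pune"},
]

# Build an index on the "city" column: value -> list of row positions
city_index = {}
for pos, row in enumerate(rows):
    city_index.setdefault(row["city"], []).append(pos)

def find_by_city(city):
    # Index lookup instead of a full scan
    return [rows[pos] for pos in city_index.get(city, [])]
```

The write-cost trade-off is visible here too: every insert into `rows` must also update `city_index`.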
Proxies

- Useful if we have limited caching
↳ a proxy can batch several requests into one.

Client → Proxy → Backend server

A proxy can:
- filter requests
- log requests
- transform requests (add / remove headers, encryption / decryption, compression)
- coordinate requests for frequently used resources (request traffic optimization)
- collapse requests for the same data into one (collapsed forwarding)
We can also use spatial locality: collapsing requests for data that is spatially close together in the origin store → minimize reads from the origin.
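Collapsed forwarding can be sketched as follows; this sequential simplification treats in-flight requests as a table of pending results, whereas a real proxy would coordinate truly concurrent requests:

```python
# Duplicate requests for the same key share one trip to the origin.
origin_reads = {"count": 0}

def read_origin(key):
    origin_reads["count"] += 1        # count how often the origin is hit
    return f"data-for-{key}"

pending = {}  # key -> result of the single origin read for that key

def proxy_get(key):
    # Collapse duplicate requests: only the first one hits the origin
    if key not in pending:
        pending[key] = read_origin(key)
    return pending[key]

results = [proxy_get("popular") for _ in range(5)]
```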
Queues

Effectively manage requests in a large-scale distributed system.
→ In small systems, writes are fast.
→ In complex systems with high incoming load, individual writes take more time.
* To achieve high performance & availability, the system needs to be asynchronous
↳ queues.
# Synchronous behaviour degrades performance and makes fair & balanced load distribution difficult.

Clients (C1, C2, ...) → [ queue of tasks T1, T2, T3, ... ] → servers

# Queues: an asynchronous communication protocol.
↳ The client sends a task, gets an ACK (receipt) from the queue, and continues its work; the receipt serves as a reference for the results in the future.
# There is a limit on the size of a request & the number of requests in the queue.
# Queues provide fault tolerance
↳ protection from service outages / failures
and are highly robust
↳ retry failed service requests.
# Queues enforce a quality-of-service guarantee (they do NOT expose clients to outages).
# Queues enable distributed communication.
↳ Open-source implementations: RabbitMQ, ZeroMQ, ActiveMQ, BeanstalkD.
Consistent Hashing

# Distributed hash table: index = hash_function(key)
# Suppose we're designing a distributed caching system with n cache servers
↳ the natural mapping is hash_function(key) % n.
Drawbacks:
1) NOT horizontally scalable
↳ adding a new server requires changing all existing mappings (downtime of the system).
2) NOT load balanced (because of non-uniform distribution of data)
↳ some caches become hot & saturated while others sit idle & empty.
How to tackle the above problems? Consistent hashing.

What is consistent hashing?
→ A very useful strategy for distributed caching systems & DHTs.
→ Minimizes reorganization when scaling up / down.
→ Only k/n keys need to be remapped (k = total number of keys, n = number of servers).
How does it work?
Suppose a typical hash function outputs integers in [0, 255]. In consistent hashing, imagine all of these integers placed on a ring, and suppose we have 3 servers: A, B & C.
1) Given the list of servers, hash them to integers in the range, placing them on the ring.
2) To map a key to a server:
a) hash the key to a single integer;
b) move clockwise on the ring until you find a server;
c) map the key to that server.
Adding a new server 'D' between A and B will result in moving only 'key-2' to 'D'.
Removing server 'A' will result in moving only 'key-1' to the next server clockwise on the ring.
Consider a real-world scenario: data is randomly distributed
↳ unbalanced caches.
How to handle this issue? Virtual replicas:
Instead of mapping each node to a single point on the ring, we map it to multiple points (virtual nodes).
↳ More replicas → more equal distribution → good load balancing.
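The ring with virtual replicas can be sketched as follows; the replica count and the choice of MD5 are illustrative, not prescriptive:

```python
import bisect
import hashlib

# A consistent hash ring: each node occupies many points (virtual
# replicas), and a key maps to the next node clockwise from its hash.
class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = {}          # ring position -> node
        self.sorted_keys = []   # sorted ring positions
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self.ring[pos] = node
            bisect.insort(self.sorted_keys, pos)

    def get(self, key):
        # Walk clockwise from the key's position to the next node
        pos = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, pos) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

ring = HashRing(["A", "B", "C"])
before = {f"key-{i}": ring.get(f"key-{i}") for i in range(1000)}
ring.add("D")  # add a fourth server
after = {k: ring.get(k) for k in before}
moved = sum(1 for k in before if before[k] != after[k])
moved_to = {after[k] for k in before if before[k] != after[k]}
# Only roughly 1/4 of the keys move, and every moved key lands on "D".
```

Contrast this with the hash % n sketch earlier in the notes, where adding one server moved most of the keys.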
Long-Polling vs WebSockets vs Server-Sent Events
↳ Client-server communication protocols.

# HTTP protocol:
Client sends a request → server prepares the response → server sends the response.

# AJAX polling:
The client repeatedly polls the server for data, similar to the basic HTTP protocol
↳ requests are sent to the server at regular intervals (e.g. every 0.5 s).
Drawbacks:
- The client keeps asking the server for new data
↳ lots of the responses are 'empty'
↳ HTTP overhead.
# HTTP long polling ('Hanging GET'):
The server does NOT send empty responses; it pushes a response to the client only when new data is available.
1) The client makes an HTTP request & waits for the response.
2) The server delays the response until an update is available, or until a time-out occurs.
3) When there is an update → the server sends a full response.
4) The client sends a new long-poll request:
a) immediately after receiving a response, or
b) after a pause, to allow an acceptable latency period.
5) Each request has a timeout; the client needs to reconnect periodically due to timeouts.
WebSockets
→ A full-duplex communication channel over a single TCP connection.
→ Provides 'persistent communication' (client & server can send data at any time).
→ Bidirectional communication over an always-open channel.
Handshake request → handshake success response → open communication channel.
→ Lower overheads.
→ Real-time data transfer.
Server-Sent Events (SSE)
The client establishes a persistent & long-term connection with the server; the server uses this connection to send data to the client.
** If the client wants to send data to the server, it requires another technology / protocol (e.g. a regular HTTP request).
The client opens an always-open, unidirectional communication channel and receives responses whenever new data is available.
→ Best when we need real-time data from server to client, or when the server is generating data in a loop & will be sending multiple events to the client.
More Related Content

PDF
System Design Basics by Pratyush Majumdar
PDF
Dealing with Enterprise Level Data
PPT
SQL or NoSQL, that is the question!
PPT
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
PDF
Scalability Considerations
PPTX
Black Friday and Cyber Monday- Best Practices for Your E-Commerce Database
PPS
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
PPTX
Scalable Web Architecture and Distributed Systems
System Design Basics by Pratyush Majumdar
Dealing with Enterprise Level Data
SQL or NoSQL, that is the question!
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
Scalability Considerations
Black Friday and Cyber Monday- Best Practices for Your E-Commerce Database
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architecture and Distributed Systems

Similar to System design handwritten notes guidance (20)

PDF
Tech Winter Break @gdgkiit | System Design Essentials
PDF
Scale from zero to millions of users.pdf
PPS
Web20expo Scalable Web Arch
PPS
Web20expo Scalable Web Arch
PPS
Web20expo Scalable Web Arch
PPTX
Scaling your website
PDF
Scalable, good, cheap
PPS
Scalable Web Architectures - Common Patterns & Approaches
PPS
Scalable Web Arch
ODP
Front Range PHP NoSQL Databases
PPTX
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
PDF
Distributed Systems: scalability and high availability
PPTX
Handling Data in Mega Scale Systems
PDF
What every developer should know about database scalability, PyCon 2010
PPTX
Clustered PHP - DC PHP 2009
PDF
Why Distributed Databases?
PPTX
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
PPT
UnConference for Georgia Southern Computer Science March 31, 2015
PDF
Basics of the Highly Available Distributed Databases - teowaki - javier ramir...
PDF
Everything you always wanted to know about highly available distributed datab...
Tech Winter Break @gdgkiit | System Design Essentials
Scale from zero to millions of users.pdf
Web20expo Scalable Web Arch
Web20expo Scalable Web Arch
Web20expo Scalable Web Arch
Scaling your website
Scalable, good, cheap
Scalable Web Architectures - Common Patterns & Approaches
Scalable Web Arch
Front Range PHP NoSQL Databases
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Distributed Systems: scalability and high availability
Handling Data in Mega Scale Systems
What every developer should know about database scalability, PyCon 2010
Clustered PHP - DC PHP 2009
Why Distributed Databases?
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
UnConference for Georgia Southern Computer Science March 31, 2015
Basics of the Highly Available Distributed Databases - teowaki - javier ramir...
Everything you always wanted to know about highly available distributed datab...
Ad

More from Shabista Imam (11)

PDF
Structured Programming with C++ :: Kjell Backman
PDF
Introduction to Computer Networks: Peter L Dordal
PDF
Complete University of Calculus :: 2nd edition
PDF
Complete guidance book of Asp.Net Web API
PDF
Abraham Silberschatz-Operating System Concepts (9th,2012.12).pdf
PDF
FUNDAMENTALS OF COMPUTER ORGANIZATION AND ARCHITECTURE
PDF
Tally.ERP 9 at a Glance.book - Tally Solutions .pdf
PDF
special_edition_using_visual_foxpro_6.pdf
PDF
Complete WordPress Programming Guidance Book
PDF
Visual basic
PPTX
Introduction to c programming,
Structured Programming with C++ :: Kjell Backman
Introduction to Computer Networks: Peter L Dordal
Complete University of Calculus :: 2nd edition
Complete guidance book of Asp.Net Web API
Abraham Silberschatz-Operating System Concepts (9th,2012.12).pdf
FUNDAMENTALS OF COMPUTER ORGANIZATION AND ARCHITECTURE
Tally.ERP 9 at a Glance.book - Tally Solutions .pdf
special_edition_using_visual_foxpro_6.pdf
Complete WordPress Programming Guidance Book
Visual basic
Introduction to c programming,
Ad

Recently uploaded (20)

PDF
A Framework for Securing Personal Data Shared by Users on the Digital Platforms
PDF
International Journal of Information Technology Convergence and Services (IJI...
PPTX
anatomy of limbus and anterior chamber .pptx
PDF
flutter Launcher Icons, Splash Screens & Fonts
PPTX
Fluid Mechanics, Module 3: Basics of Fluid Mechanics
PPT
Chapter 6 Design in software Engineeing.ppt
PPTX
ANIMAL INTERVENTION WARNING SYSTEM (4).pptx
PDF
Top 10 read articles In Managing Information Technology.pdf
PPTX
Ship’s Structural Components.pptx 7.7 Mb
PDF
BRKDCN-2613.pdf Cisco AI DC NVIDIA presentation
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPTX
The-Looming-Shadow-How-AI-Poses-Dangers-to-Humanity.pptx
PPTX
TE-AI-Unit VI notes using planning model
PPTX
MET 305 MODULE 1 KTU 2019 SCHEME 25.pptx
PPTX
Practice Questions on recent development part 1.pptx
PPTX
Lesson 3_Tessellation.pptx finite Mathematics
PPTX
Simulation of electric circuit laws using tinkercad.pptx
PPTX
metal cuttingmechancial metalcutting.pptx
PDF
dse_final_merit_2025_26 gtgfffffcjjjuuyy
PDF
algorithms-16-00088-v2hghjjnjnhhhnnjhj.pdf
A Framework for Securing Personal Data Shared by Users on the Digital Platforms
International Journal of Information Technology Convergence and Services (IJI...
anatomy of limbus and anterior chamber .pptx
flutter Launcher Icons, Splash Screens & Fonts
Fluid Mechanics, Module 3: Basics of Fluid Mechanics
Chapter 6 Design in software Engineeing.ppt
ANIMAL INTERVENTION WARNING SYSTEM (4).pptx
Top 10 read articles In Managing Information Technology.pdf
Ship’s Structural Components.pptx 7.7 Mb
BRKDCN-2613.pdf Cisco AI DC NVIDIA presentation
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
The-Looming-Shadow-How-AI-Poses-Dangers-to-Humanity.pptx
TE-AI-Unit VI notes using planning model
MET 305 MODULE 1 KTU 2019 SCHEME 25.pptx
Practice Questions on recent development part 1.pptx
Lesson 3_Tessellation.pptx finite Mathematics
Simulation of electric circuit laws using tinkercad.pptx
metal cuttingmechancial metalcutting.pptx
dse_final_merit_2025_26 gtgfffffcjjjuuyy
algorithms-16-00088-v2hghjjnjnhhhnnjhj.pdf

System design handwritten notes guidance

  • 2. System Design Basics ① 1) Try to break the problem into simpler modules ( Top down approach) 2) T alk about the trade - offs ( No solution is perfect) calculate the impact on system based on all the constraints and the end test cases . ← Focus on interviewer 's heproblem ] intentions . T Ask Abstract ] questions ( constraints & Functionality finding Requirements) bottlenecks idudas
  • 3. System Design Basics ccontd ) ② D Architectural pieces / resources available 2) How these resources work together 3) Utilization & Tradeoffs - consistent Hashing - CAP Theorem ✓ - Load balancing ✓ - queues f- caching - - Replication - 59L vs No - SQL I - Indexes ✓ - Proxies 1 - Data Partitioning ✓
  • 4. Load Balancing ③ ( Distributed system) Types of distribution - f Random Round - robin < Random ( weights for memory & CPU cycles) To utilize full scalability & redundancy , add 3 LB D User ¥ web server 2) Web server ¥ App server 1 Cache Server ( Internal platform) 3) Internal platform DB . #W Client LB er DB T ,, LB
  • 5. Smart clients Takes a pool of service hosts & balances load. → detects hosts that are not responsive → recovered hosts → addition of new hosts Load balancing functionality to DB (cache. Service * Attractive solution for developers ( small scale systems) As system grows → LBS ( standalone servers ) Hardware load Balancers : Expensive but high performance. e. g . Citrix Netscaler Not trivial to configure. Large companies tend to avoid this config . or use it as 1st point or contact to their system to serve user requests & Intra network uses smart clients / hybrid solution → ( Next page) for load balancing traffic .
  • 6. Software Load Balancers ' No pain of creation of smart client No cost of purchasing dedicated hardware [ hybrid approach HA Proxy OSS Load balancer -4 1) Running on client machine ' 2 ( locally bound port) e - g . local host : 9000 I F- managed by HA Proxy ( with efficient management of requests on the port ) 2) Running on intermediate server : Proxies running beth HA Proxy [ Manages health checks dirt server side components removal & addition of machines balances requests alc pools .
  • 7. Wortdof Databases = S9Lvs.NoS÷ iaa 1) structured D Unstructured . 2) Predefined schema 2) distributed 3) Data in rows & columns 3) dynamic schema Row One Entity Into column Separate data points mysar ¥'s::L:L: stores Oracle 9sa¥fl wait:p: ' :3; DB Postgres MariaDB
  • 8. an E o e r ÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷ O e ai s I E E cs O S O th s I 3 is J Os e to 6 of = G IT u E s O U ' E s O S - I g O f÷÷÷¥÷÷÷÷÷÷÷÷÷. Et- s o U te O G O e e O : tr - I ' J S T O g e I E E O or I is E E ET o o ① ± . ÷EE or or o J - E e E E s
  • 9. is 3 Er er . . I 8 Ess E -3 o o - G o OS u c S s d d - E E OE s w t o E o o J E = O E E G - r d Id O is 8 e E t E ° i n u EE s E - E - E O E s - d s gu - so § n E d s f O s . - c O G e u f O d O O - 88 U I 0 or 6 by us is • u w S S s s o - W s e j d u E U o g O s a ou ou N & si u O or 883 A 88 it's too > u s I E ch t g - o o O s 't ⑤ I it Cs cs T g J as a Notre u E O o - v . I d g. E n S E KE u e I o O → of E s o s 8 s I I 8 of to E Z t - - n y U ou u @ at so . - In O Es O s 6 w o @ Ed O t T i N T S e e E O ng O w . - u & . o - E U - o E on s a • R g S w O - z - O or O U N b d s D J O s cry to Nos y U G - u - - t O u z > so I ' E O og u O si w Dw or • ← we t w O F acs n - o o 8 3 u f o S - T I Z o 's - E u 06 I d.E es ° 6 § 03 O T w E s if÷÷÷÷f÷÷÷i÷÷÷:ff÷÷:÷f÷÷i÷÷÷ G O z S T s 89 f- od ' I O U G o T u Es E o 8- or # O E D s @ u w u o J as 6 ' I t O L O t u ' Es D O 6 g y U d 62 g- U w u o g N s or y es or u O > u d U s - - d E nd o b Eire E # E E = T I n - ooo EEE - o ' E O o - I • as u a 1- o 's x G , + as I o ' 88 go.IT IT IT IT - w s q El if ¥4 . - e d J o I 0 8 u s u g Ev or on us or
  • 10. Reasonstouses.cl#BJ 1) You need to ensure ACID compliance : ACID compliance Reduces anomalies Protects integrity of the database . for many E - commerce & financial app " → ACID compliant DB is the first choice . 2) Your data is structured & unchanging . If your business is not experiencing rapid growth or sudden changes → No requirements of more servers → data is consistent then there's no reason to use system design to support variety of data & high traffic .
  • 11. Reasonstouse.NO#IB When all other components of system are fast → querying & searching for data bottleneck . NoSQL prevent data from being bottleneck . Big data large success for NoSQL. 1) To store large volumes of data C little Ino structure) No limit on type of data. Document DB Stores all data in one place ( No need of type of data) 2) Using cloud & storage to the fullest . Excellent cost saving solution . ( Easy spread of data across multiple servers to scale up ) OR commodity hlw on site ( affordable , smaller ) No headache or additional Stw & NoSQL DBS like Cassander designed to scale across multiple data centers out of the box. 3) Useful for rapid 1 agile development. If you're making quick iterations on schema SQL will slow you down .
  • 12. Achieved by CAP-heore€ tenyC All nodes see same data updating several nodes . at same time) before allowing g) reads E " " " " e. %%%, Not g pareieiontdera£ Availability [cassandra . Couch DB] Hr Hr Every request gets System continues to work response ( success ( failure) despite message loss ( partial Achieved by replicating Failure . data across different servers ( can sustain any amount of network failure without - resulting in failure of entire Data is sufficiently replicated network ) across combination of nodes / networks to keep the system up . .in:6#eo:::i:::::ia::i:n:::ans'
  • 13. We cannot build a datastore which is : D continually available 2) sequentially consistent 3) partition failure tolerant . Because , To be consistent all nodes should see the same set of updates in the same order But if network suffers partition, update in one partition might not make it to other partitions ↳ client reads data from out-of-date partition After having read from up-to-date partition . Solutions stop serving requests from out - of - date partition. ↳ service is no longer 100% available .
  • 14. Redundancy&ReplicationJ Duplication of critical data & services ↳ increasing reliability of system . For critical services & data ensure that multiple copies 1 versions are running simultaneously on different servers 1 databases . Secure against single node failures . Provides backups if needed in crisis . Deter Dow Primary server secondary server & Data 2 - - a Ig Replication 2 Active data Mirrored data # Service Redundancy : shared - nothing architecture. Every node independent. No central service managing state . ] More resilient ← New servers ← Helps in [ to failures addition without scalability Nosing special conditions
  • 15. Caching Load balancing Scales horizontally caching : Locality of reference principle I Used in almost every layer of computing . I Application Server cache : Placing a cache directly on a request layer node. ↳ Local storage of response Requests in€.÷. miss response data # Caches on one Request layer made T catedy ✓ Memory (very fast) Node 's local disk ( faster than going to network storage) # # Bottleneck : If LB distributes requests randomly ↳ same request different nodes our:c!m£ More d by D Global caches 2) Distributed caches
• 16. Distributed cache
The cache space is divided among the nodes using a consistent hashing function, so each request-layer node can locate the node that owns a given key.
# Easy to increase cache space by adding more nodes.
# Disadvantage: resolving a missing node. It can be handled by storing multiple copies of the data on different nodes, but that makes things more complicated.
# Even if a node disappears, requests can still pull the data from origin.
• 17. Global cache
# A single cache space for all the nodes ↳ adding a cache server / file store (faster than the original store) that every request-layer node uses.
# Difficult to manage as the number of clients / requests increases. Effective if: 1) there is a fixed dataset that needs to be cached, or 2) special hardware with fast I/O is available.
# Forms of global cache:
1) The cache itself is responsible for retrieval on a miss — the cache sits in front of the database and contains the hot data.
2) The request nodes are responsible for retrieval — used when the application logic understands the eviction strategy and the hot data better than the cache does.
• 18. CDN: Content Distribution Network
Cache store for sites that serve large amounts of static media. The request goes to the CDN; if the content is available in the CDN's local storage, it is served from there; if not, the CDN queries the back-end servers, caches the content, and serves it.
If the site isn't large enough to have its own CDN, ease a future transition: serve static media from a separate subdomain (static.yourservice.com) using a lightweight server such as Nginx ↳ later, cut the DNS over from your servers to a CDN.
• 19. Cache Invalidation
# Cached data needs to be coherent with the database: if data in the DB is modified, invalidate the cached data.
# 3 schemes:
1) Write-through cache: data is written to both the cache & the DB at the same time.
+ complete data consistency (cache = DB)
+ fault tolerance in case of failure (no data loss)
− high latency on writes (2 write operations)
2) Write-around cache: data is written directly to the DB, bypassing the cache.
+ no cache flooding for writes
− a read request for newly written data misses the cache ↳ higher latency
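The write-through scheme above can be sketched in a few lines. This is a minimal toy model, not a real driver: the `Database` class is an invented in-memory stand-in for the backing store.

```python
# Toy backing store (invented stand-in for a real database).
class Database:
    def __init__(self):
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)


class WriteThroughCache:
    """Every write goes to BOTH the cache and the DB before returning,
    so cache == DB at all times (at the cost of two writes per update)."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value    # write 1: cache
        self.db.write(key, value)  # write 2: database

    def read(self, key):
        if key in self.cache:      # hit: served from memory
            return self.cache[key]
        value = self.db.read(key)  # miss: fall back to DB, then populate
        self.cache[key] = value
        return value


db = Database()
wt = WriteThroughCache(db)
wt.write("user:1", "alice")  # lands in cache AND db
```

A write-around cache would simply call `db.write` directly and leave `self.cache` untouched, accepting a miss on the next read of that key.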
• 20. 3) Write-back cache: data is written to the cache first; after some interval, or under some specified conditions, the data is written to the DB from the cache.
+ low latency & high throughput for write-intensive applications
− risk of data loss (only one copy, in the cache)
# Cache eviction policies: 1) FIFO 2) LIFO / FILO 3) LRU 4) MRU 5) LFU 6) Random Replacement
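Of the eviction policies listed, LRU is the most common; a minimal sketch using the standard library's `OrderedDict`:

```python
from collections import OrderedDict


# Sketch of the LRU (least-recently-used) eviction policy:
# when the cache is full, evict the entry touched longest ago.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # insertion order == recency order

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # touch "a", so "b" is now least recently used
cache.put("c", 3)  # capacity exceeded: evicts "b"
```

Swapping `popitem(last=False)` for `popitem(last=True)` would turn this into MRU eviction; FIFO would skip the `move_to_end` on reads.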
• 21. Sharding / Data Partitioning
# Data partitioning: splitting up a DB / table across multiple machines ↳ manageability, performance, availability & load balancing.
** After a certain scale point, it is cheaper and more feasible to scale horizontally by adding more machines than to scale vertically by adding beefier servers.
# Methods of partitioning:
1) Horizontal partitioning: different rows into different tables (range-based sharding) — different ranges in different tables. e.g. storing locations by zip code: Table 1: zips < 100000; Table 2: zips ≥ 100000; and so on.
** Con: if the range boundaries are not chosen carefully, this leads to unbalanced servers — e.g. Table 1 can end up with far more data than Table 2.
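The zip-code example above can be written as a small routing function. The range boundaries and table names here are illustrative, taken straight from the example, not prescriptive:

```python
# Range-based (horizontal) sharding: pick the shard whose range
# contains the row's zip code. Boundaries mirror the slide's example.
SHARD_RANGES = [
    (0,       100_000,   "table_1"),  # zips < 100000
    (100_000, 1_000_000, "table_2"),  # zips >= 100000
]


def shard_for_zip(zip_code):
    for low, high, shard in SHARD_RANGES:
        if low <= zip_code < high:
            return shard
    raise ValueError("zip code outside all configured ranges")
```

If most users cluster in one range, that shard grows disproportionately — exactly the unbalanced-servers con noted above.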
• 22. Vertical Partitioning
# Feature-wise distribution of data across different servers. e.g. Instagram — DB server 1: user info; DB server 2: followers; DB server 3: photos.
** Straightforward to implement.
** Low impact on the application.
− If the application grows further, a feature-specific DB may itself need to be partitioned across various servers (e.g. it would not be possible for a single server to handle all metadata queries for 10 billion photos by 140 million users).
Directory-based partitioning: a loosely coupled approach to work around the issues in the above two partitioning schemes.
** Create a lookup service that knows the current partitioning scheme and abstracts it away from the DB access code. Mapping: (tuple key → DB server). Easy to add DB servers or change the partitioning scheme.
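The directory approach is essentially a mutable key-to-server map sitting in front of the data access code. A minimal sketch — the server names are invented for illustration:

```python
# Directory-based partitioning: a lookup service maps each key to its
# DB server, hiding the partitioning scheme from the access code.
class Directory:
    def __init__(self, default_server):
        self.default_server = default_server
        self.mapping = {}  # key -> server

    def locate(self, key):
        return self.mapping.get(key, self.default_server)

    def move(self, key, server):
        # Changing the scheme = updating the directory entry,
        # not the callers that query it.
        self.mapping[key] = server


directory = Directory(default_server="db-server-1")
directory.move("user:42", "db-server-3")  # rebalance one key
```

The flip side (noted on the rebalancing slide) is that this lookup service itself becomes a single point of failure.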
• 23. Partitioning Criteria
1) Key- or hash-based partitioning: key attribute → hash function → partition number.
# This effectively fixes the total number of servers / partitions: adding a new server / partition → change in the hash function → downtime because of data redistribution.
↳ Solution: consistent hashing.
2) List partitioning: each partition is assigned a list of values. For a new record → look up which partition holds its key → store the record there (partition based on the key).
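The redistribution problem of `hash(key) % n` is easy to demonstrate: grow from 4 to 5 servers and count how many keys change partition. (With this scheme a key stays put only when `hash % 4 == hash % 5`, i.e. roughly 1 key in 5; consistent hashing would move only ~1/n of the keys.)

```python
# Why hash-based partitioning fixes the server count: changing n
# remaps the vast majority of keys, forcing a mass redistribution.
def partition(key, n_servers):
    return hash(key) % n_servers


keys = [f"key-{i}" for i in range(1000)]
moved = sum(partition(k, 4) != partition(k, 5) for k in keys)
# Expect roughly 80% of the 1000 keys to land on a different partition.
```

Note that Python salts `hash()` for strings per process, so the exact count varies run to run, but the ~80% proportion does not.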
• 24. 3) Round-robin partitioning: uniform data distribution — with n partitions, the i-th tuple is assigned to partition (i mod n).
4) Composite partitioning: combination of the above partitioning schemes. Hashing + list → consistent hashing: the hash reduces the key space to a size that can be listed.
# Common problems of sharding: a sharded DB puts extra constraints on the different operations ↳ operations across multiple tables, or across multiple rows in the same table, no longer run on a single server.
  • 25. " Jains A Denoumalizatiom : Jains on tables on single sauce straightforward. * not feasible to perform joins on shrouded tables ↳ Less efficient C data needs to be compiled from multiple servers) # Workaround Denarmalip the DB so that the queries that previously read. jains can be performed from a single table . ( coins Perils of denavmalizatiom ↳ data inconsistency 2) Referential integrity : Foreign keys om shrouded D8 ↳ difficult * Mast of the RDBMS does not support foreign keys on stranded DB . # If app " - demands referential integrity om shrouded DB ↳ enforce it in app " code C SOL jobs to clean up dangling references)
• 26. 3) Rebalancing: reasons to change the sharding scheme:
a) non-uniform distribution (data-wise)
b) non-uniform load balancing (request-wise)
Workaround: 1) add a new DB 2) rebalance ↳ change in partitioning scheme ↳ data movement ↳ downtime.
We can use directory-based partitioning instead, but it is highly complex and introduces a single point of failure (the lookup service / table).
• 27. Indexes
Well known because of databases. Improve the speed of retrieval.
− Increased storage overhead.
− Slower writes ↳ write the data ↳ update the index.
Can be created using one or more columns.
* Rapid random lookups & efficient access of ordered records.
# Data structure: column value → pointer to the whole row → create different views of the same data.
↳ very good for filtering / sorting of large data sets
↳ no need to create additional copies.
# Used for large datasets (TBs in size) with small payloads (KBs) spread over several physical devices → we need some way to find the correct physical location, i.e. indexes.
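The "column value → pointer to whole row" idea can be sketched with plain dicts. The table contents here are invented sample data:

```python
# A secondary index over one column: map each column value to the
# row ids ("pointers") of matching rows, so filtering needs neither
# a full scan nor a copy of the data set.
rows = {
    1: {"name": "alice", "city": "Pune"},
    2: {"name": "bob",   "city": "Delhi"},
    3: {"name": "carol", "city": "Pune"},
}

city_index = {}  # column value -> list of row ids
for row_id, row in rows.items():
    city_index.setdefault(row["city"], []).append(row_id)


def find_by_city(city):
    # Follow the pointers back into the full rows.
    return [rows[rid] for rid in city_index.get(city, [])]
```

The write-cost trade-off is visible too: inserting a row now requires updating both `rows` and `city_index`.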
• 28. Proxies
Useful under high-load situations, or if we have limited caching ↳ a proxy batches several requests into one.
Client → Proxy → Back-end server.
A proxy can: filter requests; log requests; transform requests (add / remove headers, encryption / decryption, compression); and coordinate requests to optimize traffic for frequently used resources.
# Collapsed forwarding: collapse requests for the same data into one. We can also exploit spatial locality ↳ collapsing requests for data that is stored close together ↳ minimize reads from origin.
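Collapsed forwarding can be sketched as a proxy that deduplicates a batch of requests before touching the origin. The `origin_read` callable is a placeholder for whatever the real back end is:

```python
# Collapsed forwarding: duplicate requests for the same key within a
# batch are merged into a single read from origin.
class CollapsingProxy:
    def __init__(self, origin_read):
        self.origin_read = origin_read  # stand-in for the back end
        self.origin_calls = 0           # instrumentation for the demo

    def fetch_batch(self, keys):
        results = {}
        for key in set(keys):           # collapse duplicates
            self.origin_calls += 1
            results[key] = self.origin_read(key)
        # Every caller still gets an answer, in request order.
        return [results[k] for k in keys]


proxy = CollapsingProxy(origin_read=lambda key: key.upper())
values = proxy.fetch_batch(["a", "b", "a", "a"])  # 4 requests in
```

A production proxy would collapse *in-flight* requests across concurrent clients rather than within one list, but the saving is the same: fewer origin reads than client requests.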
• 29. Queues
Effectively manage requests in a large-scale distributed system.
→ In small systems ↳ writes are fast.
→ In complex systems ↳ high incoming load ↳ individual writes take more time.
* To achieve high performance & availability ↳ the system needs to be asynchronous ↳ queue.
# Synchronous behaviour degrades performance and makes fair & balanced load distribution difficult; putting a queue between the clients (C1…C4) and the server decouples the tasks (T1…T3) from the clients that issued them.
• 30. # Queues: an asynchronous communication protocol.
↳ the client sends a task ↳ gets an ACK (receipt) from the queue, which serves as a reference for the results in future ↳ the client continues its work.
# There is a limit on the size of a request and on the number of requests in the queue.
# Queues provide fault tolerance:
↳ protection from service outages / failures
↳ highly robust ↳ failed service requests can be retried
↳ enforce quality-of-service guarantees (do NOT expose clients to outages).
# Queues enable distributed communication. Open-source implementations: RabbitMQ, ZeroMQ, ActiveMQ, BeanstalkD.
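The submit-ACK-continue pattern can be sketched in-process with the standard library's `queue` and `threading`; a real system would use one of the brokers named above, and the task-id scheme here is invented for the demo:

```python
import queue
import threading

# A bounded queue models the limit on queued requests; the worker
# drains it asynchronously while clients continue their work.
tasks = queue.Queue(maxsize=100)
results = {}


def submit(task_id, payload):
    tasks.put((task_id, payload))  # returns quickly
    return task_id                 # ACK / receipt for future lookup


def worker():
    while True:
        task_id, payload = tasks.get()
        if task_id is None:        # sentinel: stop the worker
            break
        results[task_id] = payload * 2  # the "slow" processing step
        tasks.task_done()


t = threading.Thread(target=worker)
t.start()
receipt = submit("t1", 21)  # client is free immediately
submit(None, None)          # shut down after the queue drains
t.join()
```

The receipt is what the client later uses to fetch the result — the queue never forces it to wait for the slow write itself.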
• 31. Consistent Hashing
# Distributed hash table: index = hash_function(key).
# Suppose we're designing a distributed caching system with n cache servers ↳ server = hash_function(key) % n.
Drawbacks:
1) NOT horizontally scalable ↳ addition of a new server ↳ need to change all existing mappings (downtime of the system).
2) NOT load balanced (because of non-uniform distribution of data) ↳ some caches hot & saturated, other caches idle & empty.
How to tackle these problems? Consistent hashing.
• 32. What is consistent hashing?
→ Very useful strategy for distributed caching & DHTs.
→ Minimizes reorganization when scaling up / down: only k/n keys need to be remapped (k = total number of keys, n = number of servers).
How it works: suppose a typical hash function outputs integers in [0, 256). In consistent hashing, imagine all of these integers placed on a ring — 0, 1, 2, …, 255, wrapping back to 0 — and we have 3 servers: A, B & C.
• 33. 1) Given a list of servers, hash them to integers in the range ↳ each server lands at a point on the ring.
2) To map a key to a server:
a) hash it to a single integer
b) move clockwise around the ring until you find a server
c) map the key to that server.
e.g. h(key-1) maps to A; h(key-2) maps to B.
• 34. Adding a new server D (landing between h(key-2) and B) will result in moving key-2 to D.
Removing server A will result in moving key-1 to the next server clockwise on the ring.
Only the keys between the affected server and its predecessor move; every other key keeps its assignment.
• 35. Consider a real-world scenario: data is randomly distributed ↳ unbalanced caches. How to handle this issue?
Virtual replicas: instead of mapping each node to a single point on the ring, we map it to multiple points.
↳ more replicas ↳ more equal distribution ↳ good load balancing.
(On the ring, A, B, C & D each appear at several positions.)
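The ring with virtual replicas can be sketched with `hashlib` and `bisect`. This is a minimal illustration (replica count and MD5 are arbitrary choices), not a production implementation:

```python
import bisect
import hashlib


# Consistent-hash ring with virtual replicas: each server is hashed to
# several points; a key goes to the first server point clockwise from
# its own hash.
class HashRing:
    def __init__(self, servers, replicas=100):
        self.ring = []  # sorted list of (point, server)
        for server in servers:
            for i in range(replicas):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, key):
        point = self._hash(key)
        # First ring point at or after the key's hash, wrapping around.
        i = bisect.bisect(self.ring, (point,)) % len(self.ring)
        return self.ring[i][1]


ring = HashRing(["A", "B", "C"])
```

Adding a fourth server to a fresh ring only steals the keys that now fall just before D's points — roughly k/n of them — which is exactly the remapping bound stated above.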
• 36. Long-Polling vs WebSockets vs Server-Sent Events
↳ client–server communication protocols.
# HTTP protocol: client sends request → server prepares response → server sends response.
# AJAX polling: the client repeatedly polls the server for data, similar to plain HTTP ↳ requests are sent to the server at regular intervals (e.g. 0.5 sec).
Drawbacks: the client keeps asking the server for new data ↳ a lot of the responses are 'empty' ↳ HTTP overhead.
• 37. # HTTP Long Polling: 'Hanging GET'
The server does NOT send empty responses ↳ it pushes a response to the client only when new data is available.
1) Client makes an HTTP request & waits for the response.
2) Server delays the response until an update is available or a timeout occurs.
3) When an update is available, the server sends a full response.
4) Client sends a new long-poll request: a) immediately after receiving a response, or b) after a pause, to allow an acceptable latency period.
5) Each request has a timeout; the client needs to reconnect periodically due to timeouts.
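The hold-until-update-or-timeout behaviour can be simulated in-process with a blocking queue standing in for the open HTTP connection; the message text and delays are invented for the demo:

```python
import queue
import threading

# A blocking queue models the held-open request: the "server" side
# blocks until an update arrives or the timeout fires.
updates = queue.Queue()


def long_poll(timeout):
    try:
        return updates.get(timeout=timeout)  # hang until data or timeout
    except queue.Empty:
        return None                          # timeout: client reconnects


def publish_later(data, delay):
    # Simulate new data becoming available on the server after `delay`s.
    threading.Timer(delay, updates.put, args=(data,)).start()


publish_later("new-message", 0.2)
first = long_poll(timeout=0.05)  # times out: no data yet, no empty body
second = long_poll(timeout=2.0)  # held open until the update arrives
```

The first call models step 5 (timeout, reconnect); the second models steps 2–3 (request held open, full response on update).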
• 38. WebSockets
→ Full-duplex communication channel over a single TCP connection.
→ Provides 'persistent communication' (client & server can send data at any time).
→ Bidirectional communication over an always-open channel.
Starts with a WebSocket handshake: handshake request from the client, handshake success response from the server; after that, client & server share the communication channel.
→ Lower overheads → real-time data transfer.
• 39. Server-Sent Events (SSE)
The client establishes a persistent & long-term connection with the server; the server uses this connection to send data to the client.
** If the client wants to send data to the server ↳ it requires another technology / protocol.
The data is requested once using regular HTTP; then an always-open, unidirectional communication channel delivers responses whenever new data is available.
→ Best when we need real-time data from server to client, OR when the server generates data in a loop & will send multiple events to the client.