
Big Data Technologies
Sathyavathi.S
Department of Information Technology
Overview of this Lecture
• ZooKeeper
ZooKeeper
A highly available service for coordinating processes of distributed applications.
• Developed at Yahoo! Research
• Started as a sub-project of Hadoop, now a top-level Apache project
• Development is driven by application needs
http://zookeeper.apache.org
Agenda
• SQL vs NoSQL
• Introduction to MongoDB
• MongoDB features
• Replication / high availability
• Sharding / scaling
SQL vs NoSQL
• A NoSQL (often interpreted as "Not only SQL") database provides a mechanism for the storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.

SQL                                           | NoSQL
Relational Database Management System (RDBMS) | Non-relational or distributed database system
Fixed, static, predefined schema              | Dynamic schema
Best suited for complex queries               | Not well suited for complex queries
Vertically scalable                           | Horizontally scalable
Follows the ACID properties                   | Follows the BASE properties

NoSQL Types
• Graph database
• Document-oriented
• Column family
What is MongoDB?
• MongoDB is an open-source, document-oriented database designed with both scalability and developer agility in mind.
• Instead of storing your data in tables and rows as you would with a relational database, in MongoDB you store JSON-like documents with dynamic schemas (schema-free, schemaless).
{
  "_id" : ObjectId("5114e0bd42…"),
  "FirstName" : "John",
  "LastName" : "Doe",
  "Age" : 39,
  "Interests" : [ "Reading", "Mountain Biking" ],
  "Favorites" : {
    "color" : "Blue",
    "sport" : "Soccer"
  }
}
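As an illustration (not part of the original slides), a minimal sketch of storing and reading back such a document with the PyMongo driver; the database and collection names are placeholders:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumes a local mongod
people = client["testdb"]["people"]                # illustrative database/collection names

# Documents in the same collection may carry different fields (dynamic schema).
people.insert_one({
    "FirstName": "John",
    "LastName": "Doe",
    "Age": 39,
    "Interests": ["Reading", "Mountain Biking"],
    "Favorites": {"color": "Blue", "sport": "Soccer"},
})
print(people.find_one({"FirstName": "John"}))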
MongoDB is Easy to Use
Schema free: MongoDB does not need any pre-defined data schema; every document in a collection can have different fields, for example:
{name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill", "ben"], loc: [32.7, 63.4], boss: "ben"}
{name: "jeff", eyes: "blue", loc: [40.7, 73.4], boss: "ben"}
{name: "brendan", boss: "will"}
{name: "matt", weight: 60, height: 72, loc: [44.6, 71.3]}
{name: "ben", age: 25}

RDBMS vs MongoDB terminology:
Database  | Database
Table     | Collection
Row       | Document (JSON, BSON)
Column    | Field
Index     | Index
Join      | Embedded Document
Partition | Shard
Features of MongoDB
• Document-oriented storage
• Full index support
• Replication & high availability
• Auto-sharding
• Aggregation
• MongoDB Atlas
• Various language drivers/APIs
  • JavaScript, Python, Ruby, Perl, Java, Scala, C#, C++, Haskell, Erlang
• Community
Replication
• Replication provides redundancy and increases data availability.
• With multiple copies of the data on different database servers, replication provides a level of fault tolerance against the loss of a single database server.
[Diagram: a database replicated to copies on two other servers]
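As a hedged sketch (not from the slides), connecting a PyMongo client to such a replica set; the host names and the replica set name "rs0" are placeholders:

from pymongo import MongoClient

# Hypothetical three-member replica set; writes go to the primary,
# reads may be served by a secondary copy with this read preference.
client = MongoClient(
    "mongodb://db1:27017,db2:27017,db3:27017/?replicaSet=rs0",
    readPreference="secondaryPreferred",
)
client["testdb"]["people"].insert_one({"name": "will"})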
Sharding
• Sharding is a method for distributing data across multiple machines.
• MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.
Sharding Architecture
• Shard: a MongoDB instance (mongod) that handles a subset of the original data.
• Mongos: a query router that routes client requests to the shards.
• Config server: a MongoDB instance that stores the metadata and configuration details of the cluster.
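A minimal sketch of how an application might use this architecture, assuming a running sharded cluster; the host, database, collection names and the hashed shard key are only illustrative choices:

from pymongo import MongoClient

# Applications connect to the mongos query router, not to the shards directly.
client = MongoClient("mongodb://mongos1:27017")

# One-time setup: shard the collection on a hashed _id key.
client.admin.command("enableSharding", "testdb")
client.admin.command("shardCollection", "testdb.people", key={"_id": "hashed"})

# mongos routes this write to the shard that owns the matching chunk.
client["testdb"]["people"].insert_one({"name": "jeff"})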
Sharding / Replication
• Replication keeps copies of the same data on multiple data nodes for high availability.
• Sharding splits data sets across multiple data nodes, scaling horizontally when high throughput is required.
ZooKeeper in the Hadoop ecosystem
[Diagram] Pig (Data Flow), Hive (SQL), Sqoop (Data Transfer), Avro (Serialization), MapReduce (Job Scheduling/Execution), HBase (Column DB) and HDFS, with ZooKeeper (Coordination) running alongside the whole stack.
Coordination
Proper coordination is not easy.
Fallacies of distributed computing
• The network is reliable

• There is no latency

• The topology does not change

• The network is homogeneous

• The bandwidth is infinite

• …
Motivation
• In the past: a single program running on a single computer with a single CPU
• Today: applications consist of independent programs running on a changing set of computers
• Difficulty: coordination of those independent programs
• Developers have to deal with coordination logic and application logic at the same time
ZooKeeper: designed to relieve developers from writing coordination logic code.
Let's think…
Question: how do you elect the leader?
[Diagram] A program that crawls the Web (application logic) runs on a cluster with a few hundred machines; one machine (the leader) should coordinate the effort (coordination logic).
Question: how do you lock a service?
[Diagram] Application logic on a cluster with a few hundred machines must coordinate access (coordination logic) to one database.
Question: how can the configuration be distributed?
[Diagram] A program that crawls the Web (application logic) runs on a cluster with a few hundred machines; every worker should start with the same configuration file (coordination logic).
Solution approaches
• Be specific: develop a particular service for each coordination task
  • Locking service
  • Leader election
  • etc.
• Be general: provide an API to make many services possible
ZooKeeper
• ZooKeeper: an API that enables application developers to implement their own primitives easily.
• The rest: specific primitives are implemented on the server side.
What can a distributed system look like?
[Diagram] One MASTER node coordinating several slave nodes.
+ simple
− coordination performed by the master
− single point of failure
− scalability
What can a distributed system look like?
+ not a single point of failure anymore
− scalability is still an issue
What can a distributed system look like?
+ scalability
What makes distributed system coordination difficult?
Partial failures make application writing difficult.
[Diagram] A sender sends a message; nothing comes back (network failure).
The sender does not know:
• whether the message was received
• whether the receiver's process died before/after processing the message
Typical coordination problems in distributed systems
• Static configuration: a list of operational parameters for the system processes
• Dynamic configuration: parameter changes on the fly
• Group membership: who is alive?
• Leader election: who is in charge, who is a backup?
• Mutually exclusive access to critical resources (locks)
• Barriers (supersteps in Giraph, for instance)
The ZooKeeper API allows us to implement all these coordination tasks easily.
ZooKeeper principles
ZooKeeper's design principles
• The API is wait-free
  • No blocking primitives in ZooKeeper
  • Blocking can be implemented by a client
  • No deadlocks (remember the dining philosophers, forks & deadlocks)
• Guarantees
  • Client requests are processed in FIFO order
  • Writes to ZooKeeper are linearisable
  • Clients receive notifications of changes before the changed data becomes visible
ZooKeeper's strategy to be fast and reliable
• The ZooKeeper service is an ensemble of servers that use replication (high availability)
• Data is cached on the client side.
  Example: a client caches the ID of the current leader instead of probing ZooKeeper every time.
• What if a new leader is elected?
  • Potential solution: polling (not optimal)
  • Watch mechanism: clients can watch for an update of a given data object
ZooKeeper is optimised for read-dominant operations!
ZooKeeper terminology
• Client: user of the ZooKeeper service
• Server: process providing the ZooKeeper service
• znode: in-memory data node in ZooKeeper, organised in a hierarchical namespace (the data tree)
• Update/write: any operation which modifies the state of the data tree
• Clients establish a session when connecting to ZooKeeper
ZooKeeper's data model: filesystem
• znodes are organised in a hierarchical namespace
• znodes can be manipulated by clients through the ZooKeeper API
• znodes are referred to by UNIX-style file system paths:
  /
    /app1
      /app1/p_1   /app1/p_2   /app1/p_3
    /app2
• All znodes store data (file-like) and can have children (directory-like).
znodes
• znodes are not designed for general data storage (they usually store data in the order of kilobytes)
• znodes map to abstractions of the client application
Example, group membership protocol: client process p_i creates znode /app1/p_i under /app1; the znode persists as long as the process is running.
  /
    /app1
      /app1/p_1   /app1/p_2   /app1/p_3
    /app2


znode flags
• Clients manipulate znodes by creating and deleting them
• EPHEMERAL flag: clients create znodes which are deleted at the end of the client's session (ephemeral, Greek: passing, short-lived)
• SEQUENTIAL flag: a monotonically increasing counter is appended to the znode's path; the counter value of a new znode under a parent is always larger than the value of the existing children
Example: create(/app1_5/p_, data, SEQUENTIAL)
  /app1_5
    /app1_5/p_1   /app1_5/p_2   /app1_5/p_3
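As an illustration (not from the original slides), the same flags expressed with the Kazoo Python client; the ensemble address and paths are placeholders:

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")   # placeholder ZooKeeper address
zk.start()
zk.ensure_path("/app1_5")

# EPHEMERAL: deleted automatically when this client's session ends.
# SEQUENTIAL: ZooKeeper appends a monotonically increasing counter to the path.
path = zk.create("/app1_5/p_", b"data", ephemeral=True, sequence=True)
print(path)    # e.g. /app1_5/p_0000000003

zk.stop()      # session ends, so the ephemeral znode disappears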


znodes & the watch flag
• Clients can issue read operations on znodes with a watch flag
• The server notifies the client when the information on the znode has changed
• Watches are one-time triggers associated with a session (unregistered once triggered or once the session closes)
• Watch notifications indicate the change, not the new data
Sessions
• A client connects to ZooKeeper and initiates a session
• Sessions have an associated timeout
• ZooKeeper considers a client faulty if it does not receive anything from its session for more than that timeout
• A session ends when the client is deemed faulty or when the client explicitly ends it
A few implementation details
ZooKeeper data is replicated on each server that composes the service:
• the data tree is replicated across all servers and kept in memory
• updates are first logged to disk; a write-ahead log and snapshots are used for recovery
• a write request requires coordination between the servers
Source: http://bit.ly/13VFohW
A few implementation details
• A ZooKeeper server services clients
• Clients connect to exactly one server to submit requests
  • read requests are served from the local replica
  • write requests are processed by an agreement protocol (an elected server leader initiates processing of the write request)
Let's work through some examples
ZooKeeper API
No partial reads/writes (no open, seek, close or similar methods).
• String create(path, data, flags): creates a znode with path name path, stores data in it and sets the flags (EPHEMERAL, SEQUENTIAL)
• void delete(path, version): deletes the znode if it is at the expected version
• Stat exists(path, watch): the watch flag enables the client to set a watch on the znode
• (data, Stat) getData(path, watch): returns the data and meta-data of the znode
• Stat setData(path, data, version): writes data if the version number is the current version of the znode
• String[] getChildren(path, watch): returns the children of the znode
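For reference, a hedged sketch of the same calls using the Kazoo Python client (the ZooKeeper address and paths are placeholders; Kazoo exposes watches as callbacks rather than boolean flags):

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

zk.create("/app1", b"hello")                      # create(path, data, flags)
stat = zk.exists("/app1")                         # exists(path, watch)
data, stat = zk.get("/app1")                      # getData(path, watch)
zk.set("/app1", b"world", version=stat.version)   # setData(path, data, version)
children = zk.get_children("/app1")               # getChildren(path, watch)
zk.delete("/app1")                                # delete(path, version)

zk.stop()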
Example: configuration
[the configuration is stored in /app1/config]
Questions:
1. How does a new worker query ZK for a configuration?
2. How does an administrator change the configuration on the fly?
3. How do the workers read the new configuration?

Answers:
1. getData(/app1/config, true)
2. setData(/app1/config, config_data, -1)   [watching clients are notified]
3. getData(/app1/config, true)

Data tree:
  /
    /app1
      /app1/config   /app1/progress
    /app2
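A minimal sketch of this pattern with the Kazoo client (paths and the configuration payload are illustrative; Kazoo's DataWatch re-registers the one-time watch after every notification):

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/app1")
if zk.exists("/app1/config") is None:
    zk.create("/app1/config", b"batch_size=100")

# Worker: read the configuration and get notified of every change.
@zk.DataWatch("/app1/config")
def on_config_change(data, stat):
    print("configuration is now:", data)

# Administrator: change the configuration on the fly (version -1 = any version).
zk.set("/app1/config", b"batch_size=500", version=-1)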
Example: group membership
Questions:
1. How can all workers (slaves) of an application register themselves on ZK?
2. How can a process find out about all active workers of an application?

[a znode /app1/workers is designated to store the workers]
Answers:
1. create(/app1/workers/worker, data, EPHEMERAL)
2. getChildren(/app1/workers, true)

Data tree:
  /
    /app1
      /app1/workers
        /app1/workers/worker1   /app1/workers/worker2
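A hedged sketch of the same protocol with Kazoo (the host-name payload and paths are illustrative):

import socket
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/app1/workers")

# Worker: register with an EPHEMERAL (and here also SEQUENTIAL) znode;
# the registration vanishes automatically if the worker dies.
zk.create("/app1/workers/worker", socket.gethostname().encode(),
          ephemeral=True, sequence=True)

# Any process: list the active workers and get notified when the group changes.
@zk.ChildrenWatch("/app1/workers")
def on_members_change(children):
    print("active workers:", children)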
Example: simple locks
Question: how can all workers of an application use a single resource through a lock?

create(/app1/lock1, …, EPHEMERAL)
  ok? yes → use the locked resource
  ok? no  → getData(/app1/lock1, true) and wait for the notification, then try again
All processes compete at all times for the lock.

Data tree:
  /
    /app1
      /app1/workers
        /app1/workers/worker1   /app1/workers/worker2
      /app1/lock1
Example: locking without herd effect
Question: how can all workers of an application use a single resource through a lock?

1. id = create(/app1/locks/lock_, SEQUENTIAL | EPHEMERAL)
2. ids = getChildren(/app1/locks/, false)
3. if id == min(ids): exit (use the lock)
4. else: exists(max_id < id, true) [watch the largest id smaller than the own id], wait for the notification and go to step 2

Data tree:
  /
    /app1
      /app1/locks
        /app1/locks/lock_1   /app1/locks/lock_2
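Kazoo ships a Lock recipe that implements this pattern (each contender creates an EPHEMERAL|SEQUENTIAL znode and watches only its predecessor). A minimal sketch, with placeholder paths and identifiers:

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

lock = zk.Lock("/app1/locks", "worker-1")   # identifier is only for diagnostics
with lock:                                  # blocks until this client holds the lock
    print("using the shared resource")      # critical section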
Example: leader election
Question: how can all workers of an application elect a leader among themselves?

getData(/app1/workers/leader, true)
  ok? yes → follow the current leader
  ok? no  → create(/app1/workers/leader, IP, EPHEMERAL)
            ok? yes → lead
            ok? no  → another worker won the race: follow (start again with getData)
If the leader dies, elect again ("herd effect").

Data tree:
  /
    /app1
      /app1/workers
        /app1/workers/leader   /app1/workers/worker1
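Kazoo also provides an Election recipe built on the non-herd locking idea; a hedged sketch with placeholder paths and identifiers:

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

def lead():
    # Runs only while this process is the elected leader.
    print("I am the leader now")

election = zk.Election("/app1/election", "worker-1")
election.run(lead)   # blocks: waits until elected, then calls lead()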
Zookeeper Video
• https://youtu.be/AS5a91DOmks
ZooKeeper applications
The Yahoo! fetching service
• The Fetching Service is part of Yahoo!'s crawler infrastructure
• Setup: a master commands page-fetching processes
  • The master provides the fetchers with configuration
  • The fetchers write back information about their status and health
• Main advantages of ZooKeeper:
  • Recovery from master failures
  • Guaranteed availability despite failures
• ZK primitives used: configuration metadata, leader election
Yahoo! message broker
• A distributed publish-subscribe system
• The system manages thousands of topics that clients can publish messages to and receive messages from
• The topics are distributed among a set of servers to provide scalability
• ZK primitives used: configuration metadata (to distribute topics), failure detection and group membership
Yahoo! message broker
[Diagram] A primary and a backup server per topic; topic subscribers are monitored by all servers; ephemeral nodes are used.
Source: http://bit.ly/13VFohW
Throughput
Setup: 250 clients, each client has at least 100 outstanding requests (read/write of 1K data).
[Plot] Throughput for workloads ranging from only read requests to only write requests; the crossing of the curves eventually always happens.
Source: http://bit.ly/13VFohW
Recovery from failure
Setup: 250 clients, each client has at least 100 outstanding requests (read/write of 1K data); 5 ZK machines (1 leader, 4 followers), 30% writes.
[Plot of throughput over time with the following events:]
(1) failure & recovery of a follower
(2) failure & recovery of a different follower
(3) failure of the leader
(4) failure of followers (a, b), recovery at (c)
(5) failure of the leader
(6) recovery of the leader
Source: http://bit.ly/13VFohW
References
• [book] ZooKeeper by Junqueira & Reed, 2013 (available on the TUD campus network)
• [paper] ZooKeeper: Wait-free coordination for Internet-scale systems by Hunt et al., 2010; http://bit.ly/13VFohW
Summary
• A whirlwind tour through ZooKeeper
• Why do we need it?
• The data model of ZooKeeper: znodes
• Example implementations of different coordination tasks