ZooKeeper for Distributed Systems

ZooKeeper is a highly available, scalable, distributed coordination service that provides synchronization and configuration for distributed systems. It supports leader election, group membership, work queues, event notifications, and more. It provides a hierarchical namespace and data model with watches, conditional updates, and strong consistency guarantees. HBase uses ZooKeeper for master failover, region server and master discovery, and metadata storage. Common issues involve garbage collection, low throughput, and improper deployment such as running ZooKeeper and HBase on the same nodes. Future releases aim to improve performance and robustness and to reduce connection losses.

Apache ZooKeeper

MAHADEV KONAR
What is ZooKeeper?

A highly available, scalable, distributed coordination kernel

Use Cases

» Leader Election
» Group Membership
» Work Queues
» Event Notifications/workflow management
» Configuration Management
» Cluster Management
» Sharding
What is ZooKeeper again?

• File API without partial reads/writes
• No renames
• Ordered updates and strong persistence guarantees
• Conditional updates (version)
• Watches for data changes
• Ephemeral znodes
• Generated file names
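
Ephemeral znodes and generated (sequential) file names are the ingredients of the usual leader election recipe: each candidate creates an ephemeral, sequentially named child, and the candidate holding the lowest sequence number leads. The following is a minimal sketch in Python against a toy in-memory parent znode, not a real ZooKeeper client. `ToyElection` and its method names are hypothetical; the 10-digit zero-padded suffix mirrors how ZooKeeper names sequential znodes.

```python
class ToyElection:
    """In-memory stand-in for one parent znode used for leader election.

    Hypothetical illustration only; this is not the ZooKeeper client API.
    """

    def __init__(self):
        self._counter = 0
        self._children = {}  # znode name -> id of the session that owns it

    def join(self, prefix, session_id):
        # "Generated file names": the service appends an increasing counter.
        name = f"{prefix}{self._counter:010d}"
        self._counter += 1
        # "Ephemeral": the znode lives only as long as its session does.
        self._children[name] = session_id
        return name

    def expire_session(self, session_id):
        # When a session ends, all of its ephemeral znodes disappear.
        self._children = {
            name: owner
            for name, owner in self._children.items()
            if owner != session_id
        }

    def leader(self):
        # The candidate with the lowest sequence number is the leader.
        if not self._children:
            return None
        return min(self._children, key=lambda name: int(name[-10:]))
```

If the session that owns the lowest-numbered znode expires, the next candidate in sequence order becomes the leader without any renumbering.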
Data Model

• Hierarchical namespace
• Each znode has data and children
• Data is read and written in its entirety

[Tree diagram; znodes shown: /, apps, app1, servers, regionserver, master, locks, read-1, users]
ZooKeeper API

String create(path, data, acl, flags)

void delete(path, expectedVersion)

Stat setData(path, data, expectedVersion)

(data, Stat) getData(path, watch)

Stat exists(path, watch)

String[] getChildren(path, watch)
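
To make the semantics of these calls concrete, here is a toy dictionary-backed model in Python. It is a sketch under stated assumptions, not the ZooKeeper client: `ToyZooKeeper` and `BadVersionError` are hypothetical names, ACLs, flags, and watches are left out, and `exists` returns a boolean rather than a Stat. It does show whole-value reads and writes and the version check behind conditional updates.

```python
class BadVersionError(Exception):
    """The caller's expected version did not match the znode's version."""


class ToyZooKeeper:
    """Dictionary-backed stand-in for the calls above (no ACLs, flags, watches)."""

    def __init__(self):
        self._nodes = {"/": (b"", 0)}  # path -> (data, version)

    def _check_version(self, path, expected_version):
        version = self._nodes[path][1]
        # -1 means "skip the version check", as in ZooKeeper itself.
        if expected_version not in (-1, version):
            raise BadVersionError(
                f"{path}: expected {expected_version}, found {version}"
            )
        return version

    def create(self, path, data):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"parent of {path} does not exist")
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        self._nodes[path] = (data, 0)
        return path

    def exists(self, path):
        return path in self._nodes

    def get_data(self, path):
        return self._nodes[path]  # (data, version); data is returned whole

    def set_data(self, path, data, expected_version):
        version = self._check_version(path, expected_version)
        self._nodes[path] = (data, version + 1)  # data is replaced whole
        return version + 1

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        names = []
        for candidate in self._nodes:
            rest = candidate[len(prefix):]
            # Keep direct children only: non-empty name with no further "/".
            if candidate.startswith(prefix) and rest and "/" not in rest:
                names.append(rest)
        return sorted(names)

    def delete(self, path, expected_version):
        self._check_version(path, expected_version)
        if self.get_children(path):
            raise ValueError(f"{path} still has children")
        del self._nodes[path]
```

A writer that read version 0 and then calls `set_data(path, data, 0)` succeeds only if nobody else has updated the znode in between; otherwise it gets `BadVersionError` and can re-read and retry.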


ZooKeeper Service

[Diagram: a ZooKeeper Service of five servers, one marked as the leader, with seven clients connected to the servers]

• All servers store a copy of the data (in memory)
• A leader is elected at startup
• Followers serve clients; all updates go through the leader
• Update responses are sent when a majority of servers have persisted the change
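
The majority rule above is simple arithmetic, sketched here in Python. The function names are hypothetical and chosen for illustration; the five-server ensemble in the usage note matches the diagram on this slide.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


def can_acknowledge(persisted_on, ensemble_size):
    """True once enough servers have persisted an update to answer the client."""
    return persisted_on >= quorum_size(ensemble_size)
```

For five servers, `quorum_size(5)` is 3 and `tolerated_failures(5)` is 2. A six-server ensemble needs 4 acknowledgements and still tolerates only 2 failures, which is why odd ensemble sizes are the usual choice.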
ZooKeeper and HBase

Master Failover

Region Servers and Master discovery via ZooKeeper
• HBase clients connect to ZooKeeper to find configuration data
• Region Servers and Master failure detection
HBase and ZooKeeper as of now!

[Tree diagram; znodes shown: /, shutdown, root-region-server, rs, master]

• Master
  • If more than one master, they fight
• Root Region Server
  • This znode holds the location of the server hosting the root of all tables in HBase
• rs
  • A directory in which there is a znode per HBase region server
  • Region Servers register themselves with ZooKeeper when they come online
• On Region Server failure (detected via ephemeral znodes and notification via ZooKeeper), the master splits the edits out per region
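
The failure detection path described above (ephemeral znode disappears, master is notified) can be sketched with a toy in-memory model in Python. This is an illustration under assumptions, not HBase or ZooKeeper code: `ToyRsDirectory` and `ToyMaster` are hypothetical names, and the only real behavior borrowed is that a ZooKeeper watch fires once and must be set again by the handler.

```python
class ToyRsDirectory:
    """In-memory stand-in for the rs directory of ephemeral znodes."""

    def __init__(self):
        self._servers = {}        # znode name -> owning session id
        self._child_watch = None  # at most one pending one-shot watch

    def register(self, name, session_id):
        # A region server creates its ephemeral znode when it comes online.
        self._servers[name] = session_id
        self._fire()

    def get_children(self, watch=None):
        self._child_watch = watch
        return sorted(self._servers)

    def expire_session(self, session_id):
        # Session expiry removes every ephemeral znode the session owned.
        gone = [n for n, s in self._servers.items() if s == session_id]
        for name in gone:
            del self._servers[name]
        if gone:
            self._fire()

    def _fire(self):
        # Watches are one-shot: clear the watch before invoking it.
        watch, self._child_watch = self._child_watch, None
        if watch is not None:
            watch()


class ToyMaster:
    """Tracks live region servers and records the ones that vanish."""

    def __init__(self, rs_dir):
        self.rs_dir = rs_dir
        self.failed = []
        self.known = set(rs_dir.get_children(watch=self.on_children_changed))

    def on_children_changed(self):
        # Re-read the children and set the watch again in the same call.
        current = set(self.rs_dir.get_children(watch=self.on_children_changed))
        # A real master would start log splitting for each missing server here.
        self.failed.extend(sorted(self.known - current))
        self.known = current
```

In this sketch the master never polls; it learns about the failure only because the expired session's znode is removed and the pending watch fires.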
Common Problems/Error Cases

Garbage Collection at the Region Servers
• Causes ZooKeeper clients to stall
• Session expiry

Low throughput and connection loss
• Mostly due to under-provisioned ZooKeeper instances
• Disk and memory usage

Bad usage example:
• NameNode, RegionServer, JobTracker, ZooKeeper running on the same node
Release 3.3.0: what's in it for HBase?

Allow configuration of session timeout min/max bounds
• HBase needs large session timeouts
Improved logging information to detect issues
Improved debugging tools
Improved documentation
Improved performance and robustness
Queue implementation available
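
The session timeout bounds in the first item can be illustrated with a small calculation. The server clamps the timeout a client requests into the configured range; when `minSessionTimeout` and `maxSessionTimeout` are not set, they default to 2 and 20 times `tickTime`. The sketch below is in Python, the function name is hypothetical, and the 2000 ms tick time in the usage note is an assumed example value.

```python
def negotiated_timeout_ms(requested_ms, tick_time_ms, min_ms=None, max_ms=None):
    """Clamp a client's requested session timeout into the server's bounds.

    min_ms and max_ms stand in for minSessionTimeout and maxSessionTimeout;
    None selects the defaults of 2 and 20 times the tick time.
    """
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lower, min(requested_ms, upper))
```

With an assumed tick time of 2000 ms and default bounds, a client asking for 60000 ms is granted 40000 ms. Raising the maximum, for example `negotiated_timeout_ms(60000, 2000, max_ms=120000)`, lets the full 60000 ms through, which is the kind of large timeout the slide says HBase needs.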
Upcoming 3.4 release

No connection loss
Use Netty - allow encryption
Testing
• Mockito
More backwards-compatibility testing
More ZooKeeper in HBase?

Table Schema and state in ZooKeeper
• read only, online
Region Server state transitions via ZooKeeper
Store region assignment in ZooKeeper for each Region Server
http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases
Questions?
