Apache ZooKeeper
MAHADEV KONAR
What is ZooKeeper?
A highly available, scalable, distributed
coordination kernel
Use Cases
» Leader Election
» Group Membership
» Work Queues
» Event Notifications/workflow management
» Configuration Management
» Cluster Management
» Sharding
What is ZooKeeper again?
File API without partial reads/writes
No renames
Ordered updates and strong persistence
guarantees
Conditional updates (version)
Watches for data changes
Ephemeral znodes
Generated file names
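As a rough illustration of two items above (conditional updates by version, and generated file names), here is a minimal in-memory sketch. `ToyStore` and `BadVersion` are hypothetical names made up for this example and are not the ZooKeeper client; the sketch also skips details such as requiring the parent znode to exist. The zero-padded 10-digit suffix mirrors how ZooKeeper names sequential znodes.

```python
class BadVersion(Exception):
    """Raised when expected_version does not match the znode's version."""


class ToyStore:
    """In-memory stand-in for a znode tree (hypothetical, not the real client)."""

    def __init__(self):
        self._nodes = {}     # path -> (data, version)
        self._counters = {}  # parent path -> next sequence number

    def create(self, path, data, sequential=False):
        if sequential:
            parent = path.rsplit("/", 1)[0]
            n = self._counters.get(parent, 0)
            self._counters[parent] = n + 1
            path = f"{path}{n:010d}"  # generated, zero-padded suffix
        if path in self._nodes:
            raise KeyError(f"node exists: {path}")
        self._nodes[path] = (data, 0)
        return path

    def get_data(self, path):
        return self._nodes[path]  # (data, version)

    def set_data(self, path, data, expected_version):
        _, version = self._nodes[path]
        # -1 means "any version", following ZooKeeper's convention.
        if expected_version != -1 and expected_version != version:
            raise BadVersion(path)
        self._nodes[path] = (data, version + 1)
        return version + 1


store = ToyStore()
store.create("/config", b"v1")
data, version = store.get_data("/config")

store.set_data("/config", b"v2", version)      # matches version 0, succeeds
try:
    store.set_data("/config", b"v3", version)  # version 0 is now stale
    stale_rejected = False
except BadVersion:
    stale_rejected = True

first = store.create("/locks/lock-", b"", sequential=True)
second = store.create("/locks/lock-", b"", sequential=True)
```

The second write fails because another update got in first; a real client would re-read the znode and retry. The generated names sort in creation order, which is what queue and lock recipes rely on.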
Data Model
Hierarchical namespace
Each znode has data and children
Data is read and written in its entirety
[Figure: znode tree rooted at /, with nodes apps, app1, servers, regionserver, master, locks, read-1, users]
ZooKeeper API
String create(path, data, acl, flags)
void delete(path, expectedVersion)
Stat setData(path, data, expectedVersion)
(data, Stat) getData(path, watch)
Stat exists(path, watch)
String[] getChildren(path, watch)
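The `watch` argument on the read calls above registers a one-time notification: it fires on the next relevant change and must then be set again. A minimal in-memory sketch of that behaviour follows. `ToyZnodes` and its method names are hypothetical and only loosely mirror the calls listed above; this is not the real client library.

```python
from collections import defaultdict


class ToyZnodes:
    """In-memory model of one-shot watches (hypothetical, not the real client)."""

    def __init__(self):
        self._data = {"/": b""}
        self._data_watches = defaultdict(list)   # path -> callbacks
        self._child_watches = defaultdict(list)  # path -> callbacks

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    def _fire(self, table, path, event):
        # One-shot: the callbacks are removed before they are invoked.
        for callback in table.pop(path, []):
            callback(event, path)

    def create(self, path, data):
        self._data[path] = data
        self._fire(self._data_watches, path, "created")
        self._fire(self._child_watches, self._parent(path), "children_changed")

    def set_data(self, path, data):
        if path not in self._data:
            raise KeyError(path)
        self._data[path] = data
        self._fire(self._data_watches, path, "data_changed")

    def exists(self, path, watch=None):
        if watch:
            self._data_watches[path].append(watch)
        return path in self._data

    def get_data(self, path, watch=None):
        data = self._data[path]
        if watch:
            self._data_watches[path].append(watch)
        return data

    def get_children(self, path, watch=None):
        if watch:
            self._child_watches[path].append(watch)
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):] for p in self._data
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


zk = ToyZnodes()
events = []
record = lambda event, path: events.append((event, path))

zk.create("/rs", b"")
zk.get_children("/rs", watch=record)  # watch the child list once
zk.create("/rs/host1", b"")           # fires the watch
zk.create("/rs/host2", b"")           # no watch left, nothing fires

zk.exists("/master", watch=record)    # watch a znode that does not exist yet
zk.create("/master", b"host9")        # fires the exists watch
```

Because a watch is consumed when it fires, a client that wants continuous notifications re-registers it on every read, typically by reading again inside the callback.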
ZooKeeper Service
[Figure: ZooKeeper Service with five servers (one marked Leader) and seven connected clients]
All servers store a copy of the data (in memory)
A leader is elected at startup
Followers service clients, all updates go through leader
Update responses are sent when a majority of servers have persisted the change
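The majority rule above translates into simple arithmetic. The helper names below are made up for illustration; the sketch only shows how many acknowledgements an update needs and how many server failures an ensemble can absorb.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Servers that can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


def is_committed(acks, ensemble_size):
    """True once enough servers have persisted the change."""
    return acks >= quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under this rule a 4-server ensemble tolerates no more failures than a 3-server one, which is why odd ensemble sizes are the usual choice.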
ZooKeeper and HBase
Master Failover
Region Servers and Master discovery via ZooKeeper
HBase clients connect to ZooKeeper to find configuration data
Region Server and Master failure detection
HBase and ZooKeeper as of now!
[Figure: znode tree rooted at /, with children shutdown, root-region-server, rs, master]
• Master
  • If more than one master, they fight
• Root Region Server
  • This znode holds the location of the server hosting the root of all tables in HBase
• rs
  • A directory in which there is a znode per HBase region server
  • Region Servers register themselves with ZooKeeper when they come online
• On Region Server failure (detected via ephemeral znodes and notification via ZooKeeper), the master splits the edits out per region
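The master "fight" and the failure detection described above both rest on ephemeral znodes, which disappear when the session that created them ends. A minimal in-memory sketch, assuming hypothetical names (`ToyEnsemble`, `create_ephemeral`, `expire_session`, `watch_deletions`) that are not part of the real client, and using a persistent deletion callback as a simplification of ZooKeeper's one-shot watches:

```python
class NodeExists(Exception):
    """Raised when a znode already exists at the requested path."""


class ToyEnsemble:
    """In-memory model of ephemeral znodes (hypothetical, not the real client)."""

    def __init__(self):
        self._owner = {}     # path -> id of the session that owns the znode
        self._watchers = []  # callbacks invoked as callback(deleted_path)

    def create_ephemeral(self, path, session_id):
        if path in self._owner:
            raise NodeExists(path)
        self._owner[path] = session_id

    def watch_deletions(self, callback):
        self._watchers.append(callback)

    def children(self, parent):
        prefix = parent + "/"
        return sorted(p[len(prefix):] for p in self._owner if p.startswith(prefix))

    def expire_session(self, session_id):
        # Every ephemeral znode owned by the session is removed with it.
        dead = [p for p, s in self._owner.items() if s == session_id]
        for path in dead:
            del self._owner[path]
        for path in dead:
            for callback in list(self._watchers):
                callback(path)


zk = ToyEnsemble()

# Two masters "fight": only the first create of /master succeeds.
zk.create_ephemeral("/master", session_id="master-a")
try:
    zk.create_ephemeral("/master", session_id="master-b")
    b_is_master = True
except NodeExists:
    b_is_master = False

# Region servers register themselves under /rs when they come online.
zk.create_ephemeral("/rs/host1", session_id="rs-1")
zk.create_ephemeral("/rs/host2", session_id="rs-2")

failed = []  # what the master would react to by splitting edits per region
zk.watch_deletions(
    lambda path: failed.append(path) if path.startswith("/rs/") else None
)

zk.expire_session("rs-1")      # region server crashed or stalled too long
zk.expire_session("master-a")  # active master goes away
zk.create_ephemeral("/master", session_id="master-b")  # standby takes over
```

The same mechanism serves both purposes: the losing master waits for `/master` to vanish, and the active master learns about a dead region server from the missing znode under `/rs`.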
Common Problems/Error Cases
Garbage Collection at the Region Servers
Causes ZooKeeper clients to stall
Session expiry
Low throughput and connection loss
Mostly due to under provisioned ZooKeeper instances
Disk and Memory usage
Bad Usage example:
NameNode, RegionServer, JobTracker, ZooKeeper running on
the same node
Release 3.3.0: what's in it for HBase?
Allow configuration of session timeout min/max
bounds
HBase needs large session timeouts
Improved logging information to detect issues
Improved debugging tools
Improved documentation
Improved performance and robustness
Queue implementation available
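To make the session timeout bounds concrete: the server clamps the timeout a client asks for into a [min, max] range, which by default is 2 and 20 times `tickTime`. The sketch below assumes a `tickTime` of 2000 ms for illustration; the function names are hypothetical, and the pause check is a simplification that ignores heartbeat timing.

```python
def negotiated_timeout(requested_ms, tick_time_ms=2000, min_ms=None, max_ms=None):
    """Clamp a requested session timeout into the server's bounds.

    Unset bounds fall back to 2 * tickTime and 20 * tickTime.
    """
    low = 2 * tick_time_ms if min_ms is None else min_ms
    high = 20 * tick_time_ms if max_ms is None else max_ms
    return max(low, min(requested_ms, high))


def survives_pause(pause_ms, session_timeout_ms):
    """Simplified check: a stall shorter than the timeout keeps the session."""
    return pause_ms < session_timeout_ms


default_bounds = negotiated_timeout(60_000)                # capped at 40 s
raised_max = negotiated_timeout(60_000, max_ms=120_000)    # 60 s granted

gc_pause_ms = 45_000  # hypothetical long garbage collection pause
print(survives_pause(gc_pause_ms, default_bounds))  # False
print(survives_pause(gc_pause_ms, raised_max))      # True
```

With the default bounds, a client asking for 60 seconds silently gets 40, so a long garbage collection pause at a region server can still expire its session; raising the maximum bound lets the larger request through.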
Upcoming 3.4 release
No ConnectionLoss
Use Netty - allow encryption
Testing
Mockito
More backwards-compatibility testing
More ZooKeeper in HBase?
Table Schema and state in ZooKeeper
read only, online
Region Server state transitions via ZooKeeper
Store region assignment in ZooKeeper for each
Region Server
http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases
Questions?