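Original article: ZooKeeper standalone client source code analysis
At work we rely heavily on ZooKeeper for distributed locks, so I decided to study the ZK source code to lay a foundation for troubleshooting ZK lock problems and evaluating their performance later on.
1. ZK basic concepts
1.1 Request types
- Transactional requests: updates, creates, and deletes. These operations change data, so they must be applied transactionally across the whole cluster.
Only the Leader can process transactional requests. If a Follower receives one, it forwards it to the Leader, which then broadcasts it (this keeps ZK data consistent).
- Non-transactional requests: operations that do not change data, such as queries.
Clients pick the server to send requests to in round-robin fashion.
1.2 ZK cluster roles
- Leader: the sole scheduler and processor of transactional requests, guaranteeing the order in which the cluster processes transactions; it also coordinates the servers inside the cluster (managing followers and data synchronization).
- Follower: handles non-transactional requests, forwards transactional requests to the Leader, votes on transaction proposals, and votes in leader elections.
- Observer: handles non-transactional requests and forwards transactional requests to the Leader, but does not vote; typically used to increase the cluster's capacity for non-transactional requests.
2. Client source code
The ZooKeeper version analyzed here is 3.5.5; the client source starts from ZooKeeperMain.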
public static void main(String args[]) throws CliException, IOException, InterruptedException
{
// all ZK client initialization happens inside the ZooKeeperMain constructor
ZooKeeperMain main = new ZooKeeperMain(args);
main.run();
}
public ZooKeeperMain(String args[]) throws IOException, InterruptedException {
// parse the client arguments, e.g. -server 127.0.0.1:2 -timeout 30000
cl.parseOptions(args);
System.out.println("Connecting to " + cl.getOption("server"));
connectToZK(cl.getOption("server"));
}
protected void connectToZK(String newHost) throws InterruptedException, IOException {
// if a zk instance already exists and is still alive, close it first
if (zk != null && zk.getState().isAlive()) {
zk.close();
}
host = newHost;
boolean readOnly = cl.getOption("readonly") != null;
if (cl.getOption("secure") != null) {
// if the secure option is set, record it in the system properties
System.setProperty(ZKClientConfig.SECURE_CLIENT, "true");
System.out.println("Secure connection is enabled");
}
zk = new ZooKeeperAdmin(host, Integer.parseInt(cl.getOption("timeout")), new MyWatcher(), readOnly);
}
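At this point the code enters ZooKeeperAdmin, which extends ZooKeeper (class diagram: figure not preserved).
createDefaultHostProvider: parses the given server addresses and ports into InetSocketAddress instances; when ClientCnxn connects to a server, it takes addresses from this list in round-robin order.
Inside createDefaultHostProvider the list is shuffled via shuffle(serverAddresses).
Shuffling the configured addresses means each client starts with a randomly chosen server; otherwise every client would hit the first configured server and put too much load on it.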
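The shuffle-then-rotate behaviour described above can be sketched in a few lines. This is a minimal illustration, not the StaticHostProvider implementation: the class name ShuffledHosts and the seed parameter (added only to make runs reproducible) are assumptions, and chroot suffixes are not handled.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: parse "host:port,host:port", shuffle once, then rotate.
final class ShuffledHosts {
    private final List<InetSocketAddress> servers = new ArrayList<>();
    private int index = -1;

    ShuffledHosts(String connectString, long seed) {
        for (String hostPort : connectString.split(",")) {
            int colon = hostPort.lastIndexOf(':');
            String host = hostPort.substring(0, colon);
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            // Unresolved addresses keep the sketch free of DNS lookups.
            servers.add(InetSocketAddress.createUnresolved(host, port));
        }
        // One shuffle at start-up spreads different clients over different servers.
        Collections.shuffle(servers, new Random(seed));
    }

    int size() {
        return servers.size();
    }

    // Round-robin over the shuffled list; wraps around after the last server.
    InetSocketAddress next() {
        index = (index + 1) % servers.size();
        return servers.get(index);
    }

    public static void main(String[] args) {
        ShuffledHosts hosts = new ShuffledHosts("zk1:2181,zk2:2181,zk3:2181", System.nanoTime());
        for (int i = 0; i < 4; i++) {
            System.out.println(hosts.next());
        }
    }
}
```

Each client visits every server once before repeating, but different clients start at different positions because each shuffles independently.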
public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher,
boolean canBeReadOnly) throws IOException {
this(connectString, sessionTimeout, watcher, canBeReadOnly,
createDefaultHostProvider(connectString));
}
public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher,
boolean canBeReadOnly, HostProvider aHostProvider,
ZKClientConfig clientConfig) throws IOException {
// sessionTimeout defaults to 30 s
LOG.info("Initiating client connection, connectString=" + connectString
+ " sessionTimeout=" + sessionTimeout + " watcher=" + watcher);
if (clientConfig == null) {
clientConfig = new ZKClientConfig();
}
this.clientConfig = clientConfig;
watchManager = defaultWatchManager();
watchManager.defaultWatcher = watcher;
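// the main thing taken from here is the root path; the connect string can apparently look like -server 127.0.0.1:2181/root
// chroot is the client's namespace: once configured, all client operations are confined to that directory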
ConnectStringParser connectStringParser = new ConnectStringParser(
connectString);
hostProvider = aHostProvider;
cnxn = createConnection(connectStringParser.getChrootPath(),
hostProvider, sessionTimeout, this, watchManager,
getClientCnxnSocket(), canBeReadOnly);
cnxn.start();
}
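To make the chroot comment in the constructor concrete, here is a small sketch of splitting a connect string into its host list and chroot, and of mapping a client path to the path the server sees. ChrootSketch and its method names are hypothetical; the real logic lives in ConnectStringParser and in the ZooKeeper class, and also validates the path, which this sketch skips.

```java
// Hypothetical helper illustrating the chroot idea; not the ZooKeeper classes.
final class ChrootSketch {
    // "127.0.0.1:2181/root" -> "/root"; no suffix (or a bare "/") -> null.
    static String chrootOf(String connectString) {
        int slash = connectString.indexOf('/');
        if (slash < 0) {
            return null;
        }
        String chroot = connectString.substring(slash);
        return chroot.length() == 1 ? null : chroot;
    }

    // "127.0.0.1:2181/root" -> "127.0.0.1:2181"
    static String hostsOf(String connectString) {
        int slash = connectString.indexOf('/');
        return slash < 0 ? connectString : connectString.substring(0, slash);
    }

    // Path the server sees for a path the client asked for.
    static String toServerPath(String chroot, String clientPath) {
        if (chroot == null) {
            return clientPath;
        }
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }

    public static void main(String[] args) {
        String connect = "127.0.0.1:2181/root";
        System.out.println(hostsOf(connect));                          // 127.0.0.1:2181
        System.out.println(chrootOf(connect));                         // /root
        System.out.println(toServerPath(chrootOf(connect), "/lock"));  // /root/lock
    }
}
```

The reverse mapping (stripping the chroot prefix from paths returned by the server) is what the substring(cnxn.chrootPath.length()) call in create does later in this article.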
When createConnection is called above, getClientCnxnSocket is evaluated first. It chooses which networking implementation the client uses to talk to the server; the default is ClientCnxnSocketNIO.
private ClientCnxnSocket getClientCnxnSocket() throws IOException {
String clientCnxnSocketName = getClientConfig().getProperty(
ZKClientConfig.ZOOKEEPER_CLIENT_CNXN_SOCKET);
if (clientCnxnSocketName == null) {
clientCnxnSocketName = ClientCnxnSocketNIO.class.getName();
}
try {
Constructor<?> clientCxnConstructor = Class.forName(clientCnxnSocketName).getDeclaredConstructor(ZKClientConfig.class);
ClientCnxnSocket clientCxnSocket = (ClientCnxnSocket) clientCxnConstructor.newInstance(getClientConfig());
return clientCxnSocket;
} catch (Exception e) {
IOException ioe = new IOException("Couldn't instantiate "
+ clientCnxnSocketName);
ioe.initCause(e);
throw ioe;
}
}
createConnection mainly creates a ClientCnxn:
public ClientCnxn(String chrootPath, HostProvider hostProvider, int sessionTimeout, ZooKeeper zooKeeper,
ClientWatchManager watcher, ClientCnxnSocket clientCnxnSocket,
long sessionId, byte[] sessionPasswd, boolean canBeReadOnly) {
this.zooKeeper = zooKeeper;
this.watcher = watcher;
// sessionId is 0 by default; when the server sees sessionId=0 it generates a new sessionId for the client
this.sessionId = sessionId;
this.sessionPasswd = sessionPasswd;
this.sessionTimeout = sessionTimeout;
this.hostProvider = hostProvider;
this.chrootPath = chrootPath;
connectTimeout = sessionTimeout / hostProvider.size();
readTimeout = sessionTimeout * 2 / 3;
readOnly = canBeReadOnly;
// handles communication between the client and the server
sendThread = new SendThread(clientCnxnSocket);
// delivers watch notifications for znodes
eventThread = new EventThread();
this.clientConfig=zooKeeper.getClientConfig();
initRequestTimeout();
}
Back to cnxn.start() above, which simply starts the two threads:
public void start() {
sendThread.start();
eventThread.start();
}
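2.1 Main client classes
- ClientCnxn: handles communication and heartbeats between the client and the server; its two main members are SendThread and EventThread. ClientCnxn maintains two important queues: outgoingQueue (a LinkedBlockingDeque) holds Packets waiting to be sent to the server; pendingQueue holds Packets that have been sent and are waiting for the server's response.
- Packet: ZK's own data structure wrapping a request exchanged between client and server; its createBB method serializes the Packet contents for sending to the server.
- SendThread: manages the network I/O between client and server. While the ZooKeeper service is running, besides sending client requests to the server as mentioned above, SendThread also sends the heartbeat pings that show the client is still alive.
- EventThread: continuously takes objects from the waitingEvents queue, determines whether each one is a Watcher event or an AsyncCallback, and calls process or processResult respectively to fire the event or the callback.
- ClientCnxnSocketNIO: the concrete tool for client-server communication; it manages the socket.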
2.2 SendThread
For SendThread we mainly analyze its run method:
// hands sendThread, sessionId and outgoingQueue over to ClientCnxnSocketNIO
clientCnxnSocket.introduce(this, sessionId, outgoingQueue);
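The loop contains a lot of time bookkeeping:
- now: refreshed once per loop iteration, right after select returns, and set to the current time again on the error path in the catch block
- lastHeard: set to the current time whenever a response is read (both the connect response and regular command responses mentioned above) and when the network connection is established
- lastSend: set to the current time after each ping or request is sent and when the network connection is established
2.2.1 Client timing settings
- sessionTimeout: set when the ZooKeeper client is initialized
- readTimeout: sessionTimeout * 2 / 3
- connectTimeout: sessionTimeout / hostProvider.size(), where hostProvider.size() is the number of ZooKeeper servers. This lets the client switch to another server after a failed connection attempt, with up to hostProvider.size() attempts.
- getIdleRecv(): now - lastHeard
- getIdleSend(): now - lastSend
Session timeout calculation:
- not yet connected: to = connectTimeout - getIdleRecv()
- connected: to = readTimeout - getIdleRecv(), where getIdleRecv() is how long the client has gone without hearing from the server
- if to <= 0, a SessionTimeoutException is thrown
PING calculation:
timeToNextPing = readTimeout / 2 - getIdleSend(), where getIdleSend() is how long the client has gone without sending anything to the server (the code shown in 2.2.3 subtracts a further 1000 ms once getIdleSend() exceeds 1000 ms).
If timeToNextPing <= 0, a ping request is sent (it is only placed in outgoingQueue; no I/O happens at this point).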
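A worked example helps with these formulas. The sketch below uses a hypothetical ClientTimeouts class (not part of ZooKeeper), plugs in sessionTimeout = 30000 ms and three servers, and uses the simplified ping formula from the text without the extra 1000 ms adjustment.

```java
// Hypothetical helper that reproduces the arithmetic from this section.
final class ClientTimeouts {
    final int connectTimeout;
    final int readTimeout;

    ClientTimeouts(int sessionTimeoutMs, int serverCount) {
        this.connectTimeout = sessionTimeoutMs / serverCount;
        this.readTimeout = sessionTimeoutMs * 2 / 3;
    }

    // Remaining budget ("to" in the source); <= 0 means SessionTimeoutException.
    int remaining(boolean connected, int idleRecvMs) {
        return (connected ? readTimeout : connectTimeout) - idleRecvMs;
    }

    // Simplified form: readTimeout / 2 - idleSend; <= 0 means "send a ping now".
    int timeToNextPing(int idleSendMs) {
        return readTimeout / 2 - idleSendMs;
    }

    public static void main(String[] args) {
        ClientTimeouts t = new ClientTimeouts(30000, 3);
        System.out.println("connectTimeout = " + t.connectTimeout);        // 10000
        System.out.println("readTimeout    = " + t.readTimeout);           // 20000
        System.out.println("to (connected, idle 4 s)  = " + t.remaining(true, 4000));   // 16000
        System.out.println("to (connected, idle 20 s) = " + t.remaining(true, 20000));  // 0
        System.out.println("timeToNextPing (idle 4 s) = " + t.timeToNextPing(4000));    // 6000
    }
}
```

With these numbers a connected client pings after roughly 10 s of send silence and gives up on the current connection after 20 s without hearing from the server, which is comfortably inside the 30 s session timeout.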
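2.2.2 Client state transitions
The states are defined in org.apache.zookeeper.ZooKeeper.States, an enum with the values CONNECTING, ASSOCIATING, CONNECTED, CONNECTEDREADONLY, CLOSED, AUTH_FAILED, NOT_CONNECTED. The state variable itself is held by ClientCnxn under the name state; it is volatile, which indicates that multiple threads read it. State changes are mainly performed by SendThread. The states are as follows (state-transition diagram: figure not preserved):
- CONNECTING: connecting; set when a connection attempt starts
- ASSOCIATING: currently unused
- CONNECTED: connected
- CONNECTEDREADONLY: connected in read-only mode
- CLOSED: closed
- AUTH_FAILED: authentication failed
- NOT_CONNECTED: not connected; the default state
2.2.3 SendThread.run()
It mainly connects to the server, handles read/write events, sends heartbeats, and detects session timeouts.
// hands sendThread, sessionId and outgoingQueue over to ClientCnxnSocketNIO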
clientCnxnSocket.introduce(this, sessionId, outgoingQueue);
// set now to the current time
clientCnxnSocket.updateNow();
// set lastSend and lastHeard to now
clientCnxnSocket.updateLastSendAndHeard();
int to;
long lastPingRwServer = Time.currentElapsedTime();
final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds
InetSocketAddress serverAddress = null;
while (state.isAlive()) {
try {
if (!clientCnxnSocket.isConnected()) {
// don't re-establish connection if we are closing
if (closing) {
break;
}
if (rwServerAddress != null) {
serverAddress = rwServerAddress;
rwServerAddress = null;
} else {
// when reconnecting to the server, retry at 1 s intervals
serverAddress = hostProvider.next(1000);
}
startConnect(serverAddress);
clientCnxnSocket.updateLastSendAndHeard();
}
// if already connected to the ZK server, check whether the session has timed out
if (state.isConnected()) {
// determine whether we need to send an AuthFailed event.
if (zooKeeperSaslClient != null) {
boolean sendAuthEvent = false;
if (zooKeeperSaslClient.getSaslState() == ZooKeeperSaslClient.SaslState.INITIAL) {
try {
zooKeeperSaslClient.initialize(ClientCnxn.this);
} catch (SaslException e) {
LOG.error("SASL authentication with Zookeeper Quorum member failed: " + e);
state = States.AUTH_FAILED;
sendAuthEvent = true;
}
}
KeeperState authState = zooKeeperSaslClient.getKeeperState();
if (authState != null) {
if (authState == KeeperState.AuthFailed) {
// An authentication error occurred during authentication with the Zookeeper Server.
state = States.AUTH_FAILED;
sendAuthEvent = true;
} else {
if (authState == KeeperState.SaslAuthenticated) {
sendAuthEvent = true;
}
}
}
if (sendAuthEvent) {
eventThread.queueEvent(new WatchedEvent(
Watcher.Event.EventType.None,
authState,null));
if (state == States.AUTH_FAILED) {
eventThread.queueEventOfDeath();
}
}
}
to = readTimeout - clientCnxnSocket.getIdleRecv();
} else {
to = connectTimeout - clientCnxnSocket.getIdleRecv();
}
// raise the session timeout, then connect to a different server address
if (to <= 0) {
String warnInfo;
warnInfo = "Client session timed out, have not heard from server in "
+ clientCnxnSocket.getIdleRecv()
+ "ms"
+ " for sessionid 0x"
+ Long.toHexString(sessionId);
LOG.warn(warnInfo);
throw new SessionTimeoutException(warnInfo);
}
// work out whether a heartbeat needs to be sent
if (state.isConnected()) {
//1000(1 second) is to prevent race condition missing to send the second ping
//also make sure not to send too many pings when readTimeout is small
int timeToNextPing = readTimeout / 2 - clientCnxnSocket.getIdleSend() -
((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
//send a ping request either time is due or no packet sent out within MAX_SEND_PING_INTERVAL
if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
sendPing();
clientCnxnSocket.updateLastSend();
} else {
if (timeToNextPing < to) {
// take the smaller of to and timeToNextPing
to = timeToNextPing;
}
}
}
// If we are in read-only mode, seek for read/write server
if (state == States.CONNECTEDREADONLY) {
long now = Time.currentElapsedTime();
int idlePingRwServer = (int) (now - lastPingRwServer);
if (idlePingRwServer >= pingRwTimeout) {
lastPingRwServer = now;
idlePingRwServer = 0;
pingRwTimeout =
Math.min(2*pingRwTimeout, maxPingRwTimeout);
pingRwServer();
}
to = Math.min(to, pingRwTimeout - idlePingRwServer);
}
// to is passed in here and used as the select timeout
clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);
} catch (Throwable e) {
if (closing) {
if (LOG.isDebugEnabled()) {
// closing so this is expected
LOG.debug("An exception was thrown while closing send thread for session 0x"
+ Long.toHexString(getSessionId())
+ " : " + e.getMessage());
}
break;
} else {
// this is ugly, you have a better way speak up
if (e instanceof SessionExpiredException) {
LOG.info(e.getMessage() + ", closing socket connection");
} else if (e instanceof SessionTimeoutException) {
LOG.info(e.getMessage() + RETRY_CONN_MSG);
} else if (e instanceof EndOfStreamException) {
LOG.info(e.getMessage() + RETRY_CONN_MSG);
} else if (e instanceof RWServerFoundException) {
LOG.info(e.getMessage());
} else if (e instanceof SocketException) {
LOG.info("Socket error occurred: {}: {}", serverAddress, e.getMessage());
} else {
LOG.warn("Session 0x{} for server {}, unexpected error{}",
Long.toHexString(getSessionId()),
serverAddress,
RETRY_CONN_MSG,
e);
}
// At this point, there might still be new packets appended to outgoingQueue.
// they will be handled in next connection or cleared up if closed.
cleanAndNotifyState();
}
}
}
If clientCnxnSocket is not yet connected to a server, a server address is taken and a connection is started:
private void startConnect(InetSocketAddress addr) throws IOException {
// initializing it for new connection
saslLoginFailed = false;
if(!isFirstConnect){
try {
Thread.sleep(r.nextInt(1000));
} catch (InterruptedException e) {
LOG.warn("Unexpected exception", e);
}
}
// switch the state to CONNECTING
state = States.CONNECTING;
String hostPort = addr.getHostString() + ":" + addr.getPort();
MDC.put("myid", hostPort);
setName(getName().replaceAll("\\(.*\\)", "(" + hostPort + ")"));
if (clientConfig.isSaslClientEnabled()) {
...
}
logStartConnect(addr);
clientCnxnSocket.connect(addr);
}
void connect(InetSocketAddress addr) throws IOException {
SocketChannel sock = createSock();
try {
registerAndConnect(sock, addr);
} catch (IOException e) {
LOG.error("Unable to open socket to " + addr);
sock.close();
throw e;
}
// false means initialization has not completed yet
initialized = false;
/*
* Reset incomingBuffer
*/
lenBuffer.clear();
incomingBuffer = lenBuffer;
}
// create the SocketChannel
SocketChannel createSock() throws IOException {
SocketChannel sock;
sock = SocketChannel.open();
sock.configureBlocking(false);
sock.socket().setSoLinger(false, -1);
sock.socket().setTcpNoDelay(true);
return sock;
}
// register with the selector and connect
void registerAndConnect(SocketChannel sock, InetSocketAddress addr)
throws IOException {
// this key matters: later its interest ops are modified in order to write data to the server
// for now only the CONNECT event is registered
sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
// true if the connection succeeds immediately; normally this returns false here
boolean immediateConnect = sock.connect(addr);
if (immediateConnect) {
sendThread.primeConnection();
}
}
clientCnxnSocket.doTransport handles the NIO events:
void doTransport(int waitTimeOut, List<Packet> pendingQueue, ClientCnxn cnxn)
throws IOException, InterruptedException {
selector.select(waitTimeOut);
Set<SelectionKey> selected;
synchronized (this) {
selected = selector.selectedKeys();
}
// Everything below and until we get back to the select is
// non blocking, so time is effectively a constant. That is
// Why we just have to do this once, here
updateNow();
for (SelectionKey k : selected) {
SocketChannel sc = ((SocketChannel) k.channel());
// fires once the connection to the server has been established
if ((k.readyOps() & SelectionKey.OP_CONNECT) != 0) {
// if the connection completed successfully
if (sc.finishConnect()) {
updateLastSendAndHeard();
updateSocketAddresses();
// mainly changes the interest ops: sockKey.interestOps(SelectionKey.OP_READ | SelectionKey.OP_WRITE);
sendThread.primeConnection();
}
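// fires on read or write events
// a point of curiosity: how does a write event get triggered on the client?
// the interest ops set above include SelectionKey.OP_WRITE, and a write event fires immediately whenever the socket's send buffer has room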
} else if ((k.readyOps() & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0) {
doIO(pendingQueue, cnxn);
}
}
// if outgoingQueue still holds a Packet that can be sent, add SelectionKey.OP_WRITE to sockKey's interest ops so writing to the server continues
if (sendThread.getZkState().isConnected()) {
if (findSendablePacket(outgoingQueue,
sendThread.tunnelAuthInProgress()) != null) {
enableWrite();
}
}
selected.clear();
}
The doIO code:
void doIO(List<Packet> pendingQueue, ClientCnxn cnxn)
throws InterruptedException, IOException {
SocketChannel sock = (SocketChannel) sockKey.channel();
if (sock == null) {
throw new IOException("Socket is null!");
}
if (sockKey.isReadable()) {
int rc = sock.read(incomingBuffer);
if (rc < 0) {
throw new EndOfStreamException(
"Unable to read additional data from server sessionid 0x"
+ Long.toHexString(sessionId)
+ ", likely server has closed socket");
}
if (!incomingBuffer.hasRemaining()) {
incomingBuffer.flip();
if (incomingBuffer == lenBuffer) {
recvCount.getAndIncrement();
readLength();
} else if (!initialized) {
// first pass after connecting: this sets the state to CONNECTED (or CONNECTEDREADONLY)
// read the result of the connect request
readConnectResult();
// add OP_READ if sockKey's interest ops do not include it yet
enableRead();
// if authentication is required, the auth request is sent first, followed by normal packets
if (findSendablePacket(outgoingQueue,
sendThread.tunnelAuthInProgress()) != null) {
enableWrite();
}
lenBuffer.clear();
incomingBuffer = lenBuffer;
updateLastHeard();
initialized = true;
} else {
// responses to normal requests are handled here
sendThread.readResponse(incomingBuffer);
lenBuffer.clear();
incomingBuffer = lenBuffer;
updateLastHeard();
}
}
}
if (sockKey.isWritable()) {
// fetch the first packet that can be sent; it is not removed from outgoingQueue yet
Packet p = findSendablePacket(outgoingQueue,
sendThread.tunnelAuthInProgress());
if (p != null) {
updateLastSend();
// If we already started writing p, p.bb will already exist
if (p.bb == null) {
if ((p.requestHeader != null) &&
(p.requestHeader.getType() != OpCode.ping) &&
(p.requestHeader.getType() != OpCode.auth)) {
p.requestHeader.setXid(cnxn.getXid());
}
p.createBB();
}
sock.write(p.bb);
// the packet's buffer has been written out completely
if (!p.bb.hasRemaining()) {
sentCount.getAndIncrement();
// remove the packet from outgoingQueue
outgoingQueue.removeFirstOccurrence(p);
if (p.requestHeader != null
&& p.requestHeader.getType() != OpCode.ping
&& p.requestHeader.getType() != OpCode.auth) {
synchronized (pendingQueue) {
// add it to pendingQueue
pendingQueue.add(p);
}
}
}
}
// if outgoingQueue is empty, turn off write interest
if (outgoingQueue.isEmpty()) {
// No more packets to send: turn off write interest flag.
// Will be turned on later by a later call to enableWrite(),
// from within ZooKeeperSaslClient (if client is configured
// to attempt SASL authentication), or in either doIO() or
// in doTransport() if not.
disableWrite();
} else if (!initialized && p != null && !p.bb.hasRemaining()) {
// On initial connection, write the complete connect request
// packet, but then disable further writes until after
// receiving a successful connection response. If the
// session is expired, then the server sends the expiration
// response and immediately closes its end of the socket. If
// the client is simultaneously writing on its end, then the
// TCP stack may choose to abort with RST, in which case the
// client would never receive the session expired event. See
// http://docs.oracle.com/javase/6/docs/technotes/guides/net/articles/connection_release.html
disableWrite();
} else {
// Just in case
enableWrite();
}
}
}
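The read half of doIO relies on length-prefixed framing: incomingBuffer first points at the 4-byte lenBuffer, and once that is full it is swapped for a buffer sized to the announced payload. The sketch below shows that pattern on plain ByteBuffers with no socket. FrameReader is a hypothetical class, and unlike readLength in the real client it does not reject negative or oversized lengths.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the lenBuffer / incomingBuffer switch.
final class FrameReader {
    private final ByteBuffer lenBuffer = ByteBuffer.allocate(4);
    private ByteBuffer incomingBuffer = lenBuffer;
    final List<byte[]> frames = new ArrayList<>();

    // Consumes whatever bytes are available; the input may stop mid-frame.
    void feed(ByteBuffer in) {
        while (true) {
            while (in.hasRemaining() && incomingBuffer.hasRemaining()) {
                incomingBuffer.put(in.get());
            }
            if (incomingBuffer.hasRemaining()) {
                return; // current buffer not full yet: wait for more bytes
            }
            incomingBuffer.flip();
            if (incomingBuffer == lenBuffer) {
                // Length header complete: allocate a buffer for the payload.
                incomingBuffer = ByteBuffer.allocate(lenBuffer.getInt());
            } else {
                // Payload complete: hand it over and go back to reading a length.
                byte[] payload = new byte[incomingBuffer.remaining()];
                incomingBuffer.get(payload);
                frames.add(payload);
                lenBuffer.clear();
                incomingBuffer = lenBuffer;
            }
        }
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        ByteBuffer wire = ByteBuffer.allocate(4 + body.length);
        wire.putInt(body.length).put(body).flip();
        FrameReader reader = new FrameReader();
        reader.feed(wire);
        System.out.println(new String(reader.frames.get(0), StandardCharsets.UTF_8));
    }
}
```

Because a single read may deliver only part of a frame, the reader keeps its position in incomingBuffer between calls, which is the same reason doIO checks incomingBuffer.hasRemaining() before processing anything.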
First, the important readResponse method on the read path:
// read the server's response
void readResponse(ByteBuffer incomingBuffer) throws IOException {
ByteBufferInputStream bbis = new ByteBufferInputStream(
incomingBuffer);
BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
ReplyHeader replyHdr = new ReplyHeader();
// deserialize the ReplyHeader
replyHdr.deserialize(bbia, "header");
....
// a response has arrived: remove the head element of pendingQueue
Packet packet;
synchronized (pendingQueue) {
if (pendingQueue.size() == 0) {
throw new IOException("Nothing in the queue, but got "
+ replyHdr.getXid());
}
packet = pendingQueue.remove();
}
/*
* Since requests are processed in order, we better get a response
* to the first request!
*/
try {
// check that the response order matches the request order; if not, raise an error (which leads to a reconnect)
if (packet.requestHeader.getXid() != replyHdr.getXid()) {
// mark the packet as failed with connection loss
packet.replyHeader.setErr(
KeeperException.Code.CONNECTIONLOSS.intValue());
throw new IOException("Xid out of order. Got Xid "
+ replyHdr.getXid() + " with err " +
+ replyHdr.getErr() +
" expected Xid "
+ packet.requestHeader.getXid()
+ " for a packet with details: "
+ packet );
}
packet.replyHeader.setXid(replyHdr.getXid());
packet.replyHeader.setErr(replyHdr.getErr());
packet.replyHeader.setZxid(replyHdr.getZxid());
if (replyHdr.getZxid() > 0) {
lastZxid = replyHdr.getZxid();
}
if (packet.response != null && replyHdr.getErr() == 0) {
packet.response.deserialize(bbia, "response");
}
if (LOG.isDebugEnabled()) {
LOG.debug("Reading reply sessionid:0x"
+ Long.toHexString(sessionId) + ", packet:: " + packet);
}
} finally {
finishPacket(packet);
}
}
Whether the request succeeded or failed, finishPacket is always called in the end:
protected void finishPacket(Packet p) {
int err = p.replyHeader.getErr();
...
// no async callback: this is a synchronous request
if (p.cb == null) {
synchronized (p) {
// processing is complete, so notifyAll wakes the requesting thread, which resumes after its wait
p.finished = true;
p.notifyAll();
}
} else {
// with an async callback, the packet is added to the waitingEvents queue
p.finished = true;
eventThread.queuePacket(p);
}
}
A request issued by the client blocks here; once notifyAll arrives, it continues:
public ReplyHeader submitRequest(RequestHeader h, Record request,
Record response, WatchRegistration watchRegistration,
WatchDeregistration watchDeregistration)
throws InterruptedException {
ReplyHeader r = new ReplyHeader();
Packet packet = queuePacket(h, r, request, response, null, null, null,
null, watchRegistration, watchDeregistration);
synchronized (packet) {
if (requestTimeout > 0) {
// Wait for request completion with timeout
waitForPacketFinish(r, packet);
} else {
// Wait for request completion infinitely
while (!packet.finished) {
packet.wait();
}
}
}
if (r.getErr() == Code.REQUESTTIMEOUT.intValue()) {
sendThread.cleanAndNotifyState();
}
return r;
}
After this method returns, execution resumes here:
ReplyHeader r = cnxn.submitRequest(h, record, response, null);
// a non-zero err is turned into an exception
if (r.getErr() != 0) {
throw KeeperException.create(KeeperException.Code.get(r.getErr()),
clientPath);
}
if (stat != null) {
DataTree.copyStat(response.getStat(), stat);
}
if (cnxn.chrootPath == null) {
return response.getPath();
} else {
return response.getPath().substring(cnxn.chrootPath.length());
}
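The hand-off between the requesting thread (blocked in submitRequest) and SendThread (calling finishPacket) is a plain wait/notifyAll on the packet object. A minimal sketch of that pattern follows; PacketSketch and its methods are hypothetical stand-ins, and the request timeout branch is left out.

```java
// Hypothetical stand-in for Packet, showing only the wait/notify hand-off.
final class PacketSketch {
    private boolean finished;
    private int err;

    // Called by the I/O thread once the reply (or a failure) is known.
    synchronized void finish(int errCode) {
        err = errCode;
        finished = true;
        notifyAll();
    }

    // Called by the requesting thread; blocks until finish() has run.
    synchronized int await() throws InterruptedException {
        // The loop guards against spurious wakeups, as in submitRequest.
        while (!finished) {
            wait();
        }
        return err;
    }

    public static void main(String[] args) throws InterruptedException {
        PacketSketch packet = new PacketSketch();
        Thread io = new Thread(() -> packet.finish(0), "io-thread");
        io.start();
        System.out.println("err = " + packet.await());
    }
}
```

If finish() happens before await() is called, the finished flag is already set and the caller returns without waiting, so the order in which the two threads arrive does not matter.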
Finally, session expiry, timeouts, xid ordering errors and the like all end up in this method, which cleans up resources before reconnecting:
private void cleanAndNotifyState() {
// clears the data in both queues
cleanup();
if (state.isAlive()) {
// queue a Disconnected event
eventThread.queueEvent(new WatchedEvent(Event.EventType.None,
Event.KeeperState.Disconnected, null));
}
clientCnxnSocket.updateNow();
clientCnxnSocket.updateLastSendAndHeard();
}
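2.3 How the client guarantees send order
Each outgoing message is first wrapped in a Packet and placed on outgoingQueue, from which packets are consumed in order and sent to the server. Just before sending, each Packet is stamped with an xid (a per-client increasing sequence number); once sent, the Packet is moved to pendingQueue to wait for its response.
When the client receives a response, it first deserializes the ReplyHeader, then pops the first element off the head of pendingQueue and checks packet.requestHeader.getXid() != replyHdr.getXid().
If the xids match, the response is processed normally. If they differ, the responses are out of order: an IOException is thrown, notifyAll delivers an error result to the waiting caller, and the ZK client then picks the next address from the server list and reconnects.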
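The two-queue ordering check can be condensed into a small model. This is only a sketch under assumptions: XidOrderSketch, Request and the method names are hypothetical, the xid counter is assumed to start at 1, and pings and auth packets (which skip pendingQueue in the real client) are ignored.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of outgoingQueue -> pendingQueue with the xid check.
final class XidOrderSketch {
    static final class Request {
        final String op;
        int xid;

        Request(String op) {
            this.op = op;
        }
    }

    private int nextXid = 1; // assumed starting value for this sketch
    private final Deque<Request> outgoingQueue = new ArrayDeque<>();
    private final Deque<Request> pendingQueue = new ArrayDeque<>();

    void enqueue(String op) {
        outgoingQueue.addLast(new Request(op));
    }

    // "Send" the head of outgoingQueue: stamp the xid, then park it in pendingQueue.
    int sendNext() {
        Request r = outgoingQueue.removeFirst();
        r.xid = nextXid++;
        pendingQueue.addLast(r);
        return r.xid;
    }

    // A reply must belong to the oldest pending request, otherwise the stream is broken.
    Request onReply(int replyXid) throws IOException {
        Request head = pendingQueue.removeFirst();
        if (head.xid != replyXid) {
            throw new IOException("Xid out of order. Got Xid " + replyXid
                    + " expected Xid " + head.xid);
        }
        return head;
    }

    public static void main(String[] args) throws IOException {
        XidOrderSketch client = new XidOrderSketch();
        client.enqueue("create");
        client.enqueue("getData");
        int first = client.sendNext();
        client.sendNext();
        System.out.println("matched: " + client.onReply(first).op);
    }
}
```

The check works because both ends are FIFO: the client sends in queue order over a single TCP connection and expects the server to answer in the same order, so any mismatch signals a broken stream rather than a reordering to tolerate.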
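2.4 Session timeout
While communicating with the server, the client keeps tracking the session timeout. As noted above, problems such as an xid ordering error cause the client to reconnect, and a reconnect has two possible outcomes:
- Reconnecting before the session expires:
If the reconnect succeeds within the session timeout, the state changes back to connected: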
KeeperState eventState = (isRO) ?
KeeperState.ConnectedReadOnly : KeeperState.SyncConnected;
eventThread.queueEvent(new WatchedEvent(
Watcher.Event.EventType.None,
eventState, null));
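- Reconnecting after the session has expired:
If the connection is re-established after the session has expired, the client receives a session-expired notification: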
negotiatedSessionTimeout = _negotiatedSessionTimeout;
if (negotiatedSessionTimeout <= 0) {
state = States.CLOSED;
eventThread.queueEvent(new WatchedEvent(
Watcher.Event.EventType.None,
Watcher.Event.KeeperState.Expired, null));
eventThread.queueEventOfDeath();
String warnInfo;
warnInfo = "Unable to reconnect to ZooKeeper service, session 0x"
+ Long.toHexString(sessionId) + " has expired";
LOG.warn(warnInfo);
throw new SessionExpiredException(warnInfo);
}
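2.5 Creating a node asynchronously
The native ZooKeeper API supports creating nodes asynchronously:
// OPEN_ACL_UNSAFE is passed because a null ACL list is rejected as invalid
zookeeper.create("/zk-test", "".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL, new AsyncCallback.StringCallback() {
@Override
public void processResult(int rc, String path, Object ctx, String name) {
System.out.println("Create path result: [" + rc + ", " + path + ", " + ctx + ", real path name: " + name + "]");
}
}, "I am context.");
As the create source shows, the request is merely wrapped in a Packet and added to outgoingQueue before the call returns, without any wait; the result is delivered through the callback interface. Here is how the callback gets invoked: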
// no async callback: this is a synchronous request
if (p.cb == null) {
synchronized (p) {
// processing is complete, so notifyAll wakes the requesting thread, which resumes after its wait
p.finished = true;
p.notifyAll();
}
} else {
// with an async callback, the packet is added to the waitingEvents queue, whose consumer invokes AsyncCallback cb
p.finished = true;
eventThread.queuePacket(p);
}
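2.6 EventThread
Many events have come up above; this section looks at how ZK delivers event notifications. EventThread is itself a Thread, so we can go straight to its run method.
Within run, the part to read is processEvent. It makes a few checks: for a watch event it ends up calling watcher.process(pair.event);
for an async callback it calls cb.processResult.
The EventThread logic is comparatively simple.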
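The dispatch loop just described can be modelled with a blocking queue and two kinds of queued items. Everything in this sketch is hypothetical (EventLoopSketch, WatcherLike, CallbackLike and the event classes); the real EventThread also handles several callback types and only exits after draining the queue once the death event has been seen, which is simplified away here.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of EventThread: one queue, two kinds of work items.
final class EventLoopSketch {
    interface WatcherLike { void process(String event); }
    interface CallbackLike { void processResult(int rc, String path); }

    static final class WatchEvent {
        final WatcherLike watcher;
        final String event;

        WatchEvent(WatcherLike watcher, String event) {
            this.watcher = watcher;
            this.event = event;
        }
    }

    static final class CallbackEvent {
        final CallbackLike cb;
        final int rc;
        final String path;

        CallbackEvent(CallbackLike cb, int rc, String path) {
            this.cb = cb;
            this.rc = rc;
            this.path = path;
        }
    }

    private static final Object EVENT_OF_DEATH = new Object();
    private final BlockingQueue<Object> waitingEvents = new LinkedBlockingQueue<>();

    void queueEvent(Object event) {
        waitingEvents.add(event);
    }

    void queueEventOfDeath() {
        waitingEvents.add(EVENT_OF_DEATH);
    }

    // Take items one by one and dispatch on their type until the death marker arrives.
    void run() throws InterruptedException {
        while (true) {
            Object event = waitingEvents.take();
            if (event == EVENT_OF_DEATH) {
                return;
            }
            if (event instanceof WatchEvent) {
                WatchEvent w = (WatchEvent) event;
                w.watcher.process(w.event);
            } else if (event instanceof CallbackEvent) {
                CallbackEvent c = (CallbackEvent) event;
                c.cb.processResult(c.rc, c.path);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        EventLoopSketch loop = new EventLoopSketch();
        loop.queueEvent(new WatchEvent(e -> System.out.println("watch: " + e), "NodeCreated"));
        loop.queueEvent(new CallbackEvent((rc, p) -> System.out.println("callback: " + rc + " " + p), 0, "/zk-test"));
        loop.queueEventOfDeath();
        loop.run();
    }
}
```

Running watchers and callbacks on a single dedicated thread keeps user code off the I/O thread and delivers notifications in the order they were queued, at the cost that one slow callback delays all the ones behind it.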