Playbook: Troubleshooting a Management Server outage caused by ZooKeeper (ZK) connectivity problems


First, check the Management Server log:

CuratorFramework-0-EventThread ERROR o.a.c.ConnectionState - ConnectionState.checkState() : Authentication failed
main INFO ZOOKEEPER - ZooKeeperServiceImpl.exists() : Retry path existence path:/featureflag, reason: KeeperErrorCode = ConnectionLoss for /featureflag
CuratorFramework-0 WARN o.a.c.ConnectionState - ConnectionState.checkTimeouts():

The log above shows that the ZooKeeper nodes are unhealthy, so the Management Server cannot connect to ZooKeeper.
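To find these messages quickly, you can grep the Management Server log for the same patterns. This is a minimal sketch: it assumes the log is at /opt/apigee/var/log/edge-management-server/logs/system.log (the usual location on an Edge for Private Cloud install; adjust if yours differs), and the function name find_ms_zk_errors is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Management Server log lines that point to
# ZooKeeper connectivity problems (patterns taken from the log excerpt above).
# ASSUMPTION: default log path on an Edge for Private Cloud install.
MS_LOG_DEFAULT=/opt/apigee/var/log/edge-management-server/logs/system.log

find_ms_zk_errors() {
  local log="${1:-$MS_LOG_DEFAULT}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  # -n prefixes each match with its line number; -E enables alternation.
  # grep exits 1 when nothing matches, which here means "no ZK errors found".
  grep -nE 'Authentication failed|KeeperErrorCode = ConnectionLoss|ConnectionState\.checkTimeouts' "$log"
}
```

For example, `find_ms_zk_errors | tail -n 20` shows the most recent matches, and `find_ms_zk_errors /path/to/other.log` checks a different file.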

Next, check the ZooKeeper log:

Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
[myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client <Ip>:44486 (no session established for client)

If the ZooKeeper nodes report errors like the one above, the leader node is likely down and the remaining nodes have not elected a new leader.
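A similar grep can tell you how often a node has been refusing clients. This sketch assumes the ZooKeeper log is at /opt/apigee/var/log/apigee-zookeeper/zookeeper.log (an assumption; check your install), and zk_not_running_summary is a hypothetical name. Run it on each ZooKeeper node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "ZooKeeperServer not running" errors in a
# ZooKeeper log and show the most recent one.
# ASSUMPTION: default Apigee ZooKeeper log path below.
ZK_LOG_DEFAULT=/opt/apigee/var/log/apigee-zookeeper/zookeeper.log

zk_not_running_summary() {
  local log="${1:-$ZK_LOG_DEFAULT}" count
  [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps going.
  count=$(grep -c 'ZooKeeperServer not running' "$log" || true)
  echo "not-running errors: $count"
  if [ "$count" -gt 0 ]; then
    # The latest occurrence tells you whether the problem is still ongoing.
    echo "last: $(grep 'ZooKeeperServer not running' "$log" | tail -n 1)"
  fi
}
```

A non-zero count on every node, with recent timestamps, is consistent with an ensemble that currently has no leader.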

To quickly check that the ZooKeeper software is running, follow these steps:

1) Log in to each ZooKeeper machine and run the command:

echo ruok | nc `hostname -i` 2181

or

echo srvr | nc `hostname -i` 2181


2) Confirm that each ZooKeeper instance replies to ruok with:

imok 

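Note: If you get no response or a ‘broken pipe’ error, the ZooKeeper instance is either not running or not serving requests. An imok reply only means the process is running and bound to the client port, not that it has joined the quorum; use srvr to see the node's quorum state.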

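3) Obtain more information about the status of ZooKeeper by logging in to each ZooKeeper machine and running the command:

echo stat | nc `hostname -i` 2181

The Mode: line in the output shows whether the node is the leader or a follower. A node that is not in quorum replies "This ZooKeeper instance is not currently serving requests" instead.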
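The checks in steps 1 to 3 can also be run against every node from a single machine. This is a minimal sketch, assuming nc is installed, port 2181 on each node is reachable from where you run it, and the hostnames are placeholders; check_zk_node and zk_mode are hypothetical names. It uses the ruok and stat four-letter commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check ZooKeeper nodes with the ruok and stat
# four-letter commands. ASSUMPTIONS: nc is installed and port 2181 is reachable.

# Read stat/srvr output on stdin and print the node's role.
zk_mode() {
  awk '
    /^Mode:/ { sub(/^Mode: */, ""); print; found = 1 }
    /not currently serving requests/ { print "not serving requests"; found = 1 }
    END { if (!found) print "unknown" }
  '
}

check_zk_node() {
  local host="$1" port="${2:-2181}" reply
  # -w 2: give up after 2 seconds instead of hanging on a dead node.
  reply=$(echo ruok | nc -w 2 "$host" "$port" 2>/dev/null)
  if [ "$reply" != "imok" ]; then
    echo "$host:$port NOT RUNNING (no imok reply)"
    return 1
  fi
  # imok only proves the process is up; stat shows its role in the quorum.
  echo "$host:$port running, mode: $(echo stat | nc -w 2 "$host" "$port" 2>/dev/null | zk_mode)"
}

# Usage (placeholder hostnames):
# for h in zk1.example.com zk2.example.com zk3.example.com; do check_zk_node "$h"; done
```

A healthy three-node ensemble should report exactly one leader and two followers. On ZooKeeper 3.5 and later, four-letter commands other than srvr are disabled unless they are listed in the 4lw.commands.whitelist setting, so ruok and stat may be refused there.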

Check the conf_zookeeper_connection.string property on the Management Server, Message Processor, and Router to validate their ZooKeeper connectivity. The property is set in these files:

Message Processor: /opt/apigee/token/application/message-processor.properties
Router: /opt/apigee/token/application/router.properties
Management Server: /opt/apigee/customer/application/management-server.properties
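To compare the value across components, you can pull the property out of each file. This is a sketch assuming plain key=value lines; zk_conn_string is a hypothetical name, and the paths in the usage comment are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the conf_zookeeper_connection.string value
# configured in a properties file. ASSUMPTION: plain key=value lines.
zk_conn_string() {
  local file="$1"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
  # Commented lines (#...) never match the anchored pattern. If the key
  # appears more than once, the last assignment wins, as in Java properties.
  grep -E '^[[:space:]]*conf_zookeeper_connection\.string[[:space:]]*=' "$file" \
    | tail -n 1 | sed -E 's/^[^=]*=[[:space:]]*//'
}

# Usage: print the value from each component's file.
# for f in /opt/apigee/token/application/message-processor.properties \
#          /opt/apigee/token/application/router.properties \
#          /opt/apigee/customer/application/management-server.properties; do
#   echo "$f: $(zk_conn_string "$f")"
# done
```

All three components should list the same ZooKeeper nodes; a mismatch is worth fixing before looking further.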

List the ZooKeeper nodes in the connection string in the order shown below, with the leader first. If the leader node has a problem or is not in service, stop that node so that the remaining nodes can elect a new leader. The leader node, or the first node in the connection string, should always be working.

conf_zookeeper_connection.string=<leader-hostname>:2181,<follower-hostname>:2181,<follower-hostname>:2181

(In this example, the last node is the one that is not in quorum and not serving requests.)
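The advice to stop a failed leader relies on ZooKeeper's majority rule: an ensemble can elect a leader only while a strict majority of its voting members are up. The sketch below splits a connection string into its entries and computes that majority. It assumes the string lists every voting member (the actual membership is defined in the ZooKeeper server configuration, not in the client string); zk_hosts and zk_quorum_size are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reasoning about a ZooKeeper connection string.

# Print one host:port entry per line, trimming spaces and dropping empties.
zk_hosts() {
  tr ',' '\n' <<< "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Votes a leader needs in an ensemble of n voting members: floor(n/2) + 1.
zk_quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Usage (placeholder string; ASSUMES it lists every voting member):
# conn='zk1:2181,zk2:2181,zk3:2181'
# n=$(zk_hosts "$conn" | wc -l)
# echo "nodes: $n, quorum: $(zk_quorum_size "$n")"
# first=$(zk_hosts "$conn" | head -n 1)
# echo stat | nc -w 2 "${first%:*}" "${first##*:}" | grep '^Mode:'   # want "Mode: leader"
```

For a three-node ensemble the quorum is 2: stopping a failed leader leaves two voting members, which is still a majority, so they can elect a new leader. If two of the three nodes are down, no leader can be elected until one of them comes back.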
Last update: 03-06-2017 11:11 PM