Edge private cloud connectivity checks

Is there a common tool for a basic connectivity or sanity check that validates whether all components can talk to each other? Something like a one-click or one-command environment validator or health check.

Or even a shell script along these lines:

apigee-all healthcheck (on the management server, but checks across the entire planet)


We are already aware of management APIs such as:

http://{{ms_host}}:{{ms_port}}/v1/servers?pod=[analytics|gateway|central]

If not, any tips on what tools or commands could be used to build a shell script that performs this check? It could be useful baked into our custom Apigee installation script as a post-install task.

Thanks,

Girish

1 ACCEPTED SOLUTION

rmishra
Participant V

Apigee bundles an Apigee Validation shell utility that can validate your Apigee infrastructure:

https://docs.apigee.com/private-cloud/v4.17.05/test-install

I am not sure whether it touches the analytics pod, but you can validate the health of the analytics pod by making some curl calls (the same management API calls you originally included).

Another common pattern (lower latency, closer to real time) is to use heartbeat or health URLs. These can be used to check the health of the components in the gateway pod.
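As a rough illustration of the curl-style checks mentioned above, here is a minimal Python sketch that queries the management API for each pod and lists the servers not reported as up. It assumes the API returns a JSON array whose entries carry `isUp`, `type`, and `internalIP` fields when called with `Accept: application/json`, and that basic auth with a sysadmin account is accepted. The function names, host, port, and credentials are placeholders, not part of any Apigee tooling.

```python
import base64
import json
import urllib.request

# Pod names taken from the management API URL quoted in the question.
PODS = ("gateway", "central", "analytics")


def fetch_servers(ms_host, ms_port, pod, user, password, timeout=5):
    """GET /v1/servers?pod=<pod> from the management server and decode the JSON."""
    url = f"http://{ms_host}:{ms_port}/v1/servers?pod={pod}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url,
        headers={"Accept": "application/json", "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def servers_not_up(servers):
    """Return (ip, types) for every entry whose assumed `isUp` flag is not true."""
    return [
        (s.get("internalIP", "?"), ",".join(s.get("type", [])))
        for s in servers
        if not s.get("isUp", False)
    ]


def check_planet(ms_host, ms_port, user, password):
    """Map each pod name to the servers the management server reports as not up."""
    return {
        pod: servers_not_up(fetch_servers(ms_host, ms_port, pod, user, password))
        for pod in PODS
    }
```

Calling `check_planet` with your own management server host, port, and credentials would return a dict keyed by pod; an empty list for a pod means the management server considers every registered server in that pod up. Verify the field names against the actual response of your Edge version before relying on this.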

5 REPLIES 5

Thanks @rmishra.

We are aware of the Apigee validate utility. I believe it creates an org called VALIDATE, a test env, and a couple of proxies. For this specific instance, I know there is still a connectivity issue being worked out, but in general it would help to have a utility that could perform this end-to-end validation and/or reachability test.


As we add more data centers and nodes, it will only get more complex. For example, if we add two more nodes, how do we know whether they are participating in the runtime? What I have in mind is a central utility or tool that I could query for all the participating nodes.

Regards,

Girish

It seems that what you are describing is largely the responsibility of a monitoring platform.

Ensure that each of your Apigee nodes is monitored at the process level, over HTTP and JMX (as applicable), and with proprietary checks (e.g. Cassandra nodetool). Configure your monitoring system to issue alerts when the system becomes unavailable.

If you look at the port requirements among Apigee components, you will realize that "reachability" is a complex validation. Should we write rules for every single access point across every pair of components? IMHO, it's going to be painful and not very beneficial.

Deep availability tests (across multiple protocols) should be good enough. You could write reachability tests (via heartbeat pings) for your gateway pod components so that you get immediate alerts on live traffic disruption.
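For a sense of what a simple reachability test could look like, here is a minimal Python sketch that probes a list of TCP endpoints from the machine it runs on. The hostnames are hypothetical, and the ports shown are only the upstream defaults for ZooKeeper, Cassandra (CQL), and PostgreSQL; check the port requirements for your own install. It tests reachability from one vantage point only, not between every pair of components.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_targets(targets, timeout=3.0):
    """Probe each (label, host, port) tuple and return the labels that failed."""
    return [
        label
        for label, host, port in targets
        if not port_open(host, port, timeout)
    ]


# Hypothetical inventory; replace hosts and ports with your own topology.
EXAMPLE_TARGETS = [
    ("zookeeper-1", "zk1.example.internal", 2181),
    ("cassandra-1", "cs1.example.internal", 9042),
    ("postgres-1", "pg1.example.internal", 5432),
]
```

Running `check_targets(EXAMPLE_TARGETS)` from a cron job or from a monitoring agent returns the labels that could not be reached, which is usually enough to raise an alert. A successful TCP connect only shows that the port accepts connections; it says nothing about whether the component behind it is healthy.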

Thanks again @rmishra

Yes, makes sense: there are a lot of moving parts and interactions, with both Apigee and non-Apigee components involved (C*, PG, nginx, etc.). Not as simple as I had hoped.

Sorry to be a bother, but don't all the components register themselves with ZooKeeper? Can we query the ZK paths and identify the components that are not yet registered? Apologies if my understanding is naive; I only recently got into on-prem work.

Regards,

Girish

No bother, I love these forums. They are a great place to understand others' approaches and perspectives and to validate my own.

Yes, all components are registered with ZooKeeper, and you can query it using the ZooKeeper CLI or APIs. But that only gives you the list of components registered with ZooKeeper (the topology of your planet).

As far as I know, it doesn't tell you whether a component is currently suffering an outage or is unavailable. Even if there is code inside Apigee to "deregister self with ZooKeeper while shutting down", there is no guarantee that the code will be executed in all scenarios (e.g. a kernel panic). An independent monitoring system is your best bet for these answers.
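To make the distinction between "registered" and "available" concrete, here is a small Python sketch that compares an expected inventory of node IPs against a registration listing. It assumes the listing is a list of dicts with `internalIP` and `isUp` keys, as a `/v1/servers` response is assumed to look; the function name and the result keys are made up for illustration.

```python
def classify_nodes(expected_ips, registered):
    """Compare an expected inventory against a registration listing.

    expected_ips: iterable of IP strings you believe should be in the planet.
    registered:   list of dicts with assumed `internalIP` and `isUp` keys.
    """
    up_by_ip = {}
    for server in registered:
        ip = server.get("internalIP")
        if ip is None:
            continue
        # A host may run several components; count it as up only if all are.
        up_by_ip[ip] = up_by_ip.get(ip, True) and bool(server.get("isUp", False))

    expected = set(expected_ips)
    known = set(up_by_ip)
    return {
        # Expected nodes that never registered (e.g. freshly added nodes).
        "unregistered": sorted(expected - known),
        # Registered, but the registry itself reports them as down.
        "registered_down": sorted(ip for ip in expected & known if not up_by_ip[ip]),
        # Registered and reported up; still needs independent monitoring.
        "registered_up": sorted(ip for ip in expected & known if up_by_ip[ip]),
        # Registered nodes that are missing from your inventory.
        "unexpected": sorted(known - expected),
    }
```

This answers "did my two new nodes join?" by looking at the `unregistered` list, but a node in `registered_up` is only as healthy as the registry's last view of it, which is why an independent monitoring check is still needed.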