Is there a common tool for a basic connectivity or sanity check that validates whether all components can talk to each other? Something like a one-click or one-command environment validator or health check.
Or even a shell script, of the below sort:
apigee-all healthcheck (on the management server, but checks across the entire planet)
We are already aware of the management APIs like the below:
http://{{ms_host}}:{{ms_port}}/v1/servers?pod=[analytics|gateway|central]
If not, any tips on which tools or commands could be used to build a shell script that performs this check? It could be useful baked into our custom Apigee installation script as a post-install task.
Thanks,
Girish
Answer by Rahul M · Apr 25, 2018 at 02:18 PM
Apigee bundles an Apigee Validation Shell utility that can validate your Apigee infrastructure:
https://docs.apigee.com/private-cloud/v4.17.05/test-install
I am not sure whether it touches the analytics pod, but you can validate the health of the analytics pod with a few curl calls (the same management API you originally mentioned).
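As a rough illustration of scripting those calls, here is a minimal Python sketch that queries the `/v1/servers?pod=...` endpoint from the question for each pod and lists servers that do not report as up. The `isUp` and `uUID` field names, the use of HTTP basic auth, and the helper names are assumptions for illustration; check them against the JSON your management server actually returns.

```python
import base64
import json
import urllib.request

# Pod names taken from the URL pattern in the question.
PODS = ("central", "gateway", "analytics")


def servers_url(ms_host, ms_port, pod):
    """Build the management API URL for the servers of one pod."""
    return f"http://{ms_host}:{ms_port}/v1/servers?pod={pod}"


def down_servers(servers):
    """Return entries whose 'isUp' flag is missing or false.

    Assumption: each entry is a JSON object with a boolean 'isUp' field.
    """
    return [s for s in servers if not s.get("isUp", False)]


def fetch_servers(ms_host, ms_port, pod, user, password):
    """GET the server list for one pod (assumes HTTP basic auth)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        servers_url(ms_host, ms_port, pod),
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def report(ms_host, ms_port, user, password):
    """Print one line per pod and return True only if nothing is down."""
    healthy = True
    for pod in PODS:
        down = down_servers(fetch_servers(ms_host, ms_port, pod, user, password))
        ids = [s.get("uUID", "?") for s in down]  # 'uUID' is an assumed field
        print(f"{pod}: {'OK' if not down else 'DOWN ' + ', '.join(ids)}")
        healthy = healthy and not down
    return healthy
```

Calling `report("ms-host", 8080, "admin@example.com", "secret")` (hypothetical values) from a post-install step would print one status line per pod.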
Another common pattern (lower latency, closer to real time) is to use heartbeat/health URLs. These can be used to check the health of the components in the gateway pod.
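A heartbeat poll of this kind could look roughly like the sketch below. It assumes each component exposes a `/v1/servers/self/up` endpoint that answers HTTP 200 with a body of `true`, and that the ports in `ASSUMED_PORTS` match your installation; both are assumptions to verify, not confirmed details from this thread.

```python
import urllib.error
import urllib.request

# Assumed self-check ports per component type; adjust to your installation.
ASSUMED_PORTS = {"management-server": 8080, "router": 8081, "message-processor": 8082}


def self_up_url(host, port):
    """Build the assumed self-check URL for one component."""
    return f"http://{host}:{port}/v1/servers/self/up"


def body_means_up(status, body):
    """Interpret a response: HTTP 200 with a body of 'true' counts as up."""
    return status == 200 and body.strip().lower() == b"true"


def is_up(host, port, timeout=5):
    """Return True if the component answers its self-check as up."""
    try:
        with urllib.request.urlopen(self_up_url(host, port), timeout=timeout) as resp:
            return body_means_up(resp.status, resp.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or HTTP error status.
        return False


def check_nodes(nodes):
    """nodes: iterable of (host, component_type); returns the ones not up."""
    return [(h, c) for h, c in nodes if not is_up(h, ASSUMED_PORTS[c])]
```

For example, `check_nodes([("rt1", "router"), ("mp1", "message-processor")])` (hypothetical host names) returns the nodes that failed the check, which a monitoring job could turn into alerts.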
Thanks @rmishra.
We are aware of the Apigee validate utility. I believe it creates an org called VALIDATE, a test env, and a couple of proxies. For this specific instance, I know there is still a connectivity issue being worked out, but in general it would help if there were a utility that could perform this end-to-end validation and/or reachability test.
As we add more data centers and nodes, it will only get more complex. For example, if we add two more nodes, how do we know whether they are participating in the runtime? We would like a central utility or tool that we could query for all the participating nodes.
Regards,
Girish
It seems that what you are describing is largely the responsibility of a monitoring platform.
Ensure that each of your Apigee nodes is monitored at the process level, over HTTP and JMX (as applicable), and with proprietary checks (e.g. Cassandra nodetool). Configure your monitoring system to issue alerts when a component becomes unavailable.
If you look at the port requirements among Apigee components, you will realize that "reachability" is a complex validation. Should we write rules for every single access point across every pair of components? IMHO, it's going to be painful and not very beneficial.
Deep availability tests (across multiple protocols) should be good enough. You could write reachability tests (via heartbeat pings) for your gateway pod components to ensure that you get immediate alerts on live traffic disruption.
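For anyone who still wants a basic port-level reachability check, a minimal sketch is a plain TCP connect test like the one below. It only verifies reachability from the machine running the script, not between every pair of components, which is exactly why the full matrix gets painful. The host names in the usage note are hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable(targets, probe=can_connect):
    """Return the (host, port) pairs from targets that the probe cannot reach."""
    return [(host, port) for host, port in targets if not probe(host, port)]
```

A call such as `unreachable([("cass1", 9042), ("zk1", 2181)])` would list the Cassandra and ZooKeeper endpoints (on their usual default ports) that this machine cannot reach. To approximate a pairwise check, the same script would have to run on each node with that node's own target list.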
Thanks again @rmishra
Yes, makes sense, a lot of moving parts and interactions with both Apigee and non-Apigee components involved (C*, PG, nginx, etc). Not as simple as I had hoped.
Though, sorry to be a bother, but don't all the components register themselves with ZooKeeper? Could we query the ZK paths and identify the components that have not yet registered? Apologies if my understanding of this is naive; I only recently got into on-prem work.
Regards,
Girish