Uptime vs. Downtime % Statistics from the Management API

sdonnelly

I am pulling together a report to display a number of different metrics: average response time per API proxy, error count for a list of proxies, and uptime vs. downtime across all proxies.

I have been able to call the stats Management API for response times and error counts, but not for uptime vs. downtime of all proxies.
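For reference, the kind of /stats query that returns response times and error counts can be sketched as below. This assumes the Apigee Edge (classic) management API layout (`/v1/organizations/{org}/environments/{env}/stats/{dimension}` with `select`, `timeRange` and `timeUnit` query parameters); the org and environment names are placeholders, and the metric names `avg(total_response_time)` and `sum(is_error)` should be checked against the analytics metrics reference for your deployment. The request itself would be sent with whatever management credentials you normally use. Note that nothing in such a query returns uptime.

```python
from urllib.parse import urlencode

# Assumption: Apigee Edge (classic) management API host. Private Cloud and
# Apigee X/hybrid use different hosts and paths.
BASE_URL = "https://api.enterprise.apigee.com/v1"


def build_stats_url(org, env, select, time_range, time_unit="hour"):
    """Build a /stats query URL grouped by the apiproxy dimension.

    select:     comma-separated metric functions (assumed metric names)
    time_range: "MM/DD/YYYY HH:MM~MM/DD/YYYY HH:MM"
    """
    path = f"{BASE_URL}/organizations/{org}/environments/{env}/stats/apiproxy"
    query = urlencode({"select": select, "timeRange": time_range, "timeUnit": time_unit})
    return f"{path}?{query}"


url = build_stats_url(
    org="my-org",  # hypothetical organization name
    env="prod",    # hypothetical environment name
    select="avg(total_response_time),sum(is_error)",
    time_range="06/01/2024 00:00~06/02/2024 00:00",
)
print(url)
```

`urlencode` percent-encodes the parentheses, commas, slashes and spaces, so the same function can be reused for other metric combinations without hand-escaping.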

I do have an API Monitoring alert set up to report when five or more 5xx errors occur within a 5-minute period.
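The alert condition "five or more 5xx errors within 5 minutes" is a sliding-window count. The sketch below re-implements that rule locally for illustration only; it is not how Apigee evaluates its alerts internally, and the class name `FiveXXAlert` is hypothetical.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60  # 5-minute window
THRESHOLD = 5            # fire at 5 or more 5xx responses


class FiveXXAlert:
    """Hypothetical local version of a '5+ 5xx errors in 5 minutes' rule."""

    def __init__(self, window=WINDOW_SECONDS, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.error_times = deque()  # timestamps (seconds) of recent 5xx responses

    def record(self, timestamp, status_code):
        """Record one response; return True if the alert condition is met."""
        if 500 <= status_code <= 599:
            self.error_times.append(timestamp)
        # Drop errors that fall outside the window ending at this response.
        while self.error_times and timestamp - self.error_times[0] >= self.window:
            self.error_times.popleft()
        return len(self.error_times) >= self.threshold


alert = FiveXXAlert()
fired = [alert.record(t, 503) for t in (0, 60, 120, 180, 240)]
print(fired)  # [False, False, False, False, True]
```

An alert like this tells you when errors cluster, but it does not by itself give an uptime percentage.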

Does anyone else gather these statistics from the Management API, or does anyone have any suggestions?


Hmm, I don't believe "uptime" is something you can quantify directly.

To determine "uptime", you'd need to invoke the API proxy, also invoke the upstream (target) directly, and then compare the results.

The applicable truth table:

Proxy response   Upstream response   Conclusion
success          success             both up
fail             success             proxy down
success          fail                misconfigured
fail             fail                network access is down
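The truth table maps directly onto a lookup keyed by the two probe outcomes. This is a minimal sketch; how you decide whether a response counts as "success" (status code, body check, latency limit) is left open and is an assumption of whatever probe feeds it.

```python
# (proxy responded OK, upstream responded OK) -> conclusion, per the truth table.
TRUTH_TABLE = {
    (True, True): "both up",
    (False, True): "proxy down",
    (True, False): "misconfigured",
    (False, False): "network access is down",
}


def classify(proxy_ok: bool, upstream_ok: bool) -> str:
    """Return the conclusion for one paired proxy/upstream probe."""
    return TRUTH_TABLE[(proxy_ok, upstream_ok)]


print(classify(False, True))  # proxy down
```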

You could implement this with an external healthcheck service. However, this kind of information is not available from the Management API (/stats).
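A minimal external healthcheck might look like the sketch below, run from outside Apigee on some scheduler. The URLs, the function names, and the rule that any non-5xx answer counts as "up" are all assumptions for illustration. Uptime here is time-sampled: the share of probe rounds in which the proxy answered.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return True if the URL answers in time with a non-5xx status (assumed 'up' rule)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return err.code < 500
    except OSError:  # DNS failure, refused connection, timeout, etc.
        return False


def run_healthcheck(proxy_url, upstream_url, rounds, interval_s=60.0,
                    fetch=probe, sleep=time.sleep):
    """Probe proxy and upstream `rounds` times; return (proxy_ok, upstream_ok) pairs."""
    results = []
    for i in range(rounds):
        results.append((fetch(proxy_url), fetch(upstream_url)))
        if i < rounds - 1:
            sleep(interval_s)
    return results


def proxy_uptime_percent(results):
    """Share of probe rounds in which the proxy answered (time-sampled uptime)."""
    if not results:
        raise ValueError("no probe results")
    return 100.0 * sum(proxy_ok for proxy_ok, _ in results) / len(results)


# Hypothetical usage:
# results = run_healthcheck("https://my-org-prod.example.com/v1/health",
#                           "https://backend.example.com/health", rounds=60)
# print(proxy_uptime_percent(results))
```

`fetch` and `sleep` are injectable so the loop can be exercised without network access; each pair in `results` can also be passed through the truth table to say *why* a round failed.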

Apigee itself tracks this information and, for commercial plans, periodically (monthly, I think) delivers an availability metric to each customer. I believe you can also see it yourself on the support portal.

In general, the Apigee service is always "up", though in rare cases proxies don't respond correctly. We guarantee three nines or four nines (99.9% or 99.99%) of availability. It's measured not by "time available" but per request. We have observability that lets us answer, for each request, "Has the request been properly served?" We then aggregate that over your thousands, millions, or billions of requests and calculate what percentage of requests have not been served correctly; availability is the complement of that percentage. That is the availability metric as defined by Apigee in the service agreement.
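The request-based definition is simple arithmetic. Three nines means at most 1 request in 1,000 may fail; four nines, 1 in 10,000. The helper names below are hypothetical, and this is only an illustration of the ratio, not Apigee's actual SLA calculation, which is defined in the service agreement. Integer math is used for the failure budget to avoid floating-point rounding (for example, `1_000_000 * (100 - 99.9) / 100` evaluates slightly below 1000).

```python
def request_availability(total_requests, failed_requests):
    """Percent of requests served correctly (request-based availability)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100.0 * (total_requests - failed_requests) / total_requests


def allowed_failures(total_requests, nines):
    """Max failed requests that still meet a target of `nines` nines (3 -> 99.9%)."""
    return total_requests // 10 ** nines


def meets_target(total_requests, failed_requests, nines):
    """True if the failure count stays within the budget for `nines` nines."""
    return failed_requests <= allowed_failures(total_requests, nines)


print(allowed_failures(1_000_000, 3))  # 1000
print(allowed_failures(1_000_000, 4))  # 100
```

Under this definition an hour-long outage during a quiet period and the same outage at peak traffic count very differently, which is why it does not map neatly onto "uptime".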

In the end, I think your efforts to measure uptime via the /stats API may not be fruitful. But an external healthcheck might be interesting to you.