Monitoring your API's health is key to maintaining a trusted, reliable, and robust API program, and to quickly identifying and resolving issues. You can monitor both the proxy and underlying target endpoints.
When designing your API, consider how to monitor in a lightweight and maintainable fashion. Also, think about what role the API may take in monitoring underlying target health.
This article outlines the approach that the Customer Success team at Apigee takes when helping customers form a monitoring strategy.
Ask yourself the following questions when you start to think about API monitoring:
The main objectives of a monitoring strategy are:
A general best practice consists of:
The following are some examples of resources we commonly use to fulfill the above requirements. The patterns described in this article are:
The /ping resource is a specialised sub-resource exposed by the proxy to test proxy network connectivity and proxy deployment status. In this scenario, the proxy does not call any target APIs.
Although it could be implemented as a first-class resource, we recommend implementing it as a sub-resource so that each API proxy bundle is instrumented with its own independent monitoring capability.
Here is an example implementation:
Example Request
GET /customer/v1/ping
Accept: application/json
Example Response
HTTP/1.1 200 OK
Content-Type: application/json

{
  "environment": "prod",
  "clientIp": "100.10.1.0",
  "api": "customer-v1",
  "verb": "GET",
  "responseTime": 20,
  "message": "pong"
}
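To make the /ping contract concrete, here is a minimal sketch in Python of the logic that assembles the example body above. It is only an illustration of the response shape, not how an Apigee proxy is built (a real proxy would populate these fields with its own policies or scripts). The function name and the use of monotonic timestamps are assumptions; the field names come from the example response.

```python
import time
from typing import Any, Dict


def build_ping_response(environment: str, client_ip: str, api: str,
                        verb: str, started_at: float,
                        finished_at: float) -> Dict[str, Any]:
    """Build a /ping body shaped like the example response.

    started_at and finished_at are time.monotonic() readings taken when the
    proxy received the request and just before it responds. No target API is
    called: a successful reply only shows that the proxy is deployed and
    reachable.
    """
    return {
        "environment": environment,
        "clientIp": client_ip,
        "api": api,
        "verb": verb,
        # Time spent inside the proxy, in whole milliseconds.
        "responseTime": round((finished_at - started_at) * 1000),
        "message": "pong",
    }


if __name__ == "__main__":
    start = time.monotonic()
    body = build_ping_response("prod", "100.10.1.0", "customer-v1", "GET",
                               start, time.monotonic())
    print(body["message"])
```

Because the handler never touches a target, a failing /ping points at the proxy or network layer rather than at a backend.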
The /status resource is a specialised resource that tests proxy-to-target network connectivity and assesses target API health. It is exposed by both the proxy and the target APIs, each through its own /status endpoint.
Status endpoints for target APIs and components need to perform all the internal testing necessary to report the health of that component.
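On the target side, a status endpoint typically runs a handful of cheap internal tests and summarises them. The Python sketch below assumes each internal test ("probe") is a zero-argument callable that raises an exception on failure; the probe names and the {"status", "response"} body shape are assumptions chosen to line up with the per-component fields the proxy reports.

```python
from typing import Callable, List, Tuple

# A probe is a (name, callable) pair; the callable raises on failure.
Probe = Tuple[str, Callable[[], None]]


def run_component_status(probes: List[Probe]) -> Tuple[int, dict]:
    """Run every internal probe for one target component.

    Returns (http_status, body): 200 with status "ok" when all probes pass,
    otherwise 500 with status "failure" and the failing probe messages.
    """
    failures = []
    for name, probe in probes:
        try:
            probe()
        except Exception as exc:  # any error means this probe failed
            failures.append(f"{name}: {exc}")
    if failures:
        return 500, {"status": "failure", "response": "; ".join(failures)}
    return 200, {"status": "ok", "response": ""}
```

Every probe runs even after one fails, so a single call reports all broken dependencies rather than only the first.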
Example request
GET /customer/v1/status
Accept: application/json
Example success response
HTTP/1.1 200 OK
Content-Type: application/json

[
  {
    "name": "customer-v1",
    "component": "crm",
    "targetResponseTime": 350,
    "status": "ok",
    "response": ""
  },
  {
    "name": "customer-v1",
    "component": "loyalty",
    "targetResponseTime": 500,
    "status": "ok",
    "response": ""
  }
]
Example failure response
HTTP/1.1 500 Internal Server Error
Content-Type: application/json

[
  {
    "name": "customer-v1",
    "component": "crm",
    "targetResponseTime": 600,
    "status": "failure",
    "response": "unable to connect to customer database"
  },
  {
    "name": "customer-v1",
    "component": "loyalty",
    "targetResponseTime": 500,
    "status": "ok",
    "response": ""
  }
]
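The proxy side of /status aggregates one entry per target and derives the HTTP status from the results, as in the two example responses. Below is a minimal Python sketch of that aggregation. It assumes each target check is a hypothetical callable returning (ok, message); in practice it would call the target's /status endpoint with a short timeout. The injectable clock parameter exists only to make the timing testable.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical: calls one target's /status endpoint, returns (ok, message).
TargetCheck = Callable[[], Tuple[bool, str]]


def aggregate_status(api_name: str,
                     checks: Dict[str, TargetCheck],
                     clock: Callable[[], float] = time.monotonic,
                     ) -> Tuple[int, List[dict]]:
    """Build the proxy /status array and pick 200 or 500."""
    results = []
    for component, check in checks.items():
        started = clock()
        try:
            ok, message = check()
        except Exception as exc:  # timeouts and connection errors are failures
            ok, message = False, str(exc)
        elapsed_ms = round((clock() - started) * 1000)
        results.append({
            "name": api_name,
            "component": component,
            "targetResponseTime": elapsed_ms,
            "status": "ok" if ok else "failure",
            "response": "" if ok else message,
        })
    # 200 only when every target is healthy; otherwise 500, as in the failure example.
    status_code = 200 if all(r["status"] == "ok" for r in results) else 500
    return status_code, results
```

The checks run one after another for clarity; a production proxy might call the targets in parallel so that the total /status latency stays close to that of the slowest target.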
While implementing this resource, you'll learn the quickest and cheapest way to check each target system's health. For example, MongoDB provides the db.serverStatus() command, which returns quickly and does not impact MongoDB performance, so the proxy /status endpoint can execute db.serverStatus() against MongoDB to report its status.
Another approach is to send real requests to the existing API resources to check the health of the system. Because these tests run in a production environment, be careful when choosing which resources to use. Ideally, the data used by these requests is isolated from all other system data. For example, in a hotel API, a dummy hotel can be created in the system so that monitoring can make reservations and cancellations without affecting real hotel availability.
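The db.serverStatus() check mentioned above can be sketched in Python as follows. db.serverStatus() is the mongo shell helper; from a driver, the equivalent is running the serverStatus database command, whose reply includes an "ok" field set to 1 on success. To keep the sketch self-contained, it accepts any object with a command(name) method (a pymongo Database has one); the function name and failure messages are assumptions.

```python
from typing import Any, Protocol, Tuple


class SupportsCommand(Protocol):
    """Anything with a command(name) method, such as a pymongo Database."""

    def command(self, name: str) -> Any: ...


def check_mongo(db: SupportsCommand) -> Tuple[bool, str]:
    """Report MongoDB health using the serverStatus command.

    Returns (True, "") when the reply has ok == 1; any exception, such as a
    server selection timeout, is reported as a failure with its message.
    """
    try:
        reply = db.command("serverStatus")
    except Exception as exc:
        return False, f"unable to run serverStatus: {exc}"
    if reply.get("ok") == 1:
        return True, ""
    return False, f"serverStatus returned ok={reply.get('ok')}"
```

With pymongo installed, a call might look like check_mongo(MongoClient(uri, serverSelectionTimeoutMS=2000).admin), where the short server selection timeout keeps an unreachable database from stalling the status endpoint.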
If you use real requests for monitoring and your APIs are protected by API keys or OAuth, create a separate application for monitoring so that its requests can be identified in analytics.
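The dummy-hotel idea can be sketched as a synthetic transaction that makes a reservation and immediately cancels it. Everything named below is hypothetical: the HotelApi client interface, its method names, and the dummy hotel ID. The client is assumed to be configured with the credentials of the separate monitoring application, so its traffic is attributable in analytics.

```python
from typing import Protocol


class HotelApi(Protocol):
    """Hypothetical hotel API client; method names are illustrative only."""

    def create_reservation(self, hotel_id: str, nights: int) -> str: ...

    def cancel_reservation(self, reservation_id: str) -> None: ...


# Hypothetical ID of a dummy hotel that exists only for monitoring.
MONITORING_HOTEL_ID = "monitoring-dummy-hotel"


def run_reservation_check(api: HotelApi) -> bool:
    """Exercise a real create/cancel flow against the dummy hotel only."""
    try:
        reservation_id = api.create_reservation(MONITORING_HOTEL_ID, nights=1)
    except Exception:
        return False  # the create step itself failed
    try:
        api.cancel_reservation(reservation_id)
    except Exception:
        return False  # created but not cancelled: needs attention
    return True
```

Treating a failed cancellation as a failed check matters here: a leftover reservation on the dummy hotel is harmless to real availability, but it signals that part of the flow is broken.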
Whichever monitoring approach you take, monitoring requests will still appear in analytics reports, so consider adding a marker to them (for example, a dedicated request header) so that they can easily be filtered out of reporting.
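One way to make monitoring traffic easy to filter is to tag every monitoring request with a marker header and drop tagged records when building reports. The Python sketch below assumes a hypothetical header name (X-Monitoring-Probe) and assumes analytics records are available as flat dictionaries of captured request headers; adapt both to whatever your analytics pipeline actually records.

```python
from typing import Dict, Iterable, List

# Hypothetical marker header; use any name your analytics can capture.
MONITORING_HEADER = "X-Monitoring-Probe"


def tag_monitoring_request(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the request headers with the monitoring marker added."""
    tagged = dict(headers)
    tagged[MONITORING_HEADER] = "true"
    return tagged


def is_monitoring_record(record: Dict[str, str]) -> bool:
    """True if the record carries the marker (header names compared case-insensitively)."""
    return any(k.lower() == MONITORING_HEADER.lower() and v == "true"
               for k, v in record.items())


def exclude_monitoring(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop analytics records produced by monitoring requests."""
    return [r for r in records if not is_monitoring_record(r)]
```

Header names are compared case-insensitively because HTTP header names are case-insensitive and different layers may normalise them differently.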
There are a number of tools out there to help you monitor your API. Here are some of the tools we have used:
Think about what you're trying to monitor and why. Think about the cost of monitoring. Don't forget about the security of the resources you are exposing.
Are you looking to monitor both proxy and target health? Differentiating between proxy health vs target health can be key when diagnosing issues production.
should be
Are you looking to monitor both proxy and target health? Differentiating between proxy health vs target health can be key when diagnosing issues in production.
to define various request/response patterns that touch as many components as possible to test the health of the overall system running in a production environment
should be
to define various request/response patterns that touch as many components as possible to test the health of the overall system
... reason wouldn't limit monitoring strategies based on environment. It could be just as important to monitor alpha, beta, and dev integration environments.
designing various specialised cheap-to-execute requests that monitor the health of target components and connectivity between the proxy and the target services
should be
designing various specialised cheap-to-execute requests that monitor the health of target components and connectivity between the proxy and the target endpoint
Client request hit a proxy /status endpoint
should be
Client request hits a proxy /status endpoint
If all targets respond with success, Apigee responds with 200 OK with an array of JSON objects containing health and timing information for each target system.
should be
If all targets respond with success, Apigee responds with 200 OK with an array of objects containing health and timing information for each target system.
@Dom Couldwell, ping @docs here when you're finished with Steve's comments. Thanks!
Thanks for posting, Dom. My org is going through this process now. We are looking at these tools and at integrating them with legacy tools we have in place, such as ServiceNow and Zendesk.
@Ben Rodriguez - I'd also strongly recommend looking at Stackdriver for health/uptime checks: https://cloud.google.com/monitoring/alerts/uptime-checks