How does Apigee Edge support SLOs, SLIs, and SLAs to measure the availability of an API program?

What reports and metrics are available out of the box in Apigee Edge to track SLOs, SLIs, and SLAs for an API program? How does the Apigee Edge platform provide visibility into them?

ACCEPTED SOLUTION

Let me start by defining what these terms mean.

Service-level agreement (SLA)

A service-level agreement is an agreement between two or more parties, where one is the customer and the others are service providers. It can be a legally binding formal contract or an informal one.

Most of the time, having just an SLA is not enough. For example, let's say there is an API at http://mocktarget.apigee.net/json that returns a mock JSON response. The SLA might be defined as the percentage of calls to this API that return an HTTP 200 response, for example 99.9% or 99.99%.

Let's say the system was available 99.5% of the time on average last week and the SLA is 99%. On a particular day, 1000 requests came in; 995 succeeded with a 200 and 5 failed with a 500 response code. Those 5 users didn't get the best experience, which resulted in a bunch of support tickets and escalations. Is the SLA good enough in this case? The answer might be no.

Similarly, APIs might be down because of errors in the Apigee API management layer or in the target. An SLA alone is not enough to analyze what went wrong during an outage. You need more than SLAs.
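To make the arithmetic in this example concrete, here is a minimal Python sketch. The function names `availability` and `meets_target` are made up for illustration; this is not an Apigee API, just the percentage calculation behind the numbers above.

```python
def availability(successes: int, total: int) -> float:
    """Return the fraction of requests that succeeded (0.0 to 1.0)."""
    if total <= 0:
        raise ValueError("total must be a positive number of requests")
    return successes / total


def meets_target(successes: int, total: int, target: float) -> bool:
    """Check whether the measured availability reaches the agreed target."""
    return availability(successes, total) >= target


# Numbers from the example: 995 of 1000 requests returned HTTP 200,
# and the SLA is 99%.
day = availability(995, 1000)
print(f"{day:.1%}")                    # 99.5%
print(meets_target(995, 1000, 0.99))   # True
```

The SLA is met, yet 5 callers still saw a 500, which is why the single number is not enough on its own.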

Service-level objectives (SLO)

A service-level objective (SLO) is a key element of a service-level agreement (SLA) between a service provider and a customer. SLOs are agreed on as a means of measuring the performance of the service provider, and they are written down to avoid disputes between the two parties caused by misunderstanding.

You need multiple SLOs, and the ability to tweak them as scenarios change, to measure SLAs and avoid conflicts between different functions. In the example above, the question of whether the Apigee layer or the target caused the 99.5% availability can be answered if we have defined SLOs on service-level indicators (SLIs) such as Apigee proxy errors and target errors: here, proxy errors are 0 and target errors are 5. The Apigee Edge proxy success rate SLO could be defined as 99.99%. Similarly, you should have SLOs for proxy response time, target response time, 4xx and 5xx error rates, latencies, and more, defined according to your requirements.
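As a rough sketch of how per-layer SLIs answer the "proxy or target?" question, the Python below splits the day's errors by source and checks each success rate against its own SLO. The `DayCounts` class, the `evaluate` function, and the 99.9% target SLO are assumptions made for illustration; only the 99.99% proxy SLO and the 0 and 5 error counts come from the example.

```python
from dataclasses import dataclass


@dataclass
class DayCounts:
    """Request counts for one day, with errors split by where they originated."""
    total: int
    proxy_errors: int
    target_errors: int


def success_rate(total: int, errors: int) -> float:
    """SLI: fraction of requests that did not fail in the given layer."""
    return (total - errors) / total


def evaluate(counts: DayCounts, proxy_slo: float, target_slo: float) -> dict:
    """Return (SLI value, SLO met?) for the proxy layer and the target."""
    proxy_sli = success_rate(counts.total, counts.proxy_errors)
    target_sli = success_rate(counts.total, counts.target_errors)
    return {
        "proxy": (proxy_sli, proxy_sli >= proxy_slo),
        "target": (target_sli, target_sli >= target_slo),
    }


# 1000 requests, 0 proxy errors, 5 target errors (from the example).
# proxy_slo is the 99.99% from the text; target_slo is a hypothetical value.
result = evaluate(DayCounts(1000, 0, 5), proxy_slo=0.9999, target_slo=0.999)
print(result["proxy"])   # (1.0, True)
print(result["target"])  # (0.995, False)
```

With the errors split this way, the proxy layer meets its objective and the shortfall is clearly on the target side.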

Apigee Edge Analytics gives you all of these measurements to better understand your system's availability.

For example, Apigee Edge Analytics provides out-of-the-box dashboards to measure SLOs such as:

  • Response Time - Median, 95th Percentile, 99th Percentile
  • Target Response Time
  • Proxy Response Time
  • Request Processing Latency
  • Response Processing Latency
  • Average Response Time
  • Proxy Errors
  • Target Errors
  • Errors by Error code
  • Cache Hit Rate
  • Cache Response Time, and many more
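To show why the list above includes median and percentile response times next to the average, here is a small Python sketch using the nearest-rank method. The `percentile` helper and the latency values are invented for illustration, and this is not necessarily how Apigee Edge Analytics computes its percentiles.

```python
import math
from statistics import mean


def percentile(samples: list, p: int):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, with two slow outliers.
latencies_ms = [120, 95, 110, 480, 101, 99, 130, 105, 98, 2000]

print(percentile(latencies_ms, 50))  # 105 (median)
print(percentile(latencies_ms, 95))  # 2000
print(mean(latencies_ms))            # 333.8
```

The average is pulled up by two slow calls while the median stays near 100 ms, so an SLO on the 95th or 99th percentile describes the slow tail more directly than an SLO on the average.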

You can also create your own SLIs and measure SLOs with custom reports, using the Statistics Collector policy in Apigee Edge to capture custom data for various scenarios.

For more background, see the article "Why have an SLO at all?". So, what are your SLOs for your API program, and which SLOs are you using in Apigee Edge Analytics?


GOOOOD answer, Anil!

Thank you @Dino !!