Apigee proxy latency or overhead

ven
Bronze 1

We have a bunch of Apigee proxies running in a Hybrid environment. In our performance test we noticed that the proxy latency/overhead is around 300 milliseconds. The proxy latency climbed steadily as the performance test ramped up the users/hit rate, and then slowly came back down as the test ramped the load down. I am printing the time taken by each policy to see where the proxy overhead is, but all of these policies finish in under 10 ms combined (the sum of the time taken by all policies in the pre and post proxy flows). I am wondering how to find where the remaining 290 milliseconds are going. I am fairly sure this time is spent in the gateway, as the backend response time is not counted in this 300 ms.
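One thing I am planning to look at is Apigee's own analytics, which break total latency into target time and gateway request/response processing time. Below is a rough sketch of pulling that for the test window; the org, env, dimension name, metric names and time range are placeholders/assumptions that should be verified against the stats API docs for Hybrid.

```python
# Hedged sketch: pull Apigee analytics to split total latency into target time vs.
# gateway request/response processing time. Org, env, the "apiproxy" dimension and
# the timeRange format are assumptions -- check the stats API reference for your setup.
import subprocess
import requests

ORG, ENV = "my-org", "test"   # placeholders
token = subprocess.check_output(
    ["gcloud", "auth", "print-access-token"], text=True).strip()

url = f"https://apigee.googleapis.com/v1/organizations/{ORG}/environments/{ENV}/stats/apiproxy"
params = {
    "select": ("avg(total_response_time),avg(target_response_time),"
               "avg(request_processing_latency),avg(response_processing_latency)"),
    "timeRange": "04/01/2023 00:00~04/01/2023 06:00",  # window covering the load test
}
resp = requests.get(url, params=params, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
# Gateway overhead should roughly be total_response_time - target_response_time.
print(resp.json())
```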

7 REPLIES

It is very hard to tell based on the above information. It may be a slow backend, or check the message processor (high processing time, possibly from high CPU or memory usage). A thread dump or heap dump may help identify and tune it.
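For the CPU/memory angle, a quick way to eyeball the runtime pods during the test is something like the rough sketch below. It assumes the hybrid runtime lives in the "apigee" namespace and that the cluster has metrics-server installed; both are assumptions about your install.

```python
# Rough check of runtime (message processor) pod CPU/memory via the Kubernetes
# metrics API. Namespace "apigee" and metrics-server availability are assumptions.
from kubernetes import client, config

config.load_kube_config()
metrics = client.CustomObjectsApi().list_namespaced_custom_object(
    group="metrics.k8s.io", version="v1beta1", namespace="apigee", plural="pods")

for pod in metrics["items"]:
    name = pod["metadata"]["name"]
    for c in pod["containers"]:
        print(f"{name}/{c['name']}: cpu={c['usage']['cpu']} mem={c['usage']['memory']}")
```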

Also check each phase in Trace (ref - https://docs.apigee.com/api-platform/debug/using-trace-tool-0), or download the trace and inspect each phase offline. Sometimes there is an opportunity to cache or tune custom code. If you share more information we may be able to provide more input, but without more insight it will be a bit harder.
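If you want to capture trace data outside the UI while the load test runs, the debug session API can be scripted roughly as below. The org/env/proxy/revision values are placeholders, and the exact endpoint paths and response shapes should be verified against the current API reference; treat this only as a starting point.

```python
# Hedged sketch: start a debug (trace) session via the API, then download captured
# transactions so the per-phase execution points and timestamps can be inspected offline.
import subprocess
import requests

ORG, ENV, PROXY, REV = "my-org", "test", "my-proxy", "1"   # placeholders
token = subprocess.check_output(
    ["gcloud", "auth", "print-access-token"], text=True).strip()
headers = {"Authorization": f"Bearer {token}"}
base = (f"https://apigee.googleapis.com/v1/organizations/{ORG}/environments/{ENV}"
        f"/apis/{PROXY}/revisions/{REV}/debugsessions")

session = requests.post(base, headers=headers, json={"timeout": "300"}).json()
session_id = session["name"].split("/")[-1]   # assumed shape of the create response

# ... drive some load, then pull whatever transactions were captured ...
txns = requests.get(f"{base}/{session_id}/data", headers=headers).json()
for txn_id in txns:   # assumed to be a list of transaction IDs; verify against the docs
    detail = requests.get(f"{base}/{session_id}/data/{txn_id}", headers=headers).json()
    print(txn_id, detail)   # each transaction lists execution points with timestamps
```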

Good luck.

Thanks for the response. I think trace will be more appropriate if we get consistent errors or slow response times. One thing I should have mentioned earlier (my apologies, I somehow missed stating this point in my question; I will edit the question after this post, if possible) is that the proxy latencies climbed steadily as the performance test ramped up the load and then started to decline as the test ramped down. I am wondering if there is any way to see what happened through the logs, by picking the message IDs of the requests that took longer to finish. I will enable trace when the load test runs again, but I am wondering how to troubleshoot this when it happens in PROD. We cannot ask for trace on APIs when this happens; does Apigee provide any other way to get insight into what is happening in a request, based on message ID or any other identifier, without having to turn trace on?
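For the "find slow requests by message ID" idea, the sketch below assumes per-request timings are already being logged somewhere (for example via a MessageLogging policy) as JSON lines. The field names "messageId", "totalMs", "targetMs" and "policiesMs" are purely illustrative; they are not something Apigee emits out of the box, so the logging format would have to be set up first.

```python
# Hedged sketch: filter structured per-request logs for requests whose gateway-side
# time exceeds a threshold, and print their message IDs for follow-up.
import json
import sys

THRESHOLD_MS = 250  # arbitrary cut-off for "slow" gateway overhead

for line in sys.stdin:
    try:
        rec = json.loads(line)
    except ValueError:
        continue  # skip non-JSON log lines
    gateway_ms = rec["totalMs"] - rec.get("targetMs", 0)
    if gateway_ms > THRESHOLD_MS:
        print(rec["messageId"], gateway_ms, rec.get("policiesMs"))
```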

Have you also separately run a performance test directly against your backend services first? 

Performance will vary depending on the policies you're using within Apigee, backend performance, and payload sizes, and you may need to increase the number of runtime pods as well - or have the pods autoscale as required (assuming the runtime pods are the bottleneck).
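A quick way to see whether all the runtime pods were up and Ready while the test was running is sketched below. The "apigee" namespace and the "apigee-runtime" name prefix are assumptions about a typical hybrid install; adjust for yours.

```python
# Rough check of how many runtime pods exist and whether any are not Ready.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pods = v1.list_namespaced_pod("apigee").items
runtime = [p for p in pods if p.metadata.name.startswith("apigee-runtime")]
for p in runtime:
    ready = all(cs.ready for cs in (p.status.container_statuses or []))
    print(f"{p.metadata.name}: phase={p.status.phase} ready={ready}")
print(f"total runtime pods: {len(runtime)}")
```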

Yes, we have run the backend performance tests and they all look good. I agree that payload sizes and policies (we have quite a few policies triggered in the pre-proxy and post-proxy flows) can affect the proxy latency. I am looking for a way to find where the time is going, since summing the time taken by the policies does not match the overall proxy latency.

I meant other than enabling trace (it captures very few requests, and those requests may or may not run into this latency issue, since the latency grows steadily). As I mentioned in another reply, I will enable trace and see if it helps us identify where the issue is.

I would double-check the configuration and results of your backend performance test first to confirm the test executed as expected. Commonly used test tools sometimes do not run a test the way you intended.
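If it helps, a very rough sanity check directly against the backend (bypassing Apigee) can be scripted as below. The URL, concurrency and request count are placeholders, and this is no substitute for your real load test tool; it just confirms the backend's latency distribution under some concurrency.

```python
# Hedged sketch: hit the backend directly with a thread pool and report latency percentiles.
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL, WORKERS, REQUESTS = "https://backend.example.com/health", 20, 500  # placeholders

def one_call(_):
    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=10) as resp:
        resp.read()
    return (time.perf_counter() - start) * 1000.0  # milliseconds

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    latencies = sorted(pool.map(one_call, range(REQUESTS)))

q = statistics.quantiles(latencies, n=100)
print(f"p50={q[49]:.1f}ms p95={q[94]:.1f}ms p99={q[98]:.1f}ms max={latencies[-1]:.1f}ms")
```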

ven
Bronze 1

I want to provide an update: we are still investigating the issue. We have a few errors due to DB contention; we are working on that first and will then re-run the tests to see the actual performance. I will keep this thread updated once we have concrete findings.