Large chunk of time being spent on "Proxy Request Flow Started"

rymonroe · 03-28-2019 04:41 PM

We are seeing a large portion of time (over 1000ms) being spent on "Proxy Request Flow Started" in trace sessions. What is Apigee doing during this time and what can be done to shorten this latency? Thanks.

dchiesa1

Apigee is receiving the request.

It's possible that the request is taking a really long time because the client is slow. For example, if the client opens the connection, asserts a Content-Length, sends 40% of the content, and then stalls, Apigee Edge will experience a delay while it waits for the rest of the payload.

Another possibility is that the Apigee Edge system is overburdened and there is contention for CPU, memory, or I/O on the message processor (MP). This can happen with the "free trial" Apigee Edge instances, but it generally does not happen with commercially licensed instances. If you have a commercially licensed Apigee Edge, you should contact Apigee Support regarding this delay.

A final possibility, really an edge case of the above, is that the MP is overloaded because it is waiting on the backend system. If you have targets that exhibit 50s latency, or target systems that send 10 MB payloads, then high load will make the MP slower to even accept new inbound connections. A slow backend can result in slowness on the frontend.

If you'd like to diagnose or troubleshoot this matter yourself, I suggest testing various scenarios.

Does it happen with every API call? Only with POST?
Does it happen only under load, in other words when you have lots of other calls going through your system?
Does it happen only with large payloads?
etc.

You may be able to narrow things down when you test these various scenarios.
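
As a rough sketch of how such scenario tests could be scripted (this is not an Apigee tool, just plain Python standard library), the snippet below times repeated calls per scenario and reports how many cross a slowness threshold. The function names, the 1-second threshold, and the scenario labels are all assumptions made for illustration.

```python
import statistics
import time
import urllib.error
import urllib.request


def time_request(url, method="GET", body=None, timeout=30):
    """Return wall-clock seconds for one request; error statuses still count."""
    req = urllib.request.Request(url, data=body, method=method)
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
    except urllib.error.HTTPError as err:
        err.read()  # a 4xx/5xx response is still a completed call
    return time.perf_counter() - start


def summarize(samples, slow_threshold=1.0):
    """Summarize latencies (seconds); slow_threshold is an assumed cutoff."""
    if not samples:
        raise ValueError("no samples to summarize")
    ordered = sorted(samples)
    slow = sum(1 for s in ordered if s >= slow_threshold)
    return {
        "count": len(ordered),
        "median": statistics.median(ordered),
        "max": ordered[-1],
        "slow_fraction": slow / len(ordered),
    }


def compare_scenarios(scenarios, runs=50, timer=time_request):
    """scenarios maps a label to the keyword arguments passed to timer."""
    return {
        label: summarize([timer(**kwargs) for _ in range(runs)])
        for label, kwargs in scenarios.items()
    }
```

For example, `compare_scenarios({"GET": {"url": url}, "POST": {"url": url, "method": "POST", "body": b"{}"}})` with `url` pointing at your own proxy would show whether the slow fraction differs by verb. The same pattern works for small versus large payloads, or for an idle versus a loaded system.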

rymonroe

Thanks for the detailed response, here are some of my thoughts.

I don't think this is a client issue. Before this request makes it to Apigee it passes through a firewall that terminates TLS and inspects packets. This kind of delay should be seen there and not in Apigee.

I don't think this is an overburdened router/message processor or contention for CPU/memory. If that were the case I would expect to see some random spikes on other proxies, which I don't see.

I'm not sure if it could be related to size of payloads or only POST/GET etc. I've only captured a few of these and will need to do some more analysis.

rymonroe

After doing some tests I wasn't able to associate these calls with a particular verb (POST/GET/etc.) or correlate them with high load or large payloads. I thought this could be a result of putting the proxy in trace mode, so I performed the following test.

Using three completely separate, large private 4.18.05 clusters, I created identical proxies on all three with the same basepath. A load balancer sat in front of the Apigee Routers, round-robining requests across all of them. I created a very simple call that would return a 404 within 10ms, put the proxy on one of the three clusters in Trace, and hit the load balancer constantly. There was a slow call about 5% of the time, and always from the proxy in trace mode.

I did similar tests with different proxies, all with the same result. Interestingly, when I tried this in a very small cluster (2-3 Message Processors), I did not see any slowness.
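
For anyone wanting to repeat this kind of comparison, here is a minimal sketch of the tallying step only. It assumes each timed response can already be attributed to a cluster (for instance via a response header that each proxy sets, which is a hypothetical setup and not something described above), and it treats anything at or over 1 second as slow. The function name and cluster labels are made up for illustration.

```python
from collections import defaultdict


def slow_rate_by_cluster(records, slow_threshold=1.0):
    """Return the fraction of slow calls per cluster.

    records is an iterable of (cluster_label, seconds) pairs; how the
    cluster label is obtained is left to the test setup (assumption).
    """
    totals = defaultdict(int)
    slow = defaultdict(int)
    for cluster, seconds in records:
        totals[cluster] += 1
        if seconds >= slow_threshold:
            slow[cluster] += 1
    return {cluster: slow[cluster] / totals[cluster] for cluster in totals}


# Illustrative data only: cluster "a" stands in for the one in trace mode.
sample = [("a", 0.01), ("a", 1.2), ("b", 0.02), ("c", 0.01)]
rates = slow_rate_by_cluster(sample)
```

If slow calls really only come from the proxy in trace mode, its rate should stand out while the other clusters stay at or near zero.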
