Solved: Management API timeout setting 504 Gateway Timeout

chungss · 04-01-2020 12:19 AM

We have a monthly recurring Linux shell script that calls a Management API to retrieve analytics data. Lately the query has been returning a 504 Gateway Timeout error after 5 minutes.

Is there a timeout setting that we can configure for the Management API?

We are on private/on-premise Edge 4.19.06.00.

Side note: We did not have this issue when we were on 4.18.05 previously.

deniska

Interesting question.

The only thing I can see in CwC (code with config) is this property, which holds 300000 ms (5 minutes).

/opt/apigee/apigee-service/bin/apigee-service edge-management-server configure -search conf_analytics_dp.ingester.newTimeSeriesInterval

Found key conf_analytics_dp.ingester.newTimeSeriesInterval, with value, 300000, in /opt/apigee/edge-management-server/token/default.properties

The Management Server has more analytics keys with the same value:

Line 1151: conf_pg-agent_aggregation.interval=300000 
Line 1160: conf_pg-agent_custom.aggregation.interval=300000 
Line 1164: conf_pg-agent_custom.aggregation.initializer.interval=300000 
Line 1201: conf_pg-agent_vdim.useragent.aggregation.interval.millis=300000 
Line 1206: conf_pg-agent_vdim.geo.aggregation.interval.millis=300000 
Line 1212: conf_pg-agent_vdim.timeofday.aggregation.interval.millis=300000 
Line 1215: conf_pg-agent_percentile.aggregation.interval.millis=300000 
Line 1223: conf_pg-agent_target.aggregation.interval.millis=300000 
Line 1225: conf_pg-agent_cache.aggregation.interval.millis=300000

You could try adjusting these with the help of CwC: https://docs.apigee.com/private-cloud/v4.19.06/how-configure-edge

Before doing that, please consult Apigee support, and update us once you have found the right configuration.
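For anyone who wants to look up several of the keys listed above in one pass, a small read-only wrapper around the same configure -search command could look like the sketch below. The function name search_ms_keys and the APIGEE_SERVICE override variable are made up for illustration; the default path is the one from the command quoted earlier in this post.

```shell
# Look up several Management Server properties in one pass (read-only).
# APIGEE_SERVICE is a hypothetical override; the default path comes from
# the command quoted above.
search_ms_keys() {
  local svc="${APIGEE_SERVICE:-/opt/apigee/apigee-service/bin/apigee-service}"
  local key
  for key in "$@"; do
    "$svc" edge-management-server configure -search "$key"
  done
}

# Example (run on the Management Server node):
# search_ms_keys conf_analytics_dp.ingester.newTimeSeriesInterval \
#                conf_pg-agent_aggregation.interval
```

This only reads the current values. Changing any of them is a separate step and is best done after checking with Apigee support.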

@ylesyuk

chungss

Thank you for the feedback. I will try to reach out to the support team.

ylesyuk

It could be a problem with DNS resolution.

chungss

Hi @ylesyuk, can you elaborate more on this?

ylesyuk

My analysis/suggestion assumes that you did your due diligence, looked through the MS (Management Server) log, and found nothing there related to those 5xx errors.

I also assume that your MS endpoint works from elsewhere and fails only on the node(s) where your job/script runs. Otherwise, you would have said that MS does not work at all.

If that is correct, then the problem points to infrastructure changes. A typical DNS timeout is around 2-10 seconds, but I have had a couple of situations in the past where this kind of behaviour was traced to DNS somewhere along the invocation chain and FQDN resolution (routing).

There is an easy way to check this: from a box that calls MS, run curl without and with --resolve to see if you can reproduce the problem. Again, DNS propagation might happen at random times, so it might work now. You need to correlate the script's failure times with your networking people.

Again, this is just a hypothesis, but one that is worth pursuing.
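The check described above can be sketched as a small helper that prints curl's DNS lookup time next to the total time, once with normal resolution and once with the address pinned through --resolve. The helper name timed_curl, the CURL override variable, and the host, port, and address in the usage comment are placeholders, not values from this thread.

```shell
# Print DNS lookup time, total time, and HTTP status for one request.
# Usage: timed_curl URL [HOST:PORT:ADDRESS]
# When the second argument is given, curl skips DNS for that host and
# port and connects to ADDRESS directly (--resolve).
timed_curl() {
  local url="$1"
  local pin="${2:-}"
  set -- --silent --show-error --output /dev/null \
    --write-out 'dns=%{time_namelookup}s total=%{time_total}s http=%{http_code}\n'
  if [ -n "$pin" ]; then
    set -- "$@" --resolve "$pin"
  fi
  "${CURL:-curl}" "$@" "$url"
}

# Example (placeholder host, port, and address):
# timed_curl "http://ms.example.internal:8080/v1/o"
# timed_curl "http://ms.example.internal:8080/v1/o" "ms.example.internal:8080:10.0.0.5"
```

If the dns value is large in the first call and the second call is fast, name resolution is a likely suspect. If both calls behave the same, the delay is probably somewhere else on the path. No credentials are passed here; an authentication error from the server still exercises the same lookup and connection.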

chungss

Thanks for the very detailed explanation. Your comment gave me an idea: I tried executing the curl without the proxy (--noproxy), and it works!

Something must have changed in the server settings or on the proxy server that I'm not aware of.

chungss

I may have found the root cause of the issue. It was due to the network proxy. I executed the curl again without the proxy and the query executed as expected, even if it took longer than 5 minutes.

curl --noproxy "*" -u {username}:{password} "http://{management-server}/v1/o/{org}/environments/{env}/stats/request_verb,proxy_basepath,proxy_pathsuffix?select=sum(message_count),min(total_response_time),max(total_response_time),avg(total_response_time),tps&timeRange=02/29/2020+16:00~03/31/2020+16:00&limit=100&sortby=sum(message_count)"
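If the monthly script should keep bypassing the proxy without putting --noproxy on every call, one option is the no_proxy environment variable, which curl also honors. The sketch below is an assumption about how that could be wired up, not something from this thread: show_proxy_env and without_proxy_for are hypothetical helper names, and ms.example.internal is a placeholder host.

```shell
# List proxy-related environment variables that curl may pick up.
show_proxy_env() {
  env | grep -i -E '^(https?|all|no)_proxy=' || echo "no proxy variables set"
}

# Run a command with HOST appended to no_proxy/NO_PROXY for that command only.
# Usage: without_proxy_for HOST command [args...]
without_proxy_for() {
  local host="$1"
  shift
  local list="${no_proxy:-${NO_PROXY:-}}"
  list="${list:+$list,}$host"
  no_proxy="$list" NO_PROXY="$list" "$@"
}

# Example (placeholder host):
# show_proxy_env
# without_proxy_for ms.example.internal \
#   curl -u {username}:{password} "http://ms.example.internal/v1/o/{org}"
```

Compared with --noproxy "*", this bypasses the proxy only for the named host, so other calls in the same script still go through the proxy.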

Marking this as solved.