"RPCException: Call timed out" when deploying proxy

Not applicable

We've been seeing errors when deploying to OPDK. The Edge GUI shows:

Error in deployment for environment dev.
The revision is deployed, but traffic cannot flow. Call timed out; either server is down or server is not reachable

The management-server's logs show:

2015-11-17 16:42:35,108 org:MYORG env:dev qtp1328246543-16010 ERROR DISTRIBUTION - RemoteServicesDeploymentHandler.deployToServers() : RemoteServicesDeploymentHandler.deployToServers : Deployment exception for server with uuid 6309514d-bb75-4729-b48d-4ae55571c539 : cause = Call timed out
com.apigee.rpc.RPCException: Call timed out
        at com.apigee.rpc.impl.AbstractCallerImpl.handleTimeout(AbstractCallerImpl.java:64) ~[rpc-1.0.0.jar:na]
        at com.apigee.rpc.impl.RPCMachineImpl$OutgoingCall.handleTimeout(RPCMachineImpl.java:483) ~[rpc-1.0.0.jar:na]
        at com.apigee.rpc.impl.RPCMachineImpl$OutgoingCall.access$000(RPCMachineImpl.java:402) ~[rpc-1.0.0.jar:na]
        at com.apigee.rpc.impl.RPCMachineImpl$OutgoingCall$1.run(RPCMachineImpl.java:437) ~[rpc-1.0.0.jar:na]
        at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:532) ~[netty-all-4.0.0.CR1.jar:na]
        at io.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:430) ~[netty-all-4.0.0.CR1.jar:na]
        at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:371) ~[netty-all-4.0.0.CR1.jar:na]
        at java.lang.Thread.run(Thread.java:744) ~[na:1.7.0_51]
2015-11-17 16:46:23,997 org:MYORG env: qtp1328246543-16014 ERROR REST - CustomJAXRSInvoker.performInvocation() : CustomJAXRSInvoker.performInvocation : Method com.apigee.distribution.DeploymentAPI.getAppDeploymentStatus threw an exception.
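
To find which servers to look at, the UUIDs can be pulled out of the management-server log. The following is a minimal sketch that assumes each failure is logged on one line in the "Deployment exception for server with uuid ... : cause = ..." form shown above; the function name `failed_servers` is made up for illustration and is not part of any Apigee tooling.

```python
import re

# Pattern based on the "Deployment exception" line in the log excerpt above.
# Assumption: the uuid is a standard 36-character hex-and-dash string.
DEPLOY_ERROR = re.compile(
    r"Deployment exception for server with uuid "
    r"(?P<uuid>[0-9a-fA-F-]{36}) : cause = (?P<cause>.+)"
)


def failed_servers(log_lines):
    """Return (uuid, cause) pairs, one per deployment exception found."""
    failures = []
    for line in log_lines:
        match = DEPLOY_ERROR.search(line)
        if match:
            failures.append((match.group("uuid"), match.group("cause").strip()))
    return failures


if __name__ == "__main__":
    import sys

    # Usage: python failed_servers.py < management-server-log-file
    for uuid, cause in failed_servers(sys.stdin):
        print(uuid, cause)
```

Stack-trace lines and unrelated errors do not match the pattern and are skipped, so the output is just the list of servers that hit the timeout.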

When POSTing the proxy bundle zip, we're seeing a "500: read timeout".

Oddly, the deployment seems to succeed despite these errors. Is there anything we should look at? The debug logs for the management server were overwhelming and I didn't spot anything.

1 ACCEPTED SOLUTION

Not applicable

Restarting Edge fixed this issue. We're not sure what the cause was.

remeeshnair
Participant IV

You can check the deployment status on the MS with curl -v http://localhost:8080/v1/o/<org name>/apis/<api proxy name>/deployments -u userid:password and find which nodes it couldn't deploy to. If connectivity is fine and the RMP nodes are up, you can try an MS restart. I have seen the MS occasionally fail to deploy to all the nodes in the RMP cluster. The CS team says it may be due to a slow network causing a timeout during deployment for that node, but I'm not exactly sure. I have also sometimes seen the connection to the RMP nodes break, and it won't reconnect until you restart the MS. We are on OPDK 15.04.04.
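
The deployments response can also be checked programmatically. The sketch below assumes the JSON body has the shape `environment -> revision -> server`, with each server carrying `uuid` and `status` fields and a healthy server reporting `"deployed"`; verify those field names against the actual output on your installation. The function name and the second UUID in the sample are hypothetical.

```python
import json


def undeployed_servers(deployments):
    """List (environment, revision, uuid, status) for servers not 'deployed'.

    Assumes the response shape described in the lead-in; missing keys are
    treated as empty rather than raising.
    """
    problems = []
    for env in deployments.get("environment", []):
        for rev in env.get("revision", []):
            for server in rev.get("server", []):
                if server.get("status") != "deployed":
                    problems.append(
                        (env.get("name"), rev.get("name"),
                         server.get("uuid"), server.get("status"))
                    )
    return problems


# Hypothetical response body, for illustration only.
sample = json.loads("""
{
  "name": "my-proxy",
  "environment": [
    {"name": "dev", "revision": [
      {"name": "3", "server": [
        {"uuid": "6309514d-bb75-4729-b48d-4ae55571c539", "status": "error"},
        {"uuid": "00000000-0000-0000-0000-000000000001", "status": "deployed"}
      ]}
    ]}
  ]
}
""")

for env_name, revision, uuid, status in undeployed_servers(sample):
    print(env_name, revision, uuid, status)
```

In practice the body would come from the curl call above (saved to a file or piped in) rather than from an inline sample.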

Regards,

Remeesh

Eric,

Can you send the output of

/<install-root>/apigee4/bin/get-version.sh

If you are on a version older than OPDK 4.15.04.05 or OPDK 4.15.07.00, you likely need to upgrade to one of these versions or patch levels for a fix. Outside of an upgrade, the workaround is to restart the Router(s) or MP(s) that got the "Call timed out" error, which typically addresses this issue.

Thanks,

Janice

OPDK version 4.15.07.01 is installed

...

------------------------------------------------------

                   Installed                            Current Version
Apigee Enterprise  1.0.0.1078.fe7934c.1509010011
Apigee UI          4.15.07.00-768b187-20151006-203453   Version 4.15.07.00-75d1384-20150823-004603 is available for upgrade
                                                        /opt/apigee4/share/installer/apigee-upgrade.sh -c ui
Cassandra          2.0.15
Zookeeper          3.4.5
QPID               0.14
Postgres           9.3

adas
Participant V

@Eric Dahl We recently fixed some deployment-related issues and pushed a hotfix to our cloud platform. Once this fix has baked in the cloud for a couple of weeks, it may be made available as a patch for 4.15.07, possibly as the 4.15.07.02 patch release, but I am not sure about the timelines yet.

Thanks. We've only seen the issue once thus far, but we've only recently updated to that version.

This issue has been addressed on Edge versions 4.15.04.11 and 4.15.07.03.

david_ryan
Participant V

This issue also exists in 4.16.01. As an aside, we did see this in 4.15.07 as well. Not sure it's totally fixed yet! Based on several threads in the community, I suspect this IS related to multi-datacenter implementations and network latency.

My big question is: what is the solution? Simply increasing the rpc.timout in cluster.properties? If so, what is a reasonable or recommended setting? Ours is currently set to 10; I'm not sure whether that is milliseconds or seconds, so please verify, Apigee!

The big issue is that latency WILL happen, and whatever is putting the systems into the bad state is a (usually) temporary and (usually) un-trappable problem. When I have these issues, fixing them ALWAYS comes down to restarting some Apigee process. We are never able to trap any network issues or high latency in other channels. So either the Apigee software needs to be watched for this problem (which would require API interfaces for process-to-process pings or something, or test deploys every couple of minutes), or we need some built-in health check that knows not to fail, resets certain conditions when things go bad, and then retries the deploy.
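
Until something like that exists in the product, the retry half of the idea can be approximated on the client side. This is only a sketch: `deploy` stands for whatever callable performs your deployment and raises on failure (for example, a wrapper around the management API call), and the attempt count and delays are arbitrary placeholders. It would only help with transient timeouts, not with the stuck state that needs a process restart.

```python
import time


def deploy_with_retry(deploy, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call deploy() until it succeeds; return the number of attempts used.

    Waits delay_seconds * attempt between tries (linear backoff) and raises
    RuntimeError, chained to the last error, once attempts are exhausted.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            deploy()
            return attempt
        except Exception as exc:  # a real caller may want to narrow this
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds * attempt)
    raise RuntimeError(f"deploy failed after {attempts} attempts") from last_error
```

The `sleep` parameter is injectable so the backoff can be exercised in tests without actually waiting.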