What is the proper method to handle API calls when 1 MP/R out of 4 is down and traffic is affected?

We had 1 of our 4 MP/R servers go offline and become unreachable. We assume this caused some of the APIs not to return correctly. We confirmed this by looking at the proxy in the UI and seeing the half moon with the "The revision is deployed and traffic can flow...Call timed out; either server is down or server is not reachable" error.

Is there something we should do so that requests are not even attempted against a down server? Aren't the running Routers supposed to test the connection and mark the MP as unavailable before they attempt to send anything to it?

We were going to remove the MP from the environment to see about getting around it (via https://apidocs.apigee.com/management/apis/delete/organizations/%7borg_name%7d/environments/%7benv_n...), but the troubled server came back online.

We have monitors on these proxies that flap because of this issue, which confuses people who believe Apigee to be solid. Any ideas on how to avoid this in the future, whether a manual step or a configuration change, would be much appreciated.

Thank you!

2 REPLIES

You can manually change the availability status of a Router / Message processor with the steps here:

https://docs.apigee.com/private-cloud/v4.19.01/enablingdisabling-server-message-processorrouter-reac...
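As a rough sketch of what that procedure looks like when scripted: as far as I recall, the linked page has you POST a reachable=false form field to /v1/servers/{UUID} on the management server (port 8080). Treat that endpoint shape as an assumption and check it against the doc for your version. The host name, UUID and credentials below are hypothetical placeholders, and the function only builds the request so you can inspect it before sending anything.

```python
import base64
import urllib.parse
import urllib.request


def build_reachability_request(ms_host, server_uuid, reachable,
                               user, password, port=8080):
    """Build (but do not send) a request that sets a server's reachable flag.

    Assumption: the management server accepts a form-encoded
    `reachable=true|false` POST on /v1/servers/{UUID}. Verify this
    against the Private Cloud documentation for your release.
    """
    url = f"http://{ms_host}:{port}/v1/servers/{server_uuid}"
    flag = "true" if reachable else "false"
    body = urllib.parse.urlencode({"reachable": flag}).encode()

    req = urllib.request.Request(url, data=body, method="POST")
    # Basic auth with the sysadmin credentials (placeholders here).
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


# Hypothetical values, for illustration only.
req = build_reachability_request(
    "ms.example.internal", "abc-123", False, "admin@example.com", "secret"
)
print(req.get_method(), req.full_url, req.data)
```

To actually send it you would pass the request to urllib.request.urlopen(req), and later repeat the call with reachable=True to bring the server back into rotation.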

To clarify, though: the 'call' in the error "The revision is deployed and traffic can flow...Call timed out; either server is down or server is not reachable" is the management server's call to the message processors to process the deployment, not a client API call.

Routers won't direct traffic to an MP in an offline state. If traffic was being sent to it, the MP was probably in a broken state rather than properly offline, and this may need deeper investigation in the MP logs.
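If it helps with the flapping monitors, one option is to probe each MP directly rather than inferring its state from proxy responses. Below is a minimal sketch, assuming the MPs expose the /v1/servers/self/up check on port 8082 and answer with the text "true" when healthy (confirm the port and path for your install). The host names are hypothetical, and the probe function can be swapped out for testing.

```python
import http.client
import urllib.request


def probe_up(host, port=8082, timeout=3.0):
    """Return True only if the server answers its self/up check with 'true'.

    Assumption: the Message Processor serves /v1/servers/self/up on 8082.
    Any connection error, timeout or unexpected body counts as down.
    """
    url = f"http://{host}:{port}/v1/servers/self/up"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"true"
    except (OSError, http.client.HTTPException):
        # URLError and socket timeouts are both OSError subclasses.
        return False


def unreachable_hosts(hosts, probe=probe_up):
    """Return the hosts whose probe failed, preserving input order."""
    return [h for h in hosts if not probe(h)]


# Hypothetical host names; replace with your own MP addresses.
# down = unreachable_hosts(["mp1.example.internal", "mp2.example.internal"])
```

A monitor built on this reports which MP is unreachable, which is easier to explain than an intermittent proxy failure.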

The host server of the MP/R was totally down, so we were not able to take it offline manually. Requests were still being sent to it in that state. We understand that this shouldn't happen, and on an initial look through the logs we didn't see any errors or anything else to explain it. We will open a ticket if we see this again.

I should have also clarified that we are currently running 4.18.05, with plans to upgrade in the coming months.