If the destination server responds with a 5xx error, is it possible to try again?

When making requests to the destination server, if the server responds with a 5xx error, is it possible to try again?

 

We have a synthetic monitoring proxy that connects to a health check service. Sometimes the destination server returns 504, and the next request returns 200. When it returns 504, is there any way to make a new request (with a small delay) automatically?

 

Thank you.

3 REPLIES

I had already seen this topic, man :).

But that is only possible with 2 destination servers. In that solution model, each error would remove 1 of the targets, so it would be necessary to add health monitoring to each target :(.

My proxy has more than 10 destination endpoints. If I were to create a new one for each destination, it wouldn't be a good thing; it would become a nest of destinations...

I wonder if there is an easier solution. I saw a thread (https://www.googlecloudcommunity.com/gc/Apigee/Target-API-Retry-policy-when-Target-API-has-failures-...) that mentions using Node.js, but I still don't understand how 🤔.

We have a synthetic monitoring proxy that connects to a health check service. Sometimes the destination server returns 504, and the next request returns 200. When it returns 504, is there any way to make a new request (with a small delay) automatically?

There is no way to tell the Apigee target server system to do that. Neither named target servers nor the explicit-URL HTTPTargetConnection does the try-delay-and-retry thing.

You could "build your own". One way to do it is to use the httpClient within the JavaScript step. You can retry within JS. The policy will finish only after the final retry or ... the policy timeout, whichever is first.
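A minimal sketch of the retry-within-JS idea, written so that it also runs outside Apigee. `sendWithRetry`, `sendOnce`, and `makeFlakySender` are hypothetical names, not Apigee APIs. Inside a JavaScript policy, `sendOnce` would presumably wrap a call to `httpClient.send(request, callback)`; that wiring is an assumption and is not shown here. The sketch retries immediately, with no delay, and treats any 5xx status or error as a failure.

```javascript
// Hypothetical helper: retry a callback-style request while it keeps failing.
// sendOnce(cb) must call cb(response, error); response.status is the HTTP status.
function sendWithRetry(sendOnce, maxAttempts, onDone) {
  var attempt = 0;

  function tryOnce() {
    attempt += 1;
    sendOnce(function (response, error) {
      var failed = Boolean(error) || !response || response.status >= 500;
      if (failed && attempt < maxAttempts) {
        tryOnce(); // retry right away; this sketch does not wait between attempts
        return;
      }
      // Either a non-5xx response arrived or the attempts are used up.
      onDone(response, error, attempt);
    });
  }

  tryOnce();
}

// Stand-in sender for illustration only: 504 on the first call, 200 afterwards.
function makeFlakySender() {
  var calls = 0;
  return function (cb) {
    calls += 1;
    cb({ status: calls === 1 ? 504 : 200 }, null);
  };
}

var result = null;
sendWithRetry(makeFlakySender(), 3, function (response, error, attempts) {
  result = { status: response.status, attempts: attempts };
});
console.log(result); // { status: 200, attempts: 2 }
```

Keeping `maxAttempts` small matters here, because every retry adds to the time the policy spends before it finishes or hits its timeout.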

I am not sure I would do that, though. If I were trying to build a monitoring system, I wouldn't want to code it within the JS policy of Apigee. That's not really the right place for it. You said you wanted to retry "after a small delay." But Apigee isn't designed to be a system that manages longer-running workflows. It's designed to handle synchronous requests. Apigee is not a general purpose application server.

If the server returns a 504, why not just trust the 504? And retry later, on a regular schedule? Why retry immediately (or "after a small delay") from within Apigee, within the same request context? What problem does that solve?

If you don't want to "trust the 504", Apigee has an Integration feature, which is designed to support longer running workflows. That might be a more appropriate thing to use for the "try...and if failure, wait a bit and try again" logic. Or, if you don't want to license the Integration feature set just for that purpose, then write a JS thing and host it in AppEngine or similar, and do the retry outside of Apigee.
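For the "write a JS thing and host it outside Apigee" option, the try, wait, and try-again logic might look roughly like this. It is only a sketch: `checkWithDelayedRetry`, `checkOnce`, and `makeFlakyCheck` are hypothetical names, and in a real service `checkOnce` would be replaced by an actual HTTP call to the health check endpoint that resolves to the status code.

```javascript
// Resolve after the given number of milliseconds.
function sleep(ms) {
  return new Promise(function (resolve) {
    setTimeout(resolve, ms);
  });
}

// checkOnce is a hypothetical async function that resolves to an HTTP status code.
async function checkWithDelayedRetry(checkOnce, maxAttempts, delayMs) {
  var status = null;
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    status = await checkOnce();
    if (status < 500) {
      return { status: status, attempts: attempt };
    }
    if (attempt < maxAttempts) {
      await sleep(delayMs); // the "small delay" before asking again
    }
  }
  // Every attempt returned a 5xx; report the last status seen.
  return { status: status, attempts: maxAttempts };
}

// Stand-in check for illustration only: 504 on the first call, 200 afterwards.
function makeFlakyCheck() {
  var calls = 0;
  return async function () {
    calls += 1;
    return calls === 1 ? 504 : 200;
  };
}

checkWithDelayedRetry(makeFlakyCheck(), 3, 50).then(function (result) {
  console.log(result);
});
```

Because this runs in its own process, the wait does not hold an Apigee request open while it sleeps.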

Based on my understanding of what you're grappling with, if I were solving this, I would just trust the 504. Retry later, on an appropriate schedule.

Why trust it? Well, one of the problems in distributed systems is that when failures occur, if dependent systems "refuse to take no for an answer", it can result in flooding of the failed or recovering system. That can make recovery SLOWER. If your system asks "Are you OK?" to a remote, and gets a "NO" response, then... a best practice is to trust that response, and maybe use exponential backoff for monitoring the system. Ask again in 30 seconds. On a second No, ask in 2 minutes. And so on. If you don't trust the 504, then you flood the system and that "health check" will probably make the problem (whatever it is) worse.
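As a rough illustration of that backoff schedule, the delay before the next health check could be computed from the number of consecutive failures. The multiplier of 4 is inferred from the two figures above (30 seconds, then 2 minutes), and the 30-minute cap and the function name `delayAfterFailures` are assumptions made for this sketch.

```javascript
var BASE_DELAY_SECONDS = 30;  // wait after the first "No"
var MULTIPLIER = 4;           // inferred: 30 s -> 2 min
var MAX_DELAY_SECONDS = 1800; // assumed cap so the delay does not grow forever

// Seconds to wait before asking again, given how many checks in a row have failed.
function delayAfterFailures(consecutiveFailures) {
  var exponent = Math.max(0, consecutiveFailures - 1);
  var delay = BASE_DELAY_SECONDS * Math.pow(MULTIPLIER, exponent);
  return Math.min(delay, MAX_DELAY_SECONDS);
}

var schedule = [1, 2, 3, 4].map(delayAfterFailures);
console.log(schedule); // [ 30, 120, 480, 1800 ]
```

A successful check would reset the failure count, so monitoring returns to its regular schedule once the target recovers.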

Often the source of the problem is not the target system. Instead, it's a network device between your "asker" and the "responder", and the 504 means "I'm trying to return to service, not up yet." If you (and many other askers) keep asking, it delays the restart.

If your target system is falsely returning a 504, then... that is a problem you should solve in the target system, in my opinion.