Retry an API request to a target endpoint when we get a particular error response

Hi,

We have a requirement to retry a request when we get a 429 error response from the target endpoint: we need to hold that request and retry it after 60-70 seconds against the same target endpoint. Is this possible?

We also went through this Community Post and found that retrying requires at least two target servers. But in our case we have only one target endpoint.

We would appreciate expert opinions on the above ask.

Thanks

@dchiesa1 @kurtkanaskie @dknezic @shrenikkumar-s 

 

 

1 ACCEPTED SOLUTION

We have a requirement to retry a request when we get a 429 error response from the target endpoint: we need to hold that request and retry it after 60-70 seconds against the same target endpoint. Is this possible?

It is possible to do what you describe, but you wouldn't use Apigee alone to do it. You'd use Apigee in combination with a different tool. The Apigee platform is focused on API management: it offers useful capabilities for publishing an API Product catalog, implementing a common control layer for APIs, and gaining insight from analysis and security checks on API traffic. But APIs are synchronous, and the Apigee runtime is not a good container or tool for managing long-running transactions with branching, exception management, and related controls. The Apigee runtime isn't designed to hold onto pending requests for 60-70 seconds or more. In fact the default timeout is ~55 seconds, which means Apigee will drop the inbound connection if the initial API call does not resolve before that timespan elapses. An API architect will tell you that 55 seconds is much too long anyway, and you should be designing your APIs to respond in well under that threshold.

For long-running transactions with retry, and lots of other things, I suggest that you use an integration platform. A runtime that is designed to handle those concerns is very different from a runtime that is designed to act as a reverse proxy. ("Reverse proxy" means that, to the consuming app, Apigee-managed endpoints just look like a regular API.)

Fortunately there are lots of options out there. One of them is available from Google: Application Integration. Check it out; it can do what you describe. There is a Timer Task, which lets you add a delay into any flow, and there is a while loop task. With these two tasks, you could build a flow that loops up to 3 times (or 5, or 8... whatever makes sense) trying a backend service, with an exponential backoff interval between retries. If the first request succeeds (200, not 429), then all good. If not, delay for 1 second and try again. If that fails, delay for 2 seconds before trying again. Then 4, 8, 16... or whatever makes sense for you. Google Application Integration is not the only integration platform that does while loops and delays; you could do this in other tools, too.
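To make the loop-plus-delay idea concrete, here is a minimal sketch in plain Python. It is not Application Integration configuration and not anything Apigee-specific; it only illustrates the control flow that the timer and while loop tasks would implement. The function name `call_with_backoff` and the `send` callable (anything that performs the backend call and returns an HTTP status code) are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, int]:
    """Call send() until it returns something other than 429, or attempts run out.

    `send` is a hypothetical stand-in for the backend call; it returns a status code.
    Returns (last_status, attempts_made).
    """
    status = 429
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status != 429:
            # Any non-429 response (success or a different error) ends the loop.
            return status, attempt
        if attempt < max_attempts:
            # Delays double each time: 1, 2, 4, 8, ... seconds with base_delay=1.
            sleep(base_delay * 2 ** (attempt - 1))
    return status, max_attempts
```

The `sleep` parameter is injectable only so the loop can be exercised without actually waiting; in real use you would leave it at its default.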

It's not as simple as "plug in the integration layer as an Apigee target and let it handle the delay." Remember, Apigee is going to drop the request after 55 seconds, and really you want your APIs to resolve more quickly than that anyway, for the best user experience: ideally in 2-3 seconds at most. You don't want Apigee holding onto the inbound client request while the integration layer is doing its exponential-backoff-and-retry magic. So you'd need to convert your API into something a little different. Rather than configuring Apigee to return a 200 status code ("OK", which usually implies "got your request, fulfilled it, all done"), you might want to configure the API that does retry to return a 202 Accepted status code, which implies "got your request, haven't fulfilled it yet, but I'm working on it." In most cases a 202 response will also include a URI, which the calling application can use to check the status of the pending request.

If you are thinking "outside in" then you probably want to make it easy for external developers to consume this API that does retry. So you'd have it return 202 ALWAYS, in all cases, whether the first request sent to the true backend returns a 429 or not. That means Apigee won't connect directly to the backend; Apigee would connect to the integration layer, sending it an HTTP request, which acts as the "trigger" for the flow in that layer. The integration layer immediately responds with "200 OK, the flow has started", along with a URI that indicates how the client can check on the status of the request. Apigee then relays a 202 and the URI back to the client. Client apps would need to use the returned Location URI to check on the progress and results of their request.
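The shape of that "always 202, then poll" contract can be sketched in a few lines of Python. This is an in-memory illustration only, assuming a made-up `/requests/{id}` status path and hypothetical function names (`submit`, `complete`, `get_status`); in the design described above, the accept step would be played by Apigee plus the integration trigger, and `complete` by the integration flow once the backend finally answers.

```python
import uuid
from typing import Any, Dict, Tuple

# Hypothetical in-memory store of pending requests, keyed by request id.
JOBS: Dict[str, Dict[str, Any]] = {}


def submit(payload: Dict[str, Any]) -> Tuple[int, Dict[str, str]]:
    """Accept a request without fulfilling it: always 202 plus a status URI."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"state": "pending", "payload": payload, "result": None}
    # The /requests/... path is an assumption made for this sketch.
    return 202, {"Location": f"/requests/{job_id}"}


def complete(job_id: str, result: Any) -> None:
    """Called by the retrying flow once the backend has finally responded."""
    JOBS[job_id]["state"] = "done"
    JOBS[job_id]["result"] = result


def get_status(path: str) -> Tuple[int, Dict[str, Any]]:
    """What the client gets when it follows the Location URI."""
    job_id = path.rsplit("/", 1)[-1]
    job = JOBS.get(job_id)
    if job is None:
        return 404, {"error": "unknown request"}
    return 200, {"state": job["state"], "result": job["result"]}
```

A client would call `submit`, read the `Location` header from the 202 response, and poll that URI until the state is no longer "pending".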

Stepping back, this will work, but it's a bunch of work on the API side if you want to handle a 429 status with backoff-and-retry. An alternative, of course, is to just "keep it simple": RETURN THE 429 to the calling client, and let the client schedule its own retry logic.
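For the "keep it simple" option, the client-side logic might look like the following Python sketch. It assumes the 429 response may carry the standard `Retry-After` header expressed in seconds, and otherwise falls back to a fixed wait (60 seconds here, echoing the interval in the question). Whether your backend actually sends `Retry-After` is an assumption to verify; the names `retry_after_seconds`, `call_until_accepted`, and `send` are hypothetical.

```python
import time
from typing import Callable, Mapping, Tuple

# A response is modeled as (status_code, headers) to keep the sketch library-free.
Response = Tuple[int, Mapping[str, str]]


def retry_after_seconds(headers: Mapping[str, str], fallback: float = 60.0) -> float:
    """Read Retry-After as whole seconds; use the fallback if absent or not numeric.

    Only the delta-seconds form is handled; the HTTP-date form falls back.
    Header lookup here is case-sensitive, which a real HTTP client would avoid.
    """
    value = headers.get("Retry-After", "").strip()
    if value.isdigit():
        return float(value)
    return fallback


def call_until_accepted(
    send: Callable[[], Response],
    max_attempts: int = 3,
    fallback_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Send the request, waiting and resending while the API answers 429."""
    response = send()
    for _ in range(max_attempts - 1):
        if response[0] != 429:
            break
        sleep(retry_after_seconds(response[1], fallback_delay))
        response = send()
    return response
```

Because the waiting happens in the client, no connection is held open through Apigee between attempts.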

I can see pros and cons to either approach. If the clients are "open clients", like lots and lots of partner apps that you have no control over, then for sure I would want my API to just return the 429 and let the client sort it out. If the client is a first-party client, something used by employees, and it would be simpler and kinder to handle the retry for them, and you're pretty sure you won't get abuse of the behavior, then using Application Integration with the timer task and built-in retry loop would be my choice.

A further consideration: you might want an integration platform anyway. There are lots of other things you can do with an integration platform beyond this retry-with-backoff pattern. It's a really handy tool to have in the toolbox. So you may want to think more broadly about expanding the portfolio on your side, and this backoff-with-retry-for-429 might just be the FIRST use case for this new platform.

Thanks @dchiesa1 for your valuable insights.