Circuit Breaker pattern

Balakumaran · 03-10-2022 05:48 AM

I understand the Circuit Breaker pattern can be implemented with a target health monitor in Apigee.

In my case, the target server hosts a number of APIs, and one of them takes a long time to respond and eventually times out. Because the timeout takes so long, many threads hang on the server. In this scenario the health monitor still passes, since only one of the 10 APIs is failing. For the health monitor to detect the failure, the entire server or the health check API has to go down, which takes quite some time. Is there a way to give higher weight to the service response than to the health check response, so that the circuit stays tripped for some time, other than extending the health monitor polling interval?

The config below helps simulate the issue:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<TargetEndpoint name="default">
  <PreFlow name="PreFlow">
    <Request/>
    <Response/>
  </PreFlow>
  <Flows/>
  <PostFlow name="PostFlow">
    <Request/>
    <Response/>
  </PostFlow>
  <HTTPTargetConnection>
    <LoadBalancer>
      <Server name="httpbin"/>
      <MaxFailures>5</MaxFailures>
      <ServerUnhealthyResponse>
        <ResponseCode>502</ResponseCode>
        <ResponseCode>503</ResponseCode>
      </ServerUnhealthyResponse>
    </LoadBalancer>
    <Path>/</Path>
    <HealthMonitor>
      <IsEnabled>true</IsEnabled>
      <IntervalInSec>5</IntervalInSec>
      <HTTPMonitor>
        <Request>
          <ConnectTimeoutInSec>10</ConnectTimeoutInSec>
          <SocketReadTimeoutInSec>30</SocketReadTimeoutInSec>
          <Port>443</Port>
          <Verb>GET</Verb>
          <Path>/status/200</Path>
        </Request>
        <SuccessResponse>
          <ResponseCode>200</ResponseCode>
        </SuccessResponse>
      </HTTPMonitor>
    </HealthMonitor>
  </HTTPTargetConnection>
</TargetEndpoint>

Call the proxy with /status/503, which trips the circuit. Because the subsequent health check passes, it puts the target back in action.

Thanks,

dchiesa1

In my case, the target server hosts a number of APIs, and one of them takes a long time to respond and eventually times out. Because the timeout takes so long, many threads hang on the server.

What is the response time for the "long running" API when everything is healthy? What does "good behavior" look like?

Maybe the answer here is to reduce the timeout on the Apigee side. The default target timeout is 55 seconds, I believe (57 seconds is the router's default). Maybe you need to reduce that to 20 or 30 seconds, to handle the timeout case more quickly and avoid the train-wreck scenario you described.
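
A minimal sketch of what lowering the timeout could look like, assuming it is set per target with the io.timeout.millis property on the HTTPTargetConnection. The 30000 ms value is only illustrative, and the server name is taken from the config in the question; the rest of the TargetEndpoint is omitted.

```xml
<HTTPTargetConnection>
  <Properties>
    <!-- Illustrative value: stop waiting on the target after 30 seconds
         instead of the default. Choose a value that fits your backend. -->
    <Property name="io.timeout.millis">30000</Property>
  </Properties>
  <LoadBalancer>
    <Server name="httpbin"/>
  </LoadBalancer>
  <Path>/</Path>
</HTTPTargetConnection>
```

Setting it on the target endpoint rather than globally keeps the change scoped to this one backend, which may be easier to roll back if the shorter limit turns out to be too aggressive.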

Balakumaran

Thanks for the reply.

The average response time of the APIs is less than 300 ms; even under heavy load they respond in under 1 sec. But because of a bad fix, one of the APIs times out after 55 sec.

I agree with reducing the global timeout setting to a lower value. But the problem is this: when a user repeatedly calls the rogue API, the target is marked as unhealthy, but then the HTTP health monitor returns 200 and the target server becomes healthy again. Although the real problem lies in that particular API, Apigee can't keep the circuit tripped because of the health monitor.

In my humble opinion, it's not practical to write a health monitor specific to each API to detect that API's health.

Thanks,