Goal
I am implementing a circuit breaker using a KVM, a Load Balancer and an HTTP Health Check, so that a default payload is returned to the client in a 200 OK response when the target services are down (i.e. they return 5XX status codes, most commonly 503).
Problem
The KVM policy (GET operation) is randomly skipped from the Steps of the Fault Rule, even when the exact same request is sent. When that happens, the default payload from the KVM is not returned to the client.
Policies in Fault Rule
There are two main policies within the Fault Rule:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<KeyValueMapOperations async="false" continueOnError="false" enabled="true" name="kvm-retrieve-default-payload" mapIdentifier="default_response">
    <DisplayName>kvm-retrieve-default-payload</DisplayName>
    <Properties/>
    <ExclusiveCache>false</ExclusiveCache>
    <ExpiryTimeInSecs>300</ExpiryTimeInSecs>
    <Get assignTo="defaultPayload">
        <Key>
            <Parameter ref="referneceId"/>
        </Key>
    </Get>
    <Scope>environment</Scope>
</KeyValueMapOperations>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AssignMessage async="false" continueOnError="false" enabled="true" name="assign-default-response-error">
    <DisplayName>assign-default-response-error</DisplayName>
    <Properties/>
    <FaultRules/>
    <Set>
        <StatusCode>200</StatusCode>
        <ReasonPhrase>OK</ReasonPhrase>
        <Payload contentType="application/json" variablePrefix="@" variableSuffix="#">@defaultPayload#</Payload>
        <Path/>
    </Set>
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
    <AssignTo type="error" transport="https" createNew="true">error</AssignTo>
</AssignMessage>
Trace results
Only #4 is correct: its steps run in the same order as they are defined in the code.
Scenarios
<FaultRule name="503Handling_LB">
    <Condition>(error.status.code >= 500)</Condition>
    <Step>
        <Name>extract_request_variables</Name>
    </Step>
    <Step>
        <Name>js-set-referenceId</Name>
        <Condition>(tmsId != null) or (tmsId != "")</Condition>
    </Step>
    <Step>
        <Name>kvm-retrieve-default-payload</Name>
        <Condition>(referneceId != null) or (referneceId != "")</Condition>
    </Step>
    <Step>
        <Name>assign-default-response-error</Name>
        <Condition>(defaultPayload != null) or (defaultPayload != "")</Condition>
    </Step>
</FaultRule>
<Flow name="5XXHandling_Target">
    <Description/>
    <Condition>(response.status.code >= 500)</Condition>
    <Request/>
    <Response>
        <Step>
            <Name>extract_request_variables</Name>
        </Step>
        <Step>
            <Name>js-set-referenceId</Name>
            <Condition>(tmsId != null) or (tmsId != "")</Condition>
        </Step>
        <Step>
            <Name>kvm-retrieve-default-payload</Name>
            <Condition>(referneceId != null) or (referneceId != "")</Condition>
        </Step>
        <Step>
            <Name>assign-default-response</Name>
            <Condition>(defaultPayload != null) or (defaultPayload != "")</Condition>
        </Step>
    </Response>
</Flow>
<HTTPTargetConnection>
    <Properties>
        <!-- when true, redis-cache will not work -->
        <Property name="response.streaming.enabled">false</Property>
        <Property name="success.codes">1xx,2xx,3xx,5xx</Property>
    </Properties>
    <LoadBalancer>
        <Server name="default-target-server-mock"/>
        <MaxFailures>2</MaxFailures>
    </LoadBalancer>
    <Path>/mock-services-zaka/screens</Path>
    <HealthMonitor>
        <IsEnabled>true</IsEnabled>
        <IntervalInSec>10</IntervalInSec>
        <HTTPMonitor>
            <Request>
                <ConnectTimeoutInSec>5</ConnectTimeoutInSec>
                <SocketReadTimeoutInSec>5</SocketReadTimeoutInSec>
                <Port>443</Port>
                <Verb>GET</Verb>
                <Path>/mock-services-zaka/health</Path>
            </Request>
            <SuccessResponse>
                <ResponseCode>200</ResponseCode>
            </SuccessResponse>
        </HTTPMonitor>
    </HealthMonitor>
</HTTPTargetConnection>
My thoughts on possible causes
Is it because no <source> attribute can be set in the KVM policy?
Is the KVM policy executed asynchronously in a Fault Rule?
Has anybody faced this issue before, or does anyone have any thoughts on it? Thanks.
Hi @Zaka Lei -
Before we go deeper into this, make sure you're not trying to attach these policies in the PostClientFlow. Only the MessageLogging policy is allowed there. If you currently have the policies attached in the PostClientFlow, move them elsewhere, such as to the ProxyEndpoint response, and try again.
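As a rough sketch of what that constraint looks like in practice, a ProxyEndpoint whose PostClientFlow carries only a MessageLogging step would be laid out something like this. The policy name log-response, the base path, and the endpoint names are hypothetical placeholders, not taken from your proxy:

```xml
<ProxyEndpoint name="default">
    <PreFlow name="PreFlow">
        <Request/>
        <Response/>
    </PreFlow>
    <PostFlow name="PostFlow">
        <Request/>
        <!-- Policies that must change the response belong here or in a FaultRule,
             not in the PostClientFlow -->
        <Response/>
    </PostFlow>
    <PostClientFlow>
        <Response>
            <!-- Hypothetical MessageLogging policy; runs after the response is sent -->
            <Step>
                <Name>log-response</Name>
            </Step>
        </Response>
    </PostClientFlow>
    <HTTPProxyConnection>
        <!-- Hypothetical base path -->
        <BasePath>/screens</BasePath>
    </HTTPProxyConnection>
    <RouteRule name="default">
        <TargetEndpoint>default</TargetEndpoint>
    </RouteRule>
</ProxyEndpoint>
```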
Thanks!
Hi Floyd,
Thanks for your reply.
I attached all four policies as steps in the FaultRule named "503Handling_LB" shown above, in the Target endpoint, but somehow they are executed in the PostClientFlow of the Proxy Endpoint response, where only a MessageLogging policy is attached.
Please also find the attached screenshots. I replaced the client-related data with dummy data before capturing them, but they should be enough to show the issue.
Thanks.
@Floyd Jones I forgot to "@" you in my previous reply, so you may not have been alerted to it.
Thanks @Zaka Lei - Would you mind uploading the proxy zip to this thread?
Hi @Floyd Jones, unfortunately I cannot share my proxy zip here as it contains a lot of business data. But I will try to replicate the issue with mock services and share that zip here. Thanks.