KVM policy randomly slips out of order in Fault Rule steps


Goal

I am implementing a circuit breaker with a KVM, a load balancer and an HTTP health check. When the target services are down (i.e. returning 5XX status codes, most commonly 503), the proxy should return a default payload to the client in a 200 OK response.

Problem

The KVM policy (GET operation) randomly slips out of its position in the Fault Rule steps when I send the exact same request repeatedly. This prevents the default payload retrieved from the KVM from being returned to the client.

Policies in Fault Rule

There are two main policies within the Fault Rule:

  • KVM (retrieve the default payload)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<KeyValueMapOperations async="false" continueOnError="false" enabled="true" name="kvm-retrieve-default-payload" mapIdentifier="default_response">
    <DisplayName>kvm-retrieve-default-payload</DisplayName>
    <Properties/>
    <ExclusiveCache>false</ExclusiveCache>
    <ExpiryTimeInSecs>300</ExpiryTimeInSecs>
    <Get assignTo="defaultPayload">
        <Key>
            <Parameter ref="referneceId"/>
        </Key>
    </Get>
    <Scope>environment</Scope>
</KeyValueMapOperations>
  • Assign Message (assign the default payload retrieved from KVM and send it back to the client)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AssignMessage async="false" continueOnError="false" enabled="true" name="assign-default-response-error">
    <DisplayName>assign-default-response-error</DisplayName>
    <Properties/>
    <FaultRules/>
    <Set>
        <StatusCode>200</StatusCode>
        <ReasonPhrase>OK</ReasonPhrase>
        <Payload contentType="application/json" variablePrefix="@" variableSuffix="#">@defaultPayload#</Payload>
        <Path/>
    </Set>
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
    <AssignTo type="error" transport="https" createNew="true">error</AssignTo>
</AssignMessage>


Trace results

  • 1st request: the KVM policy executed in the PostClientFlow, after the Assign Message policy.
  • 2nd request: the KVM policy executed after the Assign Message policy, before the response was sent back to the client.
  • 3rd request: the KVM policy executed after the Assign Message policy, after the response was sent back to the client.
  • 4th request: the KVM policy executed before the Assign Message policy, before the response was sent back to the client.

Only the 4th request runs the steps in the order defined in the Fault Rule.

trace-1.jpg

trace-2.jpg

trace-3.jpg

trace-4.jpg

postclientflow.jpg


Scenarios

  • After Apigee marks the target server as down, the target immediately returns a 503 error and the flow enters the Fault Rule. This is where the unstable KVM execution occurs.
<FaultRule name="503Handling_LB">
    <Condition>(error.status.code >= 500)</Condition>
    <Step>
        <Name>extract_request_variables</Name>
    </Step>
    <Step>
        <Name>js-set-referenceId</Name>
        <Condition>(tmsId != null) and (tmsId != "")</Condition>
    </Step>
    <Step>
        <Name>kvm-retrieve-default-payload</Name>
        <Condition>(referneceId != null) and (referneceId != "")</Condition>
    </Step>
    <Step>
        <Name>assign-default-response-error</Name>
        <Condition>(defaultPayload != null) and (defaultPayload != "")</Condition>
    </Step>
</FaultRule>
  • When the target service returns a 5XX before the health check marks it as down, the same two policies work correctly and consistently in a normal conditional flow.
<Flow name="5XXHandling_Target">
    <Description/>
    <Condition>(response.status.code >= 500)</Condition>
    <Request/>
    <Response>
        <Step>
            <Name>extract_request_variables</Name>
        </Step>
        <Step>
            <Name>js-set-referenceId</Name>
            <Condition>(tmsId != null) and (tmsId != "")</Condition>
        </Step>
        <Step>
            <Name>kvm-retrieve-default-payload</Name>
            <Condition>(referneceId != null) and (referneceId != "")</Condition>
        </Step>
        <Step>
            <Name>assign-default-response</Name>
            <Condition>(defaultPayload != null) and (defaultPayload != "")</Condition>
        </Step>
    </Response>
</Flow>
<HTTPTargetConnection>
    <Properties>
        <!-- when true, redis-cache will not work -->
        <Property name="response.streaming.enabled">false</Property>
        <Property name="success.codes">1xx,2xx,3xx,5xx</Property>
    </Properties>
    <LoadBalancer>
        <Server name="default-target-server-mock"/>
        <MaxFailures>2</MaxFailures>
    </LoadBalancer>
    <Path>/mock-services-zaka/screens</Path>
    <HealthMonitor>
        <IsEnabled>true</IsEnabled>
        <IntervalInSec>10</IntervalInSec>
        <HTTPMonitor>
            <Request>
                <ConnectTimeoutInSec>5</ConnectTimeoutInSec>
                <SocketReadTimeoutInSec>5</SocketReadTimeoutInSec>
                <Port>443</Port>
                <Verb>GET</Verb>
                <Path>/mock-services-zaka/health</Path>
            </Request>
            <SuccessResponse>
                <ResponseCode>200</ResponseCode>
            </SuccessResponse>
        </HTTPMonitor>
    </HealthMonitor>
</HTTPTargetConnection>
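To reproduce this with mock services, the pieces above could be combined into a single TargetEndpoint along the lines of the sketch below. It assumes the FaultRule is attached to the TargetEndpoint (as described later in the thread); the endpoint name `default` is a hypothetical placeholder, and the commented sections stand for the flow and connection shown above. The step conditions use `and` rather than `or`: assuming Apigee treats an unset variable and an empty string as different values, `(x != null) or (x != "")` is true for every value and never actually guards the step.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Minimal sketch; the endpoint name "default" is a hypothetical placeholder. -->
<TargetEndpoint name="default">
    <Description/>
    <FaultRules>
        <!-- Entered when the load balancer has no healthy server and the 503 is raised as a fault. -->
        <FaultRule name="503Handling_LB">
            <Condition>(error.status.code >= 500)</Condition>
            <Step>
                <Name>extract_request_variables</Name>
            </Step>
            <Step>
                <Name>js-set-referenceId</Name>
                <!-- Run only when tmsId is both set and non-empty. -->
                <Condition>(tmsId != null) and (tmsId != "")</Condition>
            </Step>
            <Step>
                <Name>kvm-retrieve-default-payload</Name>
                <Condition>(referneceId != null) and (referneceId != "")</Condition>
            </Step>
            <Step>
                <Name>assign-default-response-error</Name>
                <Condition>(defaultPayload != null) and (defaultPayload != "")</Condition>
            </Step>
        </FaultRule>
    </FaultRules>
    <PreFlow name="PreFlow">
        <Request/>
        <Response/>
    </PreFlow>
    <PostFlow name="PostFlow">
        <Request/>
        <Response/>
    </PostFlow>
    <Flows>
        <!-- The "5XXHandling_Target" flow shown above goes here. -->
    </Flows>
    <!-- The HTTPTargetConnection shown above (LoadBalancer + HealthMonitor) goes here. -->
</TargetEndpoint>
```

With the fault rule and the conditional flow in the same endpoint, the two scenarios (server marked down vs. 5XX before the health check reacts) can be triggered against one mock target.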


My thoughts on the cause

Is it because no <source> attribute can be set in the KVM policy?

Is the KVM policy executed asynchronously in the Fault Rule?

Has anybody faced this issue before, or does anyone have thoughts on it? Thanks.


Hi @Zaka Lei -

Before we go deeper into this, make sure you're not trying to attach these policies in the PostClientFlow. Only the MessageLogging policy is allowed there. If you currently have the policies attached in the PostClientFlow, move them elsewhere, such as to the ProxyEndpoint response, and try again.
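As a sketch of that layout, a ProxyEndpoint whose PostClientFlow carries only a MessageLogging step might look like the fragment below. `ml-log-transaction` is a hypothetical policy name, and the connection and route rule are left out; the KVM and AssignMessage steps would live in the regular response flows or fault rules instead.

```xml
<ProxyEndpoint name="default">
    <PreFlow name="PreFlow">
        <Request/>
        <Response/>
    </PreFlow>
    <PostFlow name="PostFlow">
        <Request/>
        <!-- Response-side policies such as KVM or AssignMessage belong here or in a conditional flow. -->
        <Response/>
    </PostFlow>
    <!-- Runs after the response has been sent to the client; only MessageLogging is attached. -->
    <PostClientFlow>
        <Response>
            <Step>
                <Name>ml-log-transaction</Name>
            </Step>
        </Response>
    </PostClientFlow>
    <!-- HTTPProxyConnection and RouteRule omitted from this sketch. -->
</ProxyEndpoint>
```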

Thanks!

Hi Floyd,

Thanks for your reply.

I put all four policies as steps in the FaultRule named "503Handling_LB" shown above, in the TargetEndpoint. Somehow the KVM policy still got executed in the PostClientFlow of the ProxyEndpoint response, which only has the MessageLogging policy attached.


Please also find the attached screenshots. I replaced client-related data with dummy data before capturing them, but they should be enough to show the issue.


Thanks.

@Floyd Jones, I forgot to "@" you in my previous reply, so you may not have been alerted.

Thanks @Zaka Lei - Would you mind uploading the proxy zip to this thread?

Hi @Floyd Jones, unfortunately I cannot share my proxy zip here as it contains a lot of business data. But I will try to replicate the issue with mock services and share the zip here. Thanks.