Re: SpikeArrest Policy getting triggered without reaching specified threshold

harishav · 12-22-2022 10:36 PM

Hi Team,

We are using a SpikeArrest policy on our proxy with a rate limit of 10ps. We observed that we get a rate limit exceeded error even though the threshold has not been reached. I do not expect the policy to be exact, triggering precisely when 10 requests per second is breached, but the error is thrown even when we access the API for the first time, and it is intermittent.
Below is the config I am using:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<SpikeArrest continueOnError="true" enabled="true" name="SA-ProxyRateLimit">
    <DisplayName>SA-ProxyRateLimit</DisplayName>
    <Properties/>
    <Identifier ref="flow.environment.name"/>
    <Rate ref="flow.proxy.ratelimit">10ps</Rate>
    <UseEffectiveCount>true</UseEffectiveCount>
</SpikeArrest>

The identifier resolves to "apicenter-test" in our case.

I then reproduced the error and captured the trace data while the issue occurred.

Below is the data from the debug session:

 "properties": {
                "property": [
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.total.exceed.count",
                    "value": "0"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.identifier",
                    "value": "consumerapi-xxxxxx@@@apicenter-stage@@@proxyname@@@SA-ProxyRateLimit@@@nondistributed@@@apicenter-test@@@10Seconds"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.expiry.time",
                    "value": "0"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.class.used.count",
                    "value": "0"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.is.timeout",
                    "value": "false"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.fault.cause",
                    "value": ""
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.available.count",
                    "value": "9"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.datastore.fail.open",
                    "value": "false"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.allowed.count",
                    "value": "10"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.class.available.count",
                    "value": "0"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.fault.name",
                    "value": ""
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.class.exceed.count",
                    "value": "0"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.exceed.count",
                    "value": "0"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.class.allowed.count",
                    "value": "0"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.class.total.exceed.count",
                    "value": "0"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.used.count",
                    "value": "1"
                  },
                  {
                    "name": "ratelimit.SA-ProxyRateLimit.failed",
                    "value": "true"
                  }
                ]

I triggered requests with a delay of more than 2 seconds between them, and a lot of them still failed.

Also, the sample above clearly shows that "ratelimit.SA-ProxyRateLimit.failed" is true, yet according to the counts the used count is only 1 and the available count is 9.
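For anyone checking a similar trace, here is a minimal sketch of that comparison in Python. It assumes the property list has been copied out of the debug session as JSON (shortened here to the relevant entries), and the helper names `summarize` and `looks_inconsistent` are made up for illustration; they are not part of Apigee.

```python
import json

# Shortened copy of the property list from the trace above.
trace_fragment = """
{"property": [
  {"name": "ratelimit.SA-ProxyRateLimit.allowed.count", "value": "10"},
  {"name": "ratelimit.SA-ProxyRateLimit.used.count", "value": "1"},
  {"name": "ratelimit.SA-ProxyRateLimit.available.count", "value": "9"},
  {"name": "ratelimit.SA-ProxyRateLimit.fault.name", "value": ""},
  {"name": "ratelimit.SA-ProxyRateLimit.failed", "value": "true"}
]}
"""


def summarize(fragment: str, policy: str) -> dict:
    """Collect the ratelimit.<policy>.* properties into a short-keyed dict."""
    prefix = f"ratelimit.{policy}."
    props = {}
    for prop in json.loads(fragment)["property"]:
        if prop["name"].startswith(prefix):
            props[prop["name"][len(prefix):]] = prop["value"]
    return props


def looks_inconsistent(props: dict) -> bool:
    """True when the policy reports failure but the counters show headroom."""
    failed = props.get("failed") == "true"
    used = int(props.get("used.count", "0"))
    allowed = int(props.get("allowed.count", "0"))
    return failed and used < allowed


props = summarize(trace_fragment, "SA-ProxyRateLimit")
print(props)
print("inconsistent:", looks_inconsistent(props))
```

Run against the values from this trace, the check flags the request as inconsistent: the policy failed while only 1 of 10 allowed requests had been used.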

I am not sure whether this is the right way to troubleshoot this. If it is, please let me know if there is any issue with the config.

Thanks in advance!

Giupo

I'm not sure, but I think the problem is that you use:

ref="flow.proxy.ratelimit"

This instructs SpikeArrest to look for a runtime value passed in as the flow.proxy.ratelimit flow variable. Did you populate the variable correctly? The value of the flow variable must be in the form "intpm" or "intps".

Maybe it's a runtime error that is unrelated to the limit (the available count seems to drop to 9 correctly).

Can you give us the details of the response and the returned error?

For completeness: If you define both ref and the body of this element, the value of ref is applied and takes precedence when the flow variable is set in the request. (The reverse is true when the variable identified in ref is not set in the request.)
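As a rough illustration of the two points above (the expected value format and the ref-over-body precedence), here is a small Python sketch. It only mirrors the rule as described in this reply; the function names are hypothetical, the format check is deliberately loose, and it does not model what the runtime does when the variable is set to an invalid value.

```python
import re
from typing import Optional

# Loose check for the "intps" / "intpm" form, e.g. "10ps" or "600pm".
RATE_PATTERN = re.compile(r"\d+p[sm]")


def is_valid_rate(value: str) -> bool:
    """Return True when the value looks like <integer>ps or <integer>pm."""
    return RATE_PATTERN.fullmatch(value) is not None


def effective_rate(ref_value: Optional[str], body_value: str) -> str:
    """Pick the rate as described above: ref wins when the variable is set."""
    if ref_value is not None:
        return ref_value
    # Variable not set in the request: fall back to the element body.
    return body_value


print(is_valid_rate("10ps"))            # well-formed
print(is_valid_rate("10 per second"))   # not in the expected form
print(effective_rate(None, "10ps"))     # falls back to the body value
print(effective_rate("600pm", "10ps"))  # the flow variable takes precedence
```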

harishav

Hi @Giupo ,

Thanks for your response!

Regarding the possible causes you pointed out: we have configured the runtime variable as 10ps, meaning 10 per second, and the default is set to the same threshold.

I can see that it is being applied as expected, based on the identifier:

 {
                    "name": "ratelimit.SA-ProxyRateLimit.identifier",
                    "value": "consumerapi-xxxxxx@@@apicenter-stage@@@proxyname@@@SA-ProxyRateLimit@@@nondistributed@@@apicenter-test@@@10Seconds"
                  },

The last part of the identifier says 10Seconds, and the "ratelimit.SA-ProxyRateLimit.allowed.count" property also shows 10. So there are no issues there.

Regarding the error we got: the variable "ratelimit.SA-ProxyRateLimit.failed" is set to "true", which clearly means that spike arrest was triggered.

Based on these inputs, the configuration looks correct, but the result is not as expected. Please suggest how to resolve this.

Giupo

Hi, it is a mistake to look only at the properties; the debug session also contains information in other variables. The errors you get at runtime, for example, are in the variables defined here: https://cloud.google.com/apigee/docs/api-platform/reference/policies/spike-arrest-policy#runtime-err...

pietjacobs

I think it may have to do with the fact that you specify "10ps".
By setting it to 10ps, you end up with 1 allowed request per 100 ms (1000 ms / 10 = 100 ms).

If you want to limit it to 10 requests per second without Apigee smoothing it into milliseconds, you should specify "600pm". This way Apigee will calculate the rate per second.

You should read about how Apigee "smooths" the actual limit per interval:
https://docs.apigee.com/api-platform/develop/rate-limiting#spikearrest
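To make the arithmetic above concrete, here is a small Python sketch of the smoothing interval. It is only the simple division described in this post (1000 ms divided by a per-second rate, 60 s divided by a per-minute rate); the function name is made up, and how the runtime treats per-minute rates whose interval falls below one second, such as 600pm, is not modeled here.

```python
def smoothing_interval(rate: str) -> str:
    """Return the interval in which one request is allowed for a given rate."""
    count, unit = int(rate[:-2]), rate[-2:]
    if unit == "ps":
        # Per-second rates are divided into millisecond intervals.
        return f"{1000 / count:g} ms"
    if unit == "pm":
        # Per-minute rates are divided into second intervals.
        return f"{60 / count:g} s"
    raise ValueError(f"expected a rate like '10ps' or '30pm', got {rate!r}")


print(smoothing_interval("10ps"))  # 100 ms
print(smoothing_interval("30pm"))  # 2 s
```

With 10ps, two requests arriving within the same 100 ms window could be enough to trip the policy under this model, even though far fewer than 10 requests were sent in that second.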

harishav

Hi @pietjacobs ,

Thanks for your response.
We are using Apigee X, and our SpikeArrest policy has the "UseEffectiveCount" flag set to "true", as shown in the config at the beginning of my post. As per the docs, if this flag is set to "true", requests are not smoothed:

https://cloud.google.com/apigee/docs/api-platform/reference/policies/spike-arrest-policy#useeffectiv...

Please correct me if this is not the expected behavior.

Thanks!

pietjacobs

My remark is not related to the "UseEffectiveCount" flag. It is about how Apigee calculates the interval if you set the rate with "pm" vs "ps".

Could you try changing "10ps" to "600pm" and test if the throttling behavior has changed?

harishav

Got it. I will change that and let you know how it behaves.

As this is an intermittent issue, I may not be able to confirm right away. We will change the config, monitor for a day or two, and update here.

harishav

Hi @pietjacobs ,

We applied this change, using a per-minute rate limit instead of a per-second one, and monitored for a few days. The frequency of the error appears to have dropped, but the issue is not completely resolved.
We still get the error on rare occasions.

Is there anything else you would suggest?

Thanks!

pietjacobs

Well, perhaps now you are only getting the error when the spike arrest should actually be activating? I am afraid this is as far as I can go with my ideas.

harishav

Hi @pietjacobs ,

That's not the case, actually. We see the issue occur on the very first request; I cross-checked that by looking at the API traffic for the API in the API Proxy Performance section.

I am not able to understand what is causing this behavior.
