Spike Arrest Policy is not working as expected

sayalipatil · 11-10-2021 12:42 AM

I am trying out the Spike Arrest policy on one of our proxies because we need to restrict the number of requests per second. I followed the steps in the policy doc and expect 30 requests per second, since I added <Rate>30ps</Rate> to the policy along with <UseEffectiveCount>true</UseEffectiveCount> to get a distributed count across all MPs. But it is not working as expected: some requests fail with errors even when we send only 10 requests per second.

    <Identifier ref="apigee.client_id"/>
    <Rate>30ps</Rate>
    <UseEffectiveCount>true</UseEffectiveCount>

I have checked the region and the MPs deployed in it, and it looks like there are 21 Message Processors in that region.

So, as per the documentation, Spike Arrest should work like this:
The number of requests in the <Rate> element gets divided by the number of MPs.
I.e. for <Rate>30ps</Rate> and 21 MPs it will be 30/21 ≈ 1.43, so each MP should allow about 2 requests per second.
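
As a quick check of the division above, here is a minimal Python sketch. The function names are hypothetical, and the even-spacing reading (one allowed request per fixed gap on each MP) is an assumption about how the per-MP share is enforced, not something confirmed in this thread.

```python
import math

def per_mp_rate(rate_per_second, message_processors):
    """Share of the configured rate that each MP enforces on its own."""
    return rate_per_second / message_processors

def per_mp_gap_ms(rate_per_second, message_processors):
    """Assumed spacing between allowed requests on one MP, in milliseconds."""
    return 1000 * message_processors / rate_per_second

share = per_mp_rate(30, 21)
print(round(share, 2))        # 1.43
print(math.ceil(share))       # 2, but only if the share is rounded up (assumption)
print(per_mp_gap_ms(30, 21))  # 700.0
```

Under that reading, a single MP would accept roughly one request every 700 ms, so two requests reaching the same MP close together could be rejected even though the total is far below 30 per second.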

But a few requests fail with 429 even when we send 10-15 requests per second, which is less than the Rate we configured.

I am looking for an Apigee policy/configuration that I can use to restrict traffic to 30 requests per second.

@anilsagar

dknezic

The spike arrest policy is effectively a traffic smoothing policy.

1. There's bucketing that effectively divides each second into 10 slots. E.g. when you configure it to allow 30 tps, it will allow 3 requests every 100 ms.

2. As you've mentioned, with effective count the count is further divided by the number of MPs.

When you combine 1 and 2, a request may go to an MP that has already processed a request, and hence you get the 429.
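
To get a feel for how likely that is, here is a rough Python sketch of a deliberately simplified model. It assumes each request is routed to one of the MPs uniformly at random and that each MP admits only one request during the test window. Both are assumptions for illustration, not a description of how Apigee actually routes or counts, and `p_all_distinct` is a hypothetical helper.

```python
def p_all_distinct(requests, message_processors):
    """Probability that every request lands on a different MP
    when each request picks an MP uniformly at random."""
    p = 1.0
    for i in range(requests):
        # The i-th request must avoid the i MPs that are already taken.
        p *= max(message_processors - i, 0) / message_processors
    return p

for n in (5, 10, 15):
    print(n, round(p_all_distinct(n, 21), 4))
```

In this model, the chance that 15 requests all reach different MPs out of 21 is well under 1 percent, so at least one 429 would be the normal outcome rather than the exception.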

Also, when you say you send 10-15 requests per second, are those 15 requests spread out over a second, sent concurrently, or sent sequentially one immediately after another? A burst of traffic will still trigger the spike arrest.
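
The difference between spread-out and burst traffic can be sketched in Python using the 10-slot description from point 1. This is only a toy model of that description on a single MP, not Apigee's implementation, and `count_rejected` is a hypothetical name.

```python
from collections import Counter

def count_rejected(arrival_times_ms, rate_per_second, slot_ms=100):
    """Count requests rejected when each 100 ms slot admits rate/10 requests."""
    per_slot = rate_per_second * slot_ms / 1000  # 30 per second -> 3 per slot
    admitted = Counter()
    rejected = 0
    for t in arrival_times_ms:
        slot = int(t // slot_ms)
        if admitted[slot] < per_slot:
            admitted[slot] += 1
        else:
            rejected += 1
    return rejected

spread = [i * 1000.0 / 15 for i in range(15)]  # 15 requests evenly over 1 s
burst = [0.0] * 15                             # 15 requests at the same instant

print(count_rejected(spread, 30))  # 0
print(count_rejected(burst, 30))   # 12
```

The same 15 requests pass when spaced evenly but mostly fail when they arrive together, which is why the timing of the test traffic matters as much as the per-second total.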

By the way, is this OPDK with 21 dedicated message processors?

sayalipatil

This is what I have noticed as well:

"When you combine 1 and 2, a request may go to an MP that has already processed a request, and hence you get the 429"

When each MP allows only 2 requests per second, I can see more than 2 requests going to some MPs, while other MPs don't receive even a single request,

and that's where it returns 429.

And those 15 requests were sent spread out over a second.