Apigee X: Spike Arrest policy issue in production

Hi All,

We are facing the following Spike Arrest policy issue in production: a large number of API calls are being rejected by Spike Arrest. When we check the Trace session, calls are rejected even though the count shown there still has capacity available.

The rate is 5ps, and the identifier it is keyed on is the API key.


Unfortunately, we cannot reproduce this in lower environments because they do not receive the request volume that production does.

Can someone explain whether we have missed anything?

Any guidance will be appreciated.

Thanks

 


Can you share how the policy has been configured, the approximate total TPS, and which Apigee you're using?

Hi,

The policy sets the spike rate and uses the API key as the identifier.

<SpikeArrest continueOnError="false" enabled="true" name="SA-prevent-burst-attack">
    <DisplayName>SA-prevent-burst-attack</DisplayName>
    <Properties/>
    <Rate ref="spikeVal">50ps</Rate>
    <Identifier ref="client_id"/>
</SpikeArrest>

The ref value (spikeVal) is read from an app attribute, which is 5ps in most cases.

We do not set "UseEffectiveCount", so the policy smooths the traffic.

Also, we are using Apigee X.

Any pointers to fix this would be very helpful.

 

Thanks

How many requests do you get from the same API consumer, and how often? One thing to note: a rate of 5 requests per second is effectively enforced as 1 request every 200 ms, rather than as any 5 requests within a second. It's a subtle but important difference.
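To illustrate the difference, here is a minimal sketch of the smoothing idea in Python. It is a simplified model, not Apigee's actual implementation: it assumes a single counter that allows a request only when at least 1000/rate milliseconds have passed since the last allowed request. The function name and the sample arrival times are made up for illustration.

```python
def simulate_spike_arrest(arrival_ms, rate_per_second):
    """Simplified smoothing model (an assumption, not Apigee internals).

    A request is allowed only if at least 1000 / rate_per_second
    milliseconds have elapsed since the last allowed request.
    Returns a list of booleans, one per arrival (True = allowed).
    """
    interval_ms = 1000 / rate_per_second
    last_allowed = None
    decisions = []
    for t in arrival_ms:
        if last_allowed is None or t - last_allowed >= interval_ms:
            decisions.append(True)
            last_allowed = t
        else:
            decisions.append(False)
    return decisions


# Five requests bunched inside 200 ms: only the first one passes.
burst = [0, 50, 100, 150, 180]
# Five requests spaced 200 ms apart: all of them pass.
spread = [0, 200, 400, 600, 800]

print(simulate_spike_arrest(burst, 5))
print(simulate_spike_arrest(spread, 5))
```

Both lists contain 5 requests within one second, yet under this model the bunched ones are mostly rejected. A client that sends even 2 calls back to back can hit the limit while its hourly volume stays low.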

@dknezic 

This API is heavily used in production. We had to roll back the production deployment because of the high volume of requests rejected by the Spike Arrest policy.

From the logs for the few hours we were live in production, there were 287 calls/hr for this API for one specific API key (a single customer). Moreover, this was not a peak period, since the production move was done during non-peak hours.

Regarding the spike value: yes, you are perfectly right, 5ps is 1 request every 200 ms. But we are not sure how this is executed. The trace values are misleading: they show that count is still available, yet the call gets rejected. Moreover, we could not debug in production for long enough to collect sufficient data.

How can we verify that the policy is executing properly, so that we can update the spike value to a more appropriate one?

Note: This API is being migrated from another API system to Apigee X. We had 5ps in the earlier system and did not face this issue there. We are fine with modifying the value, but we are not sure how to validate that this policy is executing correctly.

Thanks

 

 

Hi @dknezic,

I did some analysis of the calls that had the issue. We log entries to Cloud Logging in the PostClientFlow. After analysing the client received start timestamp of these entries, it looks like the Spike Arrest was executed correctly: whenever there was more than 1 call within 200 ms, the policy rejected the extra calls.
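For reference, the check described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the (api_key, timestamp_ms, rejected) tuple layout is hypothetical and would need to be mapped from whatever fields the log entries actually contain, and the 200 ms rule is the simplified single-counter view of 5ps.

```python
def unexpected_rejections(entries, rate_per_second=5):
    """Return rejected calls that the simple 200 ms rule does not explain.

    entries: iterable of (api_key, timestamp_ms, rejected) tuples.
    This layout is a hypothetical stand-in for the real log fields.
    A rejection counts as "expected" when the same key had an allowed
    call less than 1000 / rate_per_second ms earlier.
    """
    interval_ms = 1000 / rate_per_second
    last_allowed = {}  # api_key -> timestamp of the last allowed call
    unexpected = []
    for key, ts, rejected in sorted(entries, key=lambda e: e[1]):
        prev = last_allowed.get(key)
        if rejected:
            if prev is None or ts - prev >= interval_ms:
                unexpected.append((key, ts))
        else:
            last_allowed[key] = ts
    return unexpected


# Made-up sample data, not real production logs.
sample = [
    ("k1", 0, False),
    ("k1", 120, True),   # 120 ms after an allowed call: expected
    ("k2", 130, False),
    ("k1", 400, False),
    ("k1", 900, True),   # 500 ms after an allowed call: not explained
]
print(unexpected_rejections(sample))
```

An empty result would mean every rejection followed an allowed call for the same key by less than 200 ms, which is what we observed in our entries.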

But we see different data in the Trace. Why is that so?

What is the correct way to verify this policy?

Thanks