Throttling API calls per second without smoothing logic

Hello everyone...

I have a specific requirement for throttling API calls coming into Apigee Edge public cloud.

  • Ability to throttle API traffic where I cannot allow more than 10 tps (please note: transactions per SECOND).
  • The 10 tps traffic should NOT go through smoothing logic, as 10 tps is valid traffic for our backend to handle.
    • This rules out the Spike Arrest policy, since a 10 tps configuration smooths the traffic and doesn't allow more than 1 call every 100 ms.
    • In our situation, the backend can handle 10 calls in the same millisecond, and that is a valid scenario for us (not considered a spike).
    • I have also looked at the ConcurrentRateLimit policy, and it clearly doesn't serve my requirement.
  • I've looked at the Quota policy, which can simply act as a counter, but it has limitations.
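To make the smoothing behavior concrete, here is a sketch of a Spike Arrest configuration at 10ps (the policy name is illustrative). Edge enforces this rate as roughly one request allowed per 100 ms window, not as a counter of 10 requests anywhere within the second:

```xml
<!-- Sketch only: a 10ps rate is enforced as ~1 request per 100 ms window,
     not as a running count of 10 requests per second. -->
<SpikeArrest async="false" continueOnError="false" enabled="true" name="SA-10ps">
    <DisplayName>SA-10ps</DisplayName>
    <Rate>10ps</Rate>
</SpikeArrest>
```

This windowing is exactly the smoothing behavior that rejects 10 simultaneous calls even though they total only 10 tps.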

On a side note, it's kind of sad that Apigee doesn't support this feature, which has been available in IBM DataPower, WSO2, Axway API Gateway, etc. for a very long time.

Does anyone have any suggestions to solve this problem?

I'm ok to have a complex custom implementation too as I'm desperately looking for this feature. A critical deployment is just pending for this logic to be implemented. Would appreciate any guidance on this.

Thanks.

1 ACCEPTED SOLUTION

Hi @Dinesh Palanivel,

You can indeed set Quota per second, please see docs here: https://docs.apigee.com/api-platform/reference/policies/quota-policy#timeunit

Perhaps they have been updated since your original post.

Also note, that using "second" means it cannot be distributed, so you'll have to account for the # of MPs you have. For example if you have 4 MPs, setting 3 TPS will yield an effective count of 12 TPS.

<Quota async="false" continueOnError="false" enabled="true" name="QU-BySecond">
    <DisplayName>QU-BySecond</DisplayName>
    <Allow countRef="request.header.quota_allow" count="3"/>
    <Interval ref="request.header.quota_interval">1</Interval>
    <TimeUnit ref="request.header.quota_timeunit">second</TimeUnit>
</Quota>


7 REPLIES

Please read the docs on the Spike Arrest policy here; the documentation clearly lays out what the policy is used for.

From the docs:

Think of Spike Arrest as a way to generally protect against traffic spikes rather than as a way to limit traffic to a specific number of requests. Your APIs and backend can handle a certain amount of traffic, and the Spike Arrest policy helps you smooth traffic to the general amounts you want.

Does anyone have any suggestions to solve this problem?

Yes, raise your spike arrest policy to a value meant to cap traffic just before your backend begins to degrade. Then use a Quota policy to implement a per-minute business restriction. If you want to ensure your clients don't exceed your exact specified amount, it sounds like 600 RPM would do it for you. Note: there is no per-second option for quota.
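As a sketch of that suggestion (policy names are illustrative; the 10000ps and 600-per-minute values are the figures discussed in this thread):

```xml
<!-- Protective ceiling, set well above the 10 tps business limit -->
<SpikeArrest async="false" continueOnError="false" enabled="true" name="SA-Protect">
    <DisplayName>SA-Protect</DisplayName>
    <Rate>10000ps</Rate>
</SpikeArrest>

<!-- Business limit: 600 requests per minute, i.e. 10 tps on average -->
<Quota async="false" continueOnError="false" enabled="true" name="QU-PerMinute">
    <DisplayName>QU-PerMinute</DisplayName>
    <Allow count="600"/>
    <Interval>1</Interval>
    <TimeUnit>minute</TimeUnit>
</Quota>
```

Note the per-minute quota only bounds the average rate; all 600 calls could still arrive in the first second of the window.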

The spike arrest policy does not keep a counter and is not intended to be used as a business policy specifying how much traffic your clients may send per time interval. Rather, its purpose is to prevent sudden spikes in traffic and to protect against DDoS attacks. For instance, if Apigee Edge behaved the way you are requesting, then depending on which rate-limiting algorithm was chosen, a backend system could actually receive more (burst) traffic per second than it can handle. For your configuration this doesn't sound like a lot, but when we are talking about systems with a setting of 500 RPS, it can have a huge negative impact.

Our backend system can handle 10 calls at the same millisecond and it is a valid scenario for us to handle

Great! So raise the spike arrest to 10000 RPS and you are done.

Obviously smoothing was designed into the product because a majority of customers want to be able to have a steady stream of traffic while still being protected against major burst traffic. As those upper traffic limits are reached, smoothing accomplishes that goal. Clearly the Apigee engineering team developed the product to work a specific way and it's a rather awesome product compared to many other API management tools, including the ones you've mentioned in your other post.

Finally, most customers have a load balancer between Apigee Edge and their backend systems. In your systems architecture, what LB is there now? Does that LB implement the exact rate limiting algorithm you want? If not, procure yourself a LB that does implement your exact algorithm and stick it between Apigee Edge and your app server. There are many free ones out there. Problem solved.

Thanks for responding @Robert Johnson. Appreciate your time.

What I agree on?

  • Applying the Spike Arrest policy at 10000ps to handle the traffic without rejecting it on Apigee.
    • This simply addresses the need to handle 10 tps traffic (in the same millisecond) on Apigee and DOES NOT maintain a counter that rejects traffic beyond 10 tps.

What I disagree on and why?

  • Even if we configure 10000ps for the Spike Arrest policy, it doesn't meet the backend's need for Apigee to throttle the traffic at 10 tps.
    • In this case, the backend will start rejecting traffic if it exceeds 10 tps.
    • Please note, as I said before, the backend just maintains a counter at the per-second level.
  • Applying the logic at a backend or at a Load Balancer defeats the purpose of having an API Management product in the first place.
    • At the moment, we are planning to hook up apigee directly to our microservices layer (NodeJS or SpringBoot).
    • Having a traffic management rule at the microservice layer is not a clean architecture in my opinion.
    • From a typical organization setup perspective, not all the layers of the stack come under our DevOps purview.
      • We might need to work with teams like Network Engineering or Cloud Engineering to deal with policy updates on LBs, which will slow us down.
  • When we apply the logic at the backend, we will NOT get the capability of throttling at the app developer or product level in terms of transactions per second.
  • Traffic management policies/rules become unmanageable at some point, and this is truly an unsustainable architecture.
    • It will be a maintenance nightmare for Ops folks.
  • A very unlikely scenario:
    • If Apigee does lightweight orchestration/business logic (which we don't do in our organization), calculating the throttling-related numbers is not straightforward.

Bottom line: I'm looking for a Quota policy or something better (I wanna call it a Throttling policy!!) that will enforce the number of transactions at the second level.

To the folks in this community who are always passionate about lecturing on the documentation for the SpikeArrest policy, Quota policy, etc. and the sync-up challenges between MPs: I get the point, and I have read the documentation probably 50+ times. 🙂

Hi @Dinesh Palanivel,

Robert pointed it out correctly.

You can use spike arrest in TPM. Also, keep in mind that spike arrest is per server.

Hence, if you have two servers, the effective value is 2× the spike arrest setting.

Unfortunately, just utilizing the Spike Arrest policy with its out-of-the-box features will not help solve the challenge I have. 🙂

@Kurt Googler Kanaskie Thank you so much for the time in responding to this thread.

It's great that finally we are getting a solution.

I understand the technical challenge behind why we need to have distributed=false. At least we have a decent solution, which is better than nothing. The only possible issue with this implementation would be inconsistent HTTP 429 errors when we don't load balance evenly across the MPs, especially if we are in two different DCs/regions.
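Building on Kurt's arithmetic, here is a sketch of sizing the per-second count against the 10 tps target (the 4-MP count and even load balancing are both assumptions; the policy name is illustrative):

```xml
<!-- Assumption: 4 MPs with even load balancing.
     A per-second quota counts per MP, so effective TPS = count × number of MPs:
       count="2" → 2 × 4 = 8 TPS  (stays under the 10 tps backend limit)
       count="3" → 3 × 4 = 12 TPS (would overshoot it) -->
<Quota async="false" continueOnError="false" enabled="true" name="QU-BySecond-PerMP">
    <DisplayName>QU-BySecond-PerMP</DisplayName>
    <Allow count="2"/>
    <Interval>1</Interval>
    <TimeUnit>second</TimeUnit>
</Quota>
```

With 4 MPs there is no integer count that yields exactly 10 TPS, so you have to choose between undershooting (8) and overshooting (12); uneven load balancing skews the effective total further.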

@Kurt Googler Kanaskie Apparently, as of 01/16/2020, the feature still doesn't work as documented. Apigee support has created a bug and assigned it to engineering.

If anyone is trying to use this solution, don't be surprised.