What is the best way to monitor rate limiting/spike arrest policies?

jeff_brown
Participant I

We have both spike arrest and rate limiting policies set up at the global (PreFlow) level and within conditional flows for some of our more server-intensive endpoints. We need a way to connect our PRTG sensors to the status of these policies so that we are alerted when they are triggered, or are close to being triggered. One hacky approach I have used for the global policies is to set up a flow that calls a mock target and sets the response to the number of requests remaining in the policy. Any suggestions for a better way to do this (and for how to do it with our conditional flow policies) would be very much appreciated!


adas
Participant V

The Quota and Spike Arrest policies both expose a set of flow variables that tell you the allowed count, used count, exceeded count, and so on. You could read these variables after the respective policies have executed, add some decision logic in custom code such as a JavaScript policy (for example, a >80% or >90% threshold), and then either fire a Service Callout to an endpoint with these values or send an email alert to an email alias from a JavaScript/Python policy.
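
As a rough illustration of that decision logic, here is a minimal JavaScript sketch that turns the allowed and used counts into a usage percentage and an alert level. The function name `classifyUsage`, the level names, and the default 80%/90% thresholds are assumptions for illustration only; in a JavaScript policy the two counts would come from `context.getVariable(...)`.

```javascript
// Hypothetical helper: classify quota usage against warning/critical thresholds.
// warnPct and critPct are percentages; they default to 80 and 90 (assumed values).
function classifyUsage(usedCount, allowedCount, warnPct, critPct) {
    warnPct = warnPct === undefined ? 80 : warnPct;
    critPct = critPct === undefined ? 90 : critPct;
    // Flow variables may arrive as strings, so coerce to numbers first.
    var used = Number(usedCount);
    var allowed = Number(allowedCount);
    if (!isFinite(used) || !isFinite(allowed) || allowed <= 0) {
        return { percent: null, level: "unknown" };
    }
    var percent = used * 100 / allowed;
    var level = "ok";
    if (percent >= critPct) {
        level = "critical";
    } else if (percent >= warnPct) {
        level = "warning";
    }
    return { percent: percent, level: level };
}
```

The returned level could then decide whether to set a flag such as `sendEmail` or to trigger a callout.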

Here's a simple Python script that can be used to send an email alert (let's call it QuotaSendMail):

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# SMTP settings and message fields are read from flow variables.
host = flow.getVariable("emailhost")
port = flow.getVariable("emailport")
username = flow.getVariable("emailusername")
password = flow.getVariable("emailpassword")
mailCC = flow.getVariable("emailcc")
mailTo = flow.getVariable("email")
bodyMessage = flow.getVariable("emailMessage")
subject = flow.getVariable("emailsubject")
mailFrom = flow.getVariable("emailfrom")
use_tls = True
use_auth = True

msg = MIMEMultipart()
msg['Subject'] = subject
msg['From'] = mailFrom
msg['To'] = mailTo
if mailCC:
    msg['Cc'] = mailCC
#msg['Bcc'] = mailBCC
msg.attach(MIMEText(bodyMessage, 'html'))

smtpserver = smtplib.SMTP(host, int(port))
if use_tls:
    smtpserver.ehlo()
    smtpserver.starttls()
    smtpserver.ehlo()  # identify again over the encrypted connection

if use_auth:
    smtpserver.login(username, password)

# The Cc header alone does not add a recipient, so pass the address explicitly.
recipients = [mailTo] + ([mailCC] if mailCC else [])
smtpserver.sendmail(mailFrom, recipients, msg.as_string())
smtpserver.quit()


A simple JavaScript policy that checks the used count of the Quota policy (let's call it CheckUsedQuota):

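// "Quota" in the variable names below is the name of the Quota policy.
var allowedCount = context.getVariable("ratelimit.Quota.allowed.count");
var usedCount = context.getVariable("ratelimit.Quota.used.count");
// Despite its name, percent holds the request count at 75% of the quota.
var percent = Math.round(allowedCount * 0.75);

// Flag an alert when usage lands exactly on the 75% mark.
if (usedCount == percent) {
    context.setVariable("sendEmail", true);
    context.setVariable("emailMessage", "Quota 75% reached");
}
// Flag an alert once the quota is used up.
if (usedCount >= allowedCount) {
    context.setVariable("sendEmail", true);
    context.setVariable("emailMessage", "Quota Exceeded");
}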

Then you could add a fault rule like this:

        <FaultRule name="InvalidQuota">
            <Condition>ratelimit.Quota.failed == "true"</Condition>
            <Step>
                <Name>CheckUsedQuota</Name>
            </Step>
            <Step>
                <Condition>sendEmail == "true"</Condition>
                <Name>QuotaSendMail</Name>
            </Step>
        </FaultRule>

I hope this gives you a reference point to implement your solution.

Be careful with this, though. Suppose the quota is 10,000 per hour and the system exceeds the limit by 400 requests (4%). This could result in 401 emails being sent, which is probably not what you want. Most people want a single email when a threshold is exceeded, not an email for every new request.

Also, due to rounding effects, the usedCount==percent test may never return true.

This is a nice illustration, but it's not really ready for production use.
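
To illustrate the rounding concern, here is a minimal JavaScript sketch of a threshold-crossing test. Instead of testing for equality, it asks whether the current request moved the used count from below the threshold to at or above it. The function name `crossedThreshold` and the `weight` parameter (how much this request added to the counter, assumed to be 1 by default) are assumptions for illustration, and the sketch assumes the counter only moves forward within a quota interval.

```javascript
// Hypothetical helper: did this request cross the given fraction of the quota?
// fraction is between 0 and 1 (for example 0.75); weight is the amount this
// request added to the counter (assumed to be 1 when omitted).
function crossedThreshold(usedCount, allowedCount, fraction, weight) {
    var used = Number(usedCount);
    var allowed = Number(allowedCount);
    var step = weight === undefined ? 1 : Number(weight);
    // Round the threshold up so it is always a whole request count.
    var threshold = Math.ceil(allowed * fraction);
    // Count before this request was added.
    var previous = used - step;
    return previous < threshold && used >= threshold;
}
```

This does not by itself guarantee a single alert: if the count is not updated in strict single steps (for example, when counters are synchronized across nodes), the test could miss or repeat a crossing.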

Hi Dino,

What would you recommend so that the above policy sends exactly one email once the quota is violated?

Regards,

Bett
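
One possible approach, offered as a sketch rather than a tested recipe: remember which alerts have already been sent for the current quota interval and skip any repeat. The JavaScript below keys each alert on the policy name, an alert label, and the interval's expiry time (the Quota policy exposes `ratelimit.{policy_name}.expiry.time`), so a new interval produces a new key. The `shouldSendAlert` function and the in-memory `sentAlerts` store are hypothetical; in a real proxy the store would need to be shared across requests, for example in a cache, which this sketch does not show.

```javascript
// Hypothetical in-memory store of alert keys that were already sent.
// A plain object only lives for this script run; it stands in for a shared cache.
var sentAlerts = {};

// Return true only the first time an alert is seen for a given quota interval.
function shouldSendAlert(store, policyName, label, expiryTime) {
    var key = policyName + "|" + label + "|" + expiryTime;
    if (store[key]) {
        return false; // already alerted during this interval
    }
    store[key] = true;
    return true;
}

// Example: 401 over-limit requests in one interval yield a single alert.
var sent = 0;
for (var i = 0; i < 401; i++) {
    if (shouldSendAlert(sentAlerts, "Quota", "exceeded", 1700000000000)) {
        sent++;
    }
}

// A new interval (different expiry time) is allowed to alert again.
var nextInterval = shouldSendAlert(sentAlerts, "Quota", "exceeded", 1700003600000);
```

The policy name, label, and timestamps in the example are made-up values.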

Great question, @jeffbrown,

The moment a spike arrest / rate limiting policy is triggered, the API flow enters the error flow, where you can use Fault Rules and fault handling to execute policies such as a Service Callout that calls an external system, which in turn triggers your PRTG sensors. Does that help? Keep us posted.
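
As a sketch of what such a callout might send, the JavaScript below builds a JSON body in the shape that PRTG's custom/advanced sensors read: a `prtg` object with a `result` array of channels. The function name `buildPrtgPayload`, the channel names, and the choice of channels are assumptions for illustration; check the field names against the sensor type you actually use.

```javascript
// Hypothetical helper: build a PRTG-style JSON body from quota counts.
function buildPrtgPayload(policyName, usedCount, allowedCount) {
    var used = Number(usedCount);
    var allowed = Number(allowedCount);
    // Never report a negative remaining count when the quota is exceeded.
    var remaining = Math.max(allowed - used, 0);
    var percent = allowed > 0 ? Math.round(used * 100 / allowed) : 0;
    return JSON.stringify({
        prtg: {
            result: [
                { channel: policyName + " used", value: used },
                { channel: policyName + " remaining", value: remaining },
                { channel: policyName + " used percent", value: percent, unit: "Percent" }
            ],
            text: policyName + ": " + used + " of " + allowed + " used"
        }
    });
}
```

In a proxy, the resulting string could be stored in a flow variable and used as the request body of the Service Callout; how the sensor receives it depends on your PRTG setup.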