RegularExpressionProtection policy performance issues with larger payloads


We have the following policy to scan the JSON payload of POST requests for XML or JavaScript injection.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<RegularExpressionProtection async="false" continueOnError="false" enabled="true" name="XSS-Injection-Protection-On-Request-Body">
    <DisplayName>XSS Injection Protection on Request Body</DisplayName>
    <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>
    <JSONPayload>
        <JSONPath>
            <Expression>$</Expression>
            <Pattern><![CDATA[[\s]*(?i)(<\s*script\b[^>]*>)]]></Pattern>
            <Pattern>.*[<>=]+.*</Pattern>
        </JSONPath>
    </JSONPayload>
    <Source>request</Source>
</RegularExpressionProtection>

For typical requests with about 70 lines of JSON (when pretty-printed), this policy takes about 200 milliseconds to scan the request body. With larger payloads the policy takes longer, which increases response times: we measured around 3 seconds for a 700-line JSON payload.

So I was wondering whether I need to fine-tune the regex here, or whether there is a better way to implement this policy. Please advise.

ACCEPTED SOLUTION

Hmmm 3 seconds? That seems unacceptable.

Is this a paid organization, or a trial org?

If I were investigating this, I'd do these two things:

  1. Try to apply the regex to the plaintext payload. The way you are doing it, the policy first de-serializes the content into a JSON object, and then analyzes the properties in that object. You could skip the first step and just treat the input as text, with a policy configuration like this:

    <RegularExpressionProtection name="XSS-Injection-Protection-On-Request-Body">
        <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>
        <Variable name="request.content">
            <Pattern>REGEX PATTERN</Pattern>
            <Pattern>REGEX PATTERN</Pattern>
        </Variable>
    </RegularExpressionProtection>

    The JSONPayload element is most useful if you want to apply the regex test to a sub-selection of the JSON. You're not doing that here, so avoid the use of the element.

  2. Write a simple Java callout to match the regex on the text. I don't know the extent to which the regex is cached by the built-in policy. In your custom code, you could use a pre-compiled static final Pattern object to make sure you are getting the fastest performance. Here again, operate on request.content, not on a JSON-de-serialized object.
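A minimal sketch of the second suggestion, in plain Java. The class name `RequestBodyScanner` and the method `containsThreat` are hypothetical, and the Apigee callout wiring (reading request.content from the message context) is left out, so this only illustrates the pre-compiled static Pattern idea. The two patterns are simplified forms of the ones in the question, on the assumption that the check is a "does it occur anywhere" search; adjust them to whatever the policy really needs.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any Apigee API. A real callout would read
// request.content from the message context and pass it to containsThreat().
final class RequestBodyScanner {
    // Compiled once when the class is loaded, then reused for every request.
    private static final Pattern[] THREATS = {
        // Simplified from the question's first pattern: under find() the
        // leading whitespace prefix is not needed, and (?i) becomes a flag.
        Pattern.compile("<\\s*script\\b[^>]*>", Pattern.CASE_INSENSITIVE),
        // Simplified from the question's second pattern (assumption: only
        // the presence of one of these characters matters).
        Pattern.compile("[<>=]")
    };

    private RequestBodyScanner() {}

    /** Returns true if any threat pattern occurs anywhere in the raw text. */
    static boolean containsThreat(String rawBody) {
        for (Pattern p : THREATS) {
            if (p.matcher(rawBody).find()) {
                return true;
            }
        }
        return false;
    }
}
```

The callout would then reject the request when `RequestBodyScanner.containsThreat(body)` returns true. Because the patterns are static, no compilation cost is paid per request.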

Thank you Dino for responding.

Yes, it's a paid organization (but I submitted this question from my personal Apigee account).

Sure, will try out the suggestions and update.

I have tried out the first option and it solved my problem.

The policy used to take 3 seconds to scan a 700-line JSON text; it now takes just 45 milliseconds for the same text.

Great! Thanks for the follow-up note. That's good information.

BTW, this solution works well only when the JSON payload is pretty-printed; if the entire JSON is on one line (with no line breaks or indentation), it may not be efficient.

In my case, a pretty-printed 700-line JSON is scanned in 45 milliseconds, but when those 700 lines are combined into one, the policy takes 8-9 seconds, which is not acceptable.
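A likely explanation, assuming the policy searches the text with java.util.regex (the thread does not confirm this): by default `.` does not match line terminators, so in `.*[<>=]+.*` each `.*` can only run to the end of the current line. On clean input the engine retries from every start position and backtracks across the rest of that line, which is cheap when lines are short and roughly quadratic when the whole payload is one line. If only the presence of one of the characters matters, the bare character class gives the same verdict without the `.*` wrappers. The sketch below uses hypothetical names and just checks that the two forms agree.

```java
import java.util.regex.Pattern;

// Hypothetical comparison helper; the names are illustrative only.
final class PatternComparison {
    // The wrapped form from the original policy.
    static final Pattern WRAPPED = Pattern.compile(".*[<>=]+.*");
    // Bare form: true as soon as one of the characters occurs anywhere.
    static final Pattern BARE = Pattern.compile("[<>=]");

    private PatternComparison() {}

    /** Searches the text with the wrapped pattern. */
    static boolean wrappedFinds(String text) {
        return WRAPPED.matcher(text).find();
    }

    /** Searches the text with the bare character class. */
    static boolean bareFinds(String text) {
        return BARE.matcher(text).find();
    }
}
```

Under these assumptions, swapping the second `<Pattern>` for the bare class might be worth trying before moving to a callout or chunking the payload.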

I may have to use the Java callout solution that Dino suggested, or write some JavaScript that splits the payload into multiple lines and applies the regex scanner to each.

If you go this route, it would be interesting to see the results of your performance testing.
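For anyone wanting to run such a comparison outside the gateway, here is a rough timing sketch in plain Java. Everything in it is hypothetical: the class name, the synthetic field names, and the idea that a single `find()` call approximates what the policy does. Single-shot timings like this are noisy, so treat the numbers as a first indication only.

```java
import java.util.regex.Pattern;

// Hypothetical timing helper for comparing patterns on synthetic payloads.
final class RegexTiming {
    private RegexTiming() {}

    /** Builds a synthetic JSON-like body; the field names are made up. */
    static String syntheticBody(int fields, boolean prettyPrinted) {
        String sep = prettyPrinted ? ",\n  " : ",";
        StringBuilder sb = new StringBuilder(prettyPrinted ? "{\n  " : "{");
        for (int i = 0; i < fields; i++) {
            if (i > 0) {
                sb.append(sep);
            }
            sb.append("\"field").append(i).append("\": \"value ").append(i).append('"');
        }
        return sb.append(prettyPrinted ? "\n}" : "}").toString();
    }

    /** Returns the elapsed nanoseconds for one find() over the text. */
    static long timeFindNanos(Pattern pattern, String text) {
        long start = System.nanoTime();
        pattern.matcher(text).find();
        return System.nanoTime() - start;
    }
}
```

Calling `timeFindNanos` with the same pattern on `syntheticBody(700, true)` and `syntheticBody(700, false)` gives a quick way to see whether line breaks change the cost for a given regex.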