RegularExpressionProtection policy regex optimizations

I am using the regular expression policy below:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<RegularExpressionProtection async="false" continueOnError="false" enabled="true" name="Regular-Expression-Protection">
    <DisplayName>Regular Expression Protection</DisplayName>
    <Properties/>
    <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>
    <URIPath>
        <Pattern>[\s]*((delete)|(exec)|(drop\s*table)|(insert)|(shutdown)|(update )|(\bor\b))</Pattern>
    </URIPath>
    
    <JSONPayload>
        <JSONPath>
            <Expression>$.</Expression>
            <Pattern>[\s]*((delete)|(exec)|(drop\s*table)|(insert)|(shutdown)|(update )|(\bor\b))</Pattern>
        </JSONPath>
    </JSONPayload>
    <Source>request</Source>
</RegularExpressionProtection>

In this policy I am using the "*" quantifier as a wildcard match. The Apigee antipatterns guide suggests using reluctant quantifiers such as "*?" instead. However, the reluctant quantifier takes even more time than the greedy "*".

Can you suggest an optimized quantifier that takes less time to process than "*"?

Best regards,

Amit


Are you trying to optimize the performance of the RegularExpressionProtection policy? If so, why? Are you observing that the policy is taking a very long time?

What problem are you really trying to solve?

Hi @Dino-at-Google - It's not taking too much time, only about 43,000 nanoseconds (43 microseconds).

But I wanted to reduce this further, because I read in the Apigee antipatterns that the regex above uses a greedy quantifier and could be optimized by switching to reluctant or possessive quantifiers.

Best regards,

Amit

If you like, you can easily switch to reluctant quantifiers. Have you tried it out?

Hi @Dino-at-Google

I have tried "*?", but it takes even more time.

Best regards,

Amit

What's your definition of "more time", and how have you measured this?

Hi @dane knezic

"More time" means that "*?" takes more than 8 times as long as the "*" quantifier.

I measured it by tracing the policy and reading the time taken.

Best regards,

Amit

I think *? is probably not right. Maybe +? instead.

For example this

drop\s*table

matches "drop" followed by zero or more whitespace characters, followed by "table".

And that's not actually what you want. It's really

drop\s+table

In other words, you're looking for at least one space between the words.
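To make the difference between those two patterns concrete, here is a small sketch. It uses Python's `re` module purely as a stand-in for the policy's regex engine, so it only illustrates matching semantics, not Apigee's timing. The sample strings are made up for illustration.

```python
import re

# The two variants discussed above.
zero_or_more = re.compile(r"drop\s*table")  # whitespace is optional
one_or_more = re.compile(r"drop\s+table")   # at least one whitespace char required

# Hypothetical inputs, not taken from real traffic.
samples = [
    "drop table users",
    "drop   table users",
    "droptable",
    "backdroptablecloth",
]

for text in samples:
    star = zero_or_more.search(text) is not None
    plus = one_or_more.search(text) is not None
    print(f"{text!r}: star={star} plus={plus}")
```

With `\s*`, strings like "droptable" are flagged even though there is no space between the words; with `\s+` they are not.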

I don't know why adding ? would take more or less time. That seems surprising.

I don't have any particular insight into how the regex engine is going to perform on any particular pattern. I would suggest trying various approaches that all satisfy your requirements (e.g. be thoughtful about drop\s*table versus drop\s+table) and then comparing the results.

But be sure to compare under load. Running a single request and looking at the time taken in the Trace UI is not a reliable way to evaluate the performance of the running system.
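As a rough illustration of measuring over many runs instead of a single request, here is a minimal local sketch. It assumes Python's `re` module as a stand-in engine, and the sample inputs and the `scan_all` and `benchmark` helpers are hypothetical. The numbers it prints say nothing about Apigee itself; a real comparison would still mean deploying each variant and driving sustained traffic through the proxy.

```python
import re
import statistics
import timeit

# Alternation copied from the policy; only the leading quantifier varies.
ALTERNATIVES = r"((delete)|(exec)|(drop\s*table)|(insert)|(shutdown)|(update )|(\bor\b))"
CANDIDATES = {
    "greedy": re.compile(r"[\s]*" + ALTERNATIVES),
    "reluctant": re.compile(r"[\s]*?" + ALTERNATIVES),
}

# Hypothetical inputs: two benign, two that should be flagged.
SAMPLES = [
    "/v1/customers/42?status=active",
    "/v1/orders?note=please   drop   table orders",
    '{"comment": "a or b"}',
    " " * 50 + "x",
]

def scan_all(pattern):
    """Search every sample once and return how many were flagged."""
    return sum(1 for text in SAMPLES if pattern.search(text))

def benchmark(pattern, repeats=5, number=500):
    """Median seconds per search across several repeated timing runs."""
    runs = timeit.repeat(lambda: scan_all(pattern), repeat=repeats, number=number)
    return statistics.median(runs) / (number * len(SAMPLES))

for name, pattern in CANDIDATES.items():
    print(f"{name}: {benchmark(pattern) * 1e9:.0f} ns per search")
```

Taking the median over several repeats smooths out one-off spikes, which is the same reason a single Trace reading is not a dependable measurement.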

Good luck!