RegularExpressionProtection policy behavior

	Extracts information from a message (for example, URI Path, Query Param, Header, Form Param, Variable, XML Payload, or JSON Payload) and evaluates that content against predefined regular expressions. If any specified regular expressions evaluate to true, the message is considered a threat and is rejected.

Hi community,

I have a list of regex patterns, and they are the allowed ones.

The Apigee policy works the opposite way: if a pattern evaluates to true, it is a violation.

My situation is the exact opposite: I have only patterns that are allowed, i.e. if a regex matches, process the request; otherwise, fail.

Is there a way to 'reverse' this policy behavior, i.e. if a regex pattern matches, PASS; all others, FAIL?

Thanks,

ACCEPTED SOLUTION

There is an old nerd joke that goes,

Some people, when confronted with a problem, think
“I know, I'll use regular expressions.” Now they have two problems.

Regular expressions can be challenging. But there is a way to "negate" a regex. Basically, just surround your original regex in (?! ) and then add something positive to match, for example .+ or .*

The (?! ) construct is a negative lookahead assertion. It "looks ahead" without consuming any characters, and succeeds when the text at the current position does not conform to the pattern inside it.

For example, while this:

^[a-z]{6}

matches any string that begins with 6 lowercase ASCII letters, the "negation" of that regex

^(?!([a-z]{6})).*

matches anything EXCEPT a string that begins with 6 lowercase ASCII letters. A sequence of 5 letters would match, and so would a sequence of 5 letters plus a digit, etc. (The ^ sits outside the lookahead so that the test is applied only at the start of the string.)
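To see the negation at work outside of Apigee, here is a minimal Python sketch. It assumes Python's re module treats lookaheads the same way the policy's regex engine does for a pattern this simple; the names POSITIVE, NEGATED and matches are made up for the illustration. The anchor is placed in front of the lookahead so the result does not depend on whether an engine searches within the string or matches it as a whole.

```python
import re

# Original pattern: the value begins with 6 lowercase ASCII letters.
POSITIVE = re.compile(r"^[a-z]{6}")

# Negated form: a negative lookahead at the start of the string, then .*
# as the "something positive" that actually consumes the text.
NEGATED = re.compile(r"^(?![a-z]{6}).*")


def matches(pattern: re.Pattern, subject: str) -> bool:
    """Return True when the pattern is found in the subject."""
    return pattern.search(subject) is not None


for subject in ["abcdef", "abcdefgh", "abcde", "abcde1"]:
    print(f"{subject!r}: positive={matches(POSITIVE, subject)} "
          f"negated={matches(NEGATED, subject)}")
```

For every subject the two columns disagree, which is the point: the negated pattern matches exactly where the original one does not.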

OK, let's see what that would look like in practice. Suppose your DESIRED, CORRECT payload is a JSON payload, in which the property with name "prop1" consists of a sequence of 6 lowercase ASCII letters, followed by a sequence of 3 decimal digits.

This is a positive example:

{"prop1": "abmjkc123", "prop2": "12345"}

Here are some negative examples:

payload                       reason
{"prop1": "ABCjkc123"}        uppercase chars not allowed
{"prop1": "abc123"}           not enough lowercase characters
{"prop1": "abcxyz1234"}       too many decimal digits
{"prop1": "abcxyz44"}         not enough decimal digits
{"prop1": "abcxyzqmh1234"}    too many lowercase characters AND too many decimal digits

The simplest regex describing the desired form for prop1 is

[a-z]{6}[0-9]{3}

We want the RegularExpressionProtection policy to throw a fault when that DOES NOT match. So let's negate the regex. We configure the policy like this:

<RegularExpressionProtection name="Regular-Expression-Protection-1">
  <Source>request</Source>
  <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>
  <JSONPayload>
    <JSONPath>
      <Expression>$.prop1</Expression>
      <Pattern>^(?!([a-z]{6}[0-9]{3})$).*$</Pattern>
    </JSONPath>
  </JSONPayload>
</RegularExpressionProtection>

The ^ is the beginning-of-line assertion. The $ is the end-of-line assertion.

Just before the end of line we include the pattern .*, which matches 0 or more characters. Basically it matches anything. But just BEFORE that is the negative lookahead. The $ inside the lookahead ties the inner regex to the end of the string; without it, a value such as abcxyz1234 would satisfy the lookahead with its first nine characters and slip through. The lookahead says "no match if the entire string is a sequence of 6 lowercase followed by 3 decimal digits." Combining the two, the pattern says "match anything EXCEPT if the entire string is a sequence of 6 lowercase followed by 3 decimal digits."

Sending a payload like this:

{"prop1": "abmjkc123", "prop2": "12345"}

...the policy does not throw a fault. No threat detected, as desired.

Sending a payload like this:

{"prop1": "abmjk123", "prop2": "12345"}

(notice there is one letter fewer, so now there are just 5 in the sequence) ...causes the outer pattern to match. The input is NOT a sequence of 6 lowercase followed by 3 decimal digits, therefore the outer pattern (which negates "sequence of 6 lowercase followed by 3 decimal digits") matches, which means a threat has been detected. The policy throws a fault, as desired. Voila.
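Since the question mentions a whole list of allowed patterns, here is a rough Python sketch of how such a list could be folded into one "reject" pattern and tried against the example payloads before it goes into the policy. Everything named here (build_reject_pattern, policy_faults, ALLOWED) is hypothetical, and Python's re module only stands in for the policy's regex engine. The sketch keeps a $ inside the lookahead so that an allowed form has to cover the whole value, not just its beginning.

```python
import json
import re
from typing import Sequence


def build_reject_pattern(allowed: Sequence[str]) -> str:
    """Return a regex that matches any value NOT fully matching one of `allowed`."""
    alternatives = "|".join(f"(?:{p})" for p in allowed)
    # The $ inside the lookahead forces an allowed form to cover the whole
    # value, so a valid prefix followed by extra characters is still rejected.
    return f"^(?!(?:{alternatives})$).*$"


def policy_faults(payload: str, allowed: Sequence[str]) -> bool:
    """Mimic the policy: a match on the reject pattern means a threat."""
    value = json.loads(payload)["prop1"]
    return re.search(build_reject_pattern(allowed), value) is not None


ALLOWED = [r"[a-z]{6}[0-9]{3}"]  # the desired form of prop1 from the example

payloads = [
    '{"prop1": "abmjkc123", "prop2": "12345"}',
    '{"prop1": "abmjk123", "prop2": "12345"}',
    '{"prop1": "ABCjkc123"}',
    '{"prop1": "abc123"}',
    '{"prop1": "abcxyz1234"}',
    '{"prop1": "abcxyz44"}',
    '{"prop1": "abcxyzqmh1234"}',
]

for payload in payloads:
    verdict = "fault" if policy_faults(payload, ALLOWED) else "pass"
    print(f"{verdict:5}  {payload}")
```

Only the first payload passes; the rest fault. One caveat with any pattern of this shape: . does not match a line break by default, so a value containing a newline would need separate handling.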

Beautiful, Dino, you rock!!!