Unexpected behavior in RegularExpressionProtection policy for JSONPayload

Hi Apigee Experts

I am trying to apply regular expressions to fields in a JSON request. For this I am using the RegularExpressionProtection policy, placing a Pattern with each Expression in <JSONPayload>.

Here is a sample of what I am trying in the RegularExpressionProtection policy:

  <JSONPayload>
    <JSONPath>
      <Expression>$.example.parent.A</Expression>
      <Pattern>^(?![.a-zA-Z0-9 ]+$)</Pattern>
    </JSONPath>
    <JSONPath>
      <Expression>$.example.parent.B</Expression>
      <Pattern>^(?![Yes|No]$)</Pattern>
    </JSONPath>
    <JSONPath>
      <Expression>$.example.parent.C</Expression>
      <Pattern>^(?![a-zA-Z0-9 ]+$)</Pattern>
    </JSONPath>
    <!-- ... more fields, about 10 total ... -->
  </JSONPayload>

But the regex policy stops working (nothing shows in trace) when I change or remove fields in the request (I am testing with Postman, sending a JSON body). The major issue is that it only works for some fields; for others it does not report a threat even when the value violates the pattern.

So my question is: can we apply regexes to all the JSON fields (with a different pattern for each field) in a single RegularExpressionProtection policy?

Could anyone please help or guide me if there is a solution to my problem?

Thanks in advance.


The major issue is it only works for some fields and not show for others if they are having regex threat .

Can you give me a SPECIFIC, working example that reproduces the behavior you are describing?

This would be:

  1. A specific, complete policy configuration (I don't need all 10-12 fields; in fact, fewer is better)
  2. A specific inbound payload value that by inspection *should* match the regex but does not.

It is possible your regexes are not doing what you think, and the behavior you are observing might be expected. Regexes are tricky. So I want to understand specifically and exactly what you're doing and seeing.

As an example of regex being tricky, you are using this pattern:

   <Pattern>^(?![Yes|No]$)</Pattern>

And, one might look at that quickly and infer your intention is to allow the field to match the string "Yes" or the string "No". But that is not what that pattern does.

The square brackets denote a character class. Therefore an expression like [Yes] matches exactly one character from the set Y, e, s. Taken as a whole, the string "Yes" does not match; the strings "Y", "e", and "s" each match.

A regular expression like this

[Yes|No]

...matches exactly one character from the set Y, e, s, |, N, o.

The string "o" matches. The string "|" matches. But the string "Yes" does not match.

A regular expression like this:

^(?![Yes|No]$)

...matches ... let's think about it... You have a negative lookahead assertion containing a character class and an end-of-string assertion. So it matches every string except one that consists of a single character from the set Y, e, s, |, N, o. Put another way, it matches any string that does not begin with one of those characters, or any string that is longer than one character.

"Yes" matches, but "Y" does not. I think that's not what you want.
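To see the character-class behavior concretely, here is a small Java sketch. It assumes the policy compiles each <Pattern> with java.util.regex and reports a threat when the pattern is found anywhere in the value (find semantics); that assumption is inferred from the fault output later in this thread, not from documentation. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Whole-string match: true only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Find semantics (assumed to mirror the policy): true if the regex matches
    // anywhere in the input, which the policy would report as a threat.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // [Yes|No] is a character class: exactly one of Y, e, s, |, N, o.
        System.out.println(matchesWhole("[Yes|No]", "Yes")); // false
        System.out.println(matchesWhole("[Yes|No]", "o"));   // true
        System.out.println(matchesWhole("[Yes|No]", "|"));   // true

        // The original pattern is satisfied by "Yes" (threat) but not by "Y".
        System.out.println(found("^(?![Yes|No]$)", "Yes")); // true
        System.out.println(found("^(?![Yes|No]$)", "Y"));   // false
    }
}
```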

You want the pattern to NOT match on "Yes" or "No", and to match (trigger a fault) on "Y" or anything else other than "Yes" or "No".

Just guessing, but I think what you want is an alternation (parentheses, not square brackets), with an end-of-string assertion inside the lookahead:

 ^(?!(Yes|No)$).*$

Break that down:

The innermost atom is (Yes|No), an alternation that matches the text "Yes" or the text "No". The $ right after it requires that text to run to the end of the string. That combination is wrapped in a negative lookahead, (?! ... ).

The pattern (?!(Yes|No)$) is a zero-width assertion that fails when the string is exactly "Yes" or "No" and succeeds for anything else. But it is a zero-width assertion; we still need a pattern to collect the string. So that's what the .* does.

The wrapping ^ and $ provide beginning-of-string and end-of-string assertions. The ^ pins the lookahead to the start of the string, and the $ inside the lookahead means a value that merely begins with "Yes" or "No" (for example "Yesterday") is still caught. The net result matches a string of zero or more characters that is neither "Yes" nor "No".

That pattern will match on "Maybe" or "M" or "Yo" (or "Yesterday") but not on "Yes" or "No". I am guessing that this is the behavior you want.
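A quick Java sketch of the alternation pattern, under the same assumption as above (java.util.regex with find semantics, where a match means the policy reports a threat). The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class AlternationDemo {
    // Alternation inside a negative lookahead, anchored at both ends:
    // matches (i.e., flags) any value that is not exactly "Yes" or "No".
    static final Pattern NOT_YES_OR_NO = Pattern.compile("^(?!(Yes|No)$).*$");

    // Assumed policy behavior: a match anywhere in the value is a threat.
    static boolean flagged(String value) {
        return NOT_YES_OR_NO.matcher(value).find();
    }

    public static void main(String[] args) {
        String[] samples = {"Yes", "No", "Maybe", "M", "Yo", "Yesterday"};
        for (String v : samples) {
            System.out.println(v + " -> " + (flagged(v) ? "threat" : "ok"));
        }
    }
}
```

Dropping the $ from inside the lookahead would let values such as "Yesterday" or "Nope" pass, because the lookahead would then only check how the value begins.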

There may also be problems with your other regex patterns. Regexes are tricky.

To answer your question

Can we not apply regex on all json fields (if pattern for all field is different) in RegularExpressionProtection Policy altogether ?

Yes. You need to use a more general JSONPath to refer to all the fields you want to check.

<Expression>$.example.parent.B</Expression> refers to a very specific field in the JSON. Generalize it to match other fields, if that's what you want.

Hi Dino

Thanks for spending time on this and for your suggestion.



My request JSON structure is:


{
  "Example": {
    "parent": {
      "A": "2019-07-09-09.29.35.182000",
      "B": "1213",
      "C": "51280382",
      "D": "Auto1",
      "E": "Y"
    }
  }
}


Policy Configuration:

<JSONPayload>
  <JSONPath>
    <Expression>$.Example.parent.A</Expression>
    <Pattern>^(?![-.a-zA-Z0-9 ]+$)</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.B</Expression>
    <Pattern>^(?![-.a-zA-Z0-9]+$)</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.C</Expression>
    <Pattern>^(?![-.a-zA-Z0-9 ]+$)</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.D</Expression>
    <Pattern>^(?![-.a-zA-Z0-9]+$)</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.E</Expression>
    <Pattern>^(?![N|Y]$)</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.F</Expression>
    <Pattern>^(?!(([0-2][0-9]|(3)[0-1])(\/)(((0)[0-9])|((1)[0-2]))(\/)\d{4}$))</Pattern>
  </JSONPath>
</JSONPayload>


I have tested the scenarios below:


Scenario 1:

{
  "Example": {
    "parent": {
      "A": "2019-07-09-09.29.35.182000",
      "B": "1213",
      "D": "Auto%^1",
      "E": "Y"
    }
  }
}

Actual result:

It does not throw a regex threat for parameter "D".


Scenario 2:

{
  "Example": {
    "parent": {
      "A": "2019-07-09-09.29.35.182000",
      "B": "1213",
      "D": "Auto1$%",
      "C": "51280382",
      "E": "Y"
    }
  }
}

Actual result:

In this case, it validates parameter "D" and throws a regex threat.

From these two scenarios, what I observed is: if a parameter (say "D" in Scenario 2) does not match the required pattern, the policy throws a regex threat only when parameters A, B, and C are also present in the request body, regardless of their order in the request.
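As a sanity check on the D pattern itself, here is a Java sketch that applies ^(?![-.a-zA-Z0-9]+$) to the D values from both scenarios. It assumes the policy uses java.util.regex with find semantics (a match anywhere is a threat), an inference from the fault output later in this thread; the class name is hypothetical. Under that assumption the pattern flags both bad values, which suggests the difference between the scenarios does not come from the pattern.

```java
import java.util.regex.Pattern;

public class FieldDPatternDemo {
    // Pattern for field D from the policy above: matches (flags) any value
    // that is NOT made up entirely of dash, dot, letters, and digits.
    static final Pattern D = Pattern.compile("^(?![-.a-zA-Z0-9]+$)");

    // Assumed policy behavior: a match anywhere in the value is a threat.
    static boolean flagged(String value) {
        return D.matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(flagged("Auto1"));   // false: valid value
        System.out.println(flagged("Auto%^1")); // true: Scenario 1 value
        System.out.println(flagged("Auto1$%")); // true: Scenario 2 value
    }
}
```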

My question is:

In Scenario 2, if we want a regex threat for parameter D (which does not match its regex), must the request body also contain parameters A, B, and C (which appear above D in the policy)?


Could you please guide me on this issue?

Thanks in advance.


Hi @Dino-at-Google

Have you had a chance to look into this?
Is there an issue, or am I missing something?
Kindly provide your suggestion.

Thanks and Regards.

Again, I think you're having difficulty creating the correct regular expressions. The issue is not the Apigee policies; it's the configuration you are providing to them.

I think you want to accept only the characters A-Z, a-z, and 0-9 for field D. Maybe you also want to allow dash and space.

If that's the case, then use a regex that catches any character that is NOT one of those.

An easy way to do this is to negate a character set. This matches any character that is not A-Z, a-z, 0-9, dash, or space:

[^-a-zA-Z0-9 ] 
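Here is how that negated character set behaves in a short Java sketch, assuming (as above) java.util.regex with find semantics, so a single disallowed character anywhere in the value is enough to flag it. The class name is hypothetical; the two inputs are the D values used in the curl requests in this reply.

```java
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // Any single character that is NOT a dash, letter, digit, or space.
    // The leading ^ negates the set; the dash right after it is a literal.
    static final Pattern DISALLOWED = Pattern.compile("[^-a-zA-Z0-9 ]");

    // Assumed policy behavior: finding one disallowed character is a threat.
    static boolean flagged(String value) {
        return DISALLOWED.matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(flagged("auto-123")); // false: every character is allowed
        System.out.println(flagged("auto-^%1")); // true: '^' and '%' are not allowed
    }
}
```

Checking for "any bad character" like this is usually easier to get right than a lookahead that tries to describe the whole valid value.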

The policy config employing this regex would look like this:

<RegularExpressionProtection name="RegularExpressionProtection-1">
  <Source>request</Source>
  <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>
  <JSONPayload>
    <JSONPath>
      <Expression>$.Example.parent.D</Expression>
      <Pattern>[^-a-zA-Z0-9 ]</Pattern>
    </JSONPath>
  </JSONPayload>
</RegularExpressionProtection>

When I try this in my proxy with this request:

curl -i https://$ORG-$ENV.apigee.net/regexprotection-2/t1 -H content-type:application/json -d '{
  "Example" :  {
    "parent" : {
      "D": "auto-123"
    }
  }
}'

The check passes. When I try it with this request:

curl -i https://$ORG-$ENV.apigee.net/regexprotection-2/t1 -H content-type:application/json -d '{
  "Example" :  {
    "parent" : {
      "D": "auto-^%1"
    }
  }
}'

...the regex policy throws a fault, as desired.

See attached for a working example.

apiproxy-regexprotection-2.zip

In summary: I think you need to study up on regular expressions. You need to be more effective in translating your desires into regular expressions.

Hi @Dino-at-Google

Apologies for the late response and for the inconvenience.

I could not explain my point clearly in my previous posts, so I am posting my troubleshooting notes here.


Requirement:

In my JSON, some parameters are mandatory and some are optional, so I need a RegularExpressionProtection policy that covers all the fields in the JSON body, whether mandatory or optional.

Also, each field in the JSON has a different pattern.


RegularExpressionProtection policy in my proxy:

<JSONPayload>
  <JSONPath>
    <Expression>$.Example.parent.A</Expression>
    <Pattern>myregexp</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.B</Expression>
    <Pattern>myregexp</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.C</Expression>
    <Pattern>myregexp</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.D</Expression>
    <Pattern>myregexp</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.E</Expression>
    <Pattern>myregexp</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.F</Expression>
    <Pattern>myregexp</Pattern>
  </JSONPath>
</JSONPayload>

NOTE: Please ignore the regex patterns in the sample above for now.

In Scenario 1 of my previous post:

1) Except for parameter "A" (mandatory), all parameters in the JSON are optional.

2) In Scenario 1, parameter "D" is not regex-compliant, so ideally the policy should throw a regex threat exception, but it does not.

For Scenario 1, what I observed is:

i) Parameter "C" was not passed in the request.

ii) The regex policy in my proxy has a check for parameter "C", which comes before "D" in the policy. As a result, parameter "D" was not validated, even though the regex policy has a check for "D" too.

So for this scenario, I have the following questions:

1) Does the order of the checks matter for validation?

2) Is it mandatory to send all the parameters that have a check in the regex policy?

In Scenario 2:

1) It throws an exception when the JSON request contains all the parameters that appear above "D" in the policy's <JSONPayload>, i.e. A, B, and C.

NOTE: In my case, the order of fields in the JSON request does not matter: A, B, and C can appear anywhere in the request, as long as they are present and sit above D in the policy's <JSONPayload> element.

For reference, here is a comparison of the behavior in the two scenarios.

Kindly verify and suggest.


Request, and the RegularExpressionProtection policy <JSONPayload> entry used for it:

Scenario 1:

{

"Example":

{

"parent": {

"A": "2019-07-09-09.29.35.182000",

"B":"1213",

"D":"Auto%-123" ,

"E":"Y"

}

}


Note:

1) In the request above, parameter “C” is missing; it does not appear anywhere in the request.

For this scenario, no regex threat exception was thrown.

My observation: “C” is not present in the request, but the policy has a check for it. Even though “C” is absent, the policy should still validate “D”; instead, “D” passes validation.
Policy <JSONPayload> entry used for this scenario:

<JSONPayload>
  <JSONPath>
    <Expression>$.Example.parent.A</Expression>
    <Pattern>myregexp</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.B</Expression>
    <Pattern>myregexp</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.C</Expression>
    <Pattern>myregexp</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.D</Expression>
    <Pattern>myregexp</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.E</Expression>
    <Pattern>myregexp</Pattern>
  </JSONPath>
  <JSONPath>
    <Expression>$.Example.parent.F</Expression>
    <Pattern>myregexp</Pattern>
  </JSONPath>
</JSONPayload>

Scenario 2:

{
  "Example": {
    "parent": {
      "A": "2019-07-09-09.29.35.182000",
      "B": "1213",
      "D": "Auto%-123",
      "C": "51280382",
      "E": "Y"
    }
  }
}

Note: “C” is present in the request, and the policy validates the parameters as expected. (Observation: all the parameters whose <JSONPath> entries come before “D” are present in the request.)

The same policy order and validation patterns are used for Scenario 2.

I hope this explains my issue better.

Thanks in advance.

Thanks and Regards.

APIGEE has had this issue with the RegexProtection policy for a long time now.

We have connected with them as well, and it's a known bug in the product (if you put all your checks in a single policy).

To work around the issue:
1. Write one regex policy containing all the mandatory fields. Do schema validation before this step.

2. Write an individual regex policy for each optional field. This is time-consuming, but it gives correct results with the same performance (compared to writing all the checks in a single policy).

Or use JavaScript code for the regex matching.

Hope this solves your issue.

Can you please elaborate on "this issue"? I'm not clear. You wrote:

APIGEE has had this issue with the RegexProtection policy for a long time now.

What is the specific issue?

You also wrote

We have connected with them as well, and it's a known bug in the product

Can you please cite the bug identifier? Who specifically did you connect with?

Thank you.

I've tried to understand the problem but I am still not clear. I have not seen any evidence that the policy is behaving contrary to design and documentation.

Apigee version: 4.50.00.00

Policy code

 

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<RegularExpressionProtection async="false" continueOnError="false" enabled="true" name="Regular-Expression-Protection-1">
    <DisplayName>Regular Expression Protection-1</DisplayName>
    <Properties/>
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
    <JSONPayload>
        <JSONPath>
            <Pattern>(?:[^\d\.])</Pattern>
            <Expression>$.A</Expression>
        </JSONPath>
         <JSONPath>
            <Pattern>(?:[^\d\.])</Pattern>
            <Expression>$.B</Expression>
        </JSONPath>
    </JSONPayload>
    
</RegularExpressionProtection>

 

Sample payloads

Input #1:

 

{
    "A": "1",
    "B": "1"
}

 

Test result: no exception

Input #2

 


{
  "A": "String",
  "B": "String"
}

 

Result:

 

{
    "fault": {
        "faultstring": "Regular Expression Threat Detected in Regular-Expression-Protection-1: regex: (?:[^\\d\\.]) input: String",
        "detail": {
            "errorcode": "steps.regexprotection.ThreatDetected"
        }
    }
}

 

Input #3

 


{
"B":"String"
}

 

Test result: no exception, even though "String" violates the pattern for B (as Input #2 shows).
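Here is a Java sketch that applies the policy's pattern to the three inputs on its own, outside Apigee. It assumes java.util.regex with find semantics, which the fault for Input #2 suggests (the single-character pattern was reported against the whole value "String"); the class name is hypothetical. The pattern by itself flags "String", so the missing fault for Input #3 is not explained by the regex.

```java
import java.util.regex.Pattern;

public class DigitsOnlyCheck {
    // Same pattern as the policy above: one character that is neither a digit nor a dot.
    static final Pattern NOT_DIGIT_OR_DOT = Pattern.compile("(?:[^\\d\\.])");

    // Assumed policy behavior: a match anywhere in the value is a threat.
    static boolean flagged(String value) {
        return NOT_DIGIT_OR_DOT.matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(flagged("1"));      // false: Input #1 values
        System.out.println(flagged("12.5"));   // false: digits and dots only
        System.out.println(flagged("String")); // true: the value sent for B in Inputs #2 and #3
    }
}
```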

We discussed this issue with the Apigee support team, and they acknowledged it. We worked around it with the following steps:

1. Extract all the fields into variables.

2. Use regex validation with <Variable> elements rather than <JSONPayload>.


I can provide my email if you want to set up a meeting to demo this particular issue.

Ahhhhh, thank you for your persistence in explaining this to me. 

I now see the same results you reported. Specifically, when there are multiple JSONPath elements within a JSONPayload, and one of the JSONPath expressions does not resolve to any element in the given JSON payload, then subsequent checks for other JSONPath elements are skipped. This is incorrect behavior. The expected behavior is that all the checks are performed for any JSONPath in the policy that resolves to an element in the payload. 
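The difference between the reported behavior and the expected behavior can be modeled in a few lines of Java. This is a hypothetical simplification, not Apigee's implementation: each check pairs a field name (standing in for a JSONPath) with a pattern, the payload is a flat map, and a missing key plays the role of a JSONPath that does not resolve. The example reuses the two checks and Input #3 from the report above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class JsonPathCheckModel {
    // Reported (buggy) behavior: the first unresolved path ends the loop,
    // so every later check is silently skipped. Returns the flagged field or null.
    static String buggy(Map<String, Pattern> checks, Map<String, String> payload) {
        for (Map.Entry<String, Pattern> c : checks.entrySet()) {
            String value = payload.get(c.getKey());
            if (value == null) {
                return null; // remaining checks never run
            }
            if (c.getValue().matcher(value).find()) {
                return c.getKey(); // threat detected in this field
            }
        }
        return null;
    }

    // Expected behavior: skip only the unresolved path and keep checking the rest.
    static String expected(Map<String, Pattern> checks, Map<String, String> payload) {
        for (Map.Entry<String, Pattern> c : checks.entrySet()) {
            String value = payload.get(c.getKey());
            if (value != null && c.getValue().matcher(value).find()) {
                return c.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Pattern notDigitOrDot = Pattern.compile("(?:[^\\d\\.])");
        Map<String, Pattern> checks = new LinkedHashMap<>(); // keeps policy order
        checks.put("A", notDigitOrDot);
        checks.put("B", notDigitOrDot);

        Map<String, String> input3 = Map.of("B", "String"); // "A" is absent
        System.out.println("buggy:    " + buggy(checks, input3));    // null, no fault
        System.out.println("expected: " + expected(checks, input3)); // B
    }
}
```

The model also shows why the order of the <JSONPath> elements appeared to matter in the earlier scenarios: only checks placed before the first missing field were ever evaluated.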

Based on your description I found a previously filed bug (internal reference b/78106145) that reported this problem. Sadly, this bug was not connected to an active customer report, and so it was never prioritized, and sat in the backlog for a good long time. 

We are now working on fixing this problem. 

 

Thanks.

We reported it in one of our tickets. Sadly, I cannot confirm the ticket right now because I have changed projects, but it was discussed with a member of the support team. He suggested using an individual regex policy for each non-mandatory field. I found that solution painful too. Extracting each field and then validating them in a single regex policy with <Variable> elements made more sense to me, and the result is easier to maintain.