Escaping Chars with RegularExpressionProtection policy

Team,

I need to escape the following characters in a JSON payload: [!@#$%^&*(),.?"{}|<>]

I have had success with the following policy:

<RegularExpressionProtection async="false" continueOnError="false" enabled="true" name="Regular-Expression-Protection-1"> 
  <DisplayName>Regular Expression Protection-1</DisplayName> 
  <Properties/> 
  <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables> 
  <JSONPayload> 
    <JSONPath> 
      <Expression>$.</Expression> 
      <Pattern ignoreCase="false"><![CDATA[<\s*script\b[^>]*>[^<]+<\s*/\s*script\s*>]]></Pattern>
      <Pattern ignoreCase="false">[!@#$%^*().?|]</Pattern>
      <Pattern ignoreCase="false">[&lt;]</Pattern>
      <Pattern ignoreCase="false">[>]</Pattern> 
    </JSONPath> 
  </JSONPayload> 
</RegularExpressionProtection>

However, with this I am not able to filter out the characters {, }, [ and ].

Can you please advise how to achieve this?
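To reproduce the symptom outside Apigee, here is a minimal Java sketch. It assumes the policy hands each Pattern's text to java.util.regex unchanged and faults when Matcher.find() succeeds anywhere in the selected value; both are assumptions about the policy's internals, and the class name QuestionPatterns is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical harness (not part of Apigee): compiles the simple patterns from
// the question's policy; the <script> pattern is left out for brevity.
// Assumption: the policy faults when Matcher.find() succeeds on the selected value.
final class QuestionPatterns {
    static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("[!@#$%^*().?|]"),
            Pattern.compile("[<]"),
            Pattern.compile("[>]"));

    // True if any pattern occurs anywhere in the value.
    static boolean wouldFault(String value) {
        return PATTERNS.stream().anyMatch(p -> p.matcher(value).find());
    }

    public static void main(String[] args) {
        String[] samples = {"Hello (World", "Hello <World", "Hello {World", "Hello [World"};
        for (String s : samples) {
            // None of the classes above lists {, }, [ or ], so those samples do not fault.
            System.out.println(s + " -> " + wouldFault(s));
        }
    }
}
```

The first two samples print true and the last two print false: the braces and square brackets are simply not in any of the character classes.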


Hi, I understand that when you say you want to "escape" the characters, you don't actually want to replace them with escape sequences. Instead what you want to do is throw a fault when the JSON payload includes those characters. Is that right?

If so, I think you have the right idea with your patterns. To match the square brackets inside a character class, backslash them: in the Java regex engine an unescaped [ inside a class starts a nested class, and ] ends the class. Outside a character class, { needs a backslash because Java reads it as the start of a repetition such as {2}; inside a class, escaping the braces is harmless but not required. XML does not treat the backslash specially, so write a single backslash in the policy. Try something like this:

  <JSONPayload>
    <JSONPath>
      <Expression>$.</Expression>
      <Pattern ignoreCase="false"><![CDATA[<\s*script\b[^>]*>[^<]+<\s*/\s*script\s*>]]></Pattern>
      <Pattern ignoreCase="false">[\[\]{}!@#$%^*().?|]</Pattern>
      <Pattern ignoreCase="false">[\u003C\u003E]</Pattern>
    </JSONPath>
  </JSONPayload>

You can also use Unicode escape sequences (\u003C and \u003E) for the less-than and greater-than signs. That lets you put both into a single character class without XML-escaping the < character.
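A quick way to check these character classes is to compile them with java.util.regex directly. This sketch assumes, as the thread says, that Apigee uses the Java regex engine and that the Pattern text reaches it unchanged. It also shows what happens if the backslashes in the policy are doubled ([\\[\\]...]): the engine sees an escaped backslash followed by an unescaped [, which opens a nested class, so the square brackets are no longer matched. The class name BracketPatterns is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical check (not part of Apigee) of the answer's character classes.
// Assumption: the <Pattern> text reaches java.util.regex.Pattern unchanged.
// Backslashes below are doubled once more because these are Java string literals.
final class BracketPatterns {
    // Policy text with single backslashes before [ and ].
    static final Pattern SINGLE = Pattern.compile("[\\[\\]{}!@#$%^*().?|]");
    // Policy text with doubled backslashes: the engine sees an escaped backslash,
    // then an unescaped [ that opens a nested class, so [ and ] are not matched.
    static final Pattern DOUBLED = Pattern.compile("[\\\\[\\\\]\\\\{\\\\}!@#$%^*().?\\\\|]");
    // Policy text using Unicode escapes for the two angle brackets.
    static final Pattern ANGLES = Pattern.compile("[\\u003C\\u003E]");

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"[", "]", "{", "}", "<", ">", "Hello World"}) {
            System.out.printf("%-12s single=%b doubled=%b angles=%b%n",
                    s, found(SINGLE, s), found(DOUBLED, s), found(ANGLES, s));
        }
    }
}
```

Doubled backslashes are only needed inside Java string literals like the ones above; in the XML policy itself a single backslash is what the engine expects.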

Hi @Dino-at-Google, thanks for the response.

Yes, when I say that I want to escape, I mean that I want to throw a fault.

The major issue is that genuine JSON payloads usually contain characters like {, }, [ and ]. I want to throw a fault only when they appear inside a value, as in the example below. Note the "Hello {World" string in the sample payload: it contains a curly brace ({). I want to throw an error in this case.

{ "object": { "a": "b", "c": "d", "e": "f" }, "array": [ 1, 2 ], "string": "Hello {World" }

Thanks again for your help.
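The difficulty can be seen with a small Java sketch: if the JSONPath expression $. hands the policy the whole payload as text, a brace or bracket class fires on every valid JSON document, because the structural characters are part of that text. Applying the same class only to string values separates the two cases. How Apigee's JSONPath evaluates $. is an assumption to verify, not something stated in this thread; in a policy the per-value approach would mean one JSONPath per field (for example a hypothetical $.string). The class WholeVsValue and its hand-written value list are illustrative only.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration (not part of Apigee): the same bracket/brace class
// applied to the raw payload versus to individual string values.
// Assumption: the policy sees either the whole payload text or a single value,
// depending on the JSONPath expression.
final class WholeVsValue {
    static final Pattern BRACKETS = Pattern.compile("[\\[\\]{}]");

    static final String PAYLOAD =
            "{ \"object\": { \"a\": \"b\", \"c\": \"d\", \"e\": \"f\" }, "
            + "\"array\": [ 1, 2 ], \"string\": \"Hello {World\" }";

    // String values from PAYLOAD, listed by hand (no JSON parser is used here).
    static final List<String> STRING_VALUES = List.of("b", "d", "f", "Hello {World");

    static boolean anyValueMatches(List<String> values) {
        return values.stream().anyMatch(v -> BRACKETS.matcher(v).find());
    }

    public static void main(String[] args) {
        // Fires on the structural braces even if no value contained one.
        System.out.println("whole payload: " + BRACKETS.matcher(PAYLOAD).find());
        // Fires only because "Hello {World" contains a brace.
        System.out.println("values only:   " + anyValueMatches(STRING_VALUES));
        // No value contains a brace or bracket, so nothing fires.
        System.out.println("clean values:  " + anyValueMatches(List.of("b", "d", "f", "Hello World")));
    }
}
```

This prints true, true and false: only the per-value check distinguishes the sample payload from one whose values are clean.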

@dchiesa1 how can I escape everything that I do not explicitly need?

For example, escape every character except a-z, A-Z, 0-9, "-", "_", "." and ",".

I think you mean REJECT every character other than a-z, A-Z, etc. In the context of the original question, "escape" meant: how can I refer to those special characters in a regular expression? A good way to do that is to use Unicode escape sequences.

But you are asking something different. I think you are asking: how do I write a regular expression that matches "anything except a known-good set of characters"?

This is a regex question, not an Apigee question, so you can find the answer in any regex reference. I think what you want is a negated character class (an exclusion range). You express this in regex with square brackets, with the caret (^) as the first character, followed by the characters you want to allow. Therefore [^a-z] matches any character that does not fall in the range a-z, and [^a-zA-Z0-9] matches any character that is not a-z, not A-Z, and not 0-9. A string like foo_bar would match, because the underscore falls outside all of those ranges. A string like foobar would not match that regex, because it has no characters outside the allowed ranges.
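Here is a minimal Java sketch of the negated class with find() semantics, where a match means "the input contains at least one character outside the allowed set". The second pattern is a hypothetical allow-list for the follow-up question (letters, digits, underscore, period, comma and hyphen); the class name AllowList is also hypothetical. Keep in mind that if the policy applies such a pattern to the whole JSON text, the double quotes, colons, braces and spaces that every JSON document contains fall outside the allow-list, so it would be safer to apply it to individual values.

```java
import java.util.regex.Pattern;

// Sketch of a negated character class ("exclusion range") used with find():
// a match means the input contains at least one character outside the allowed set.
final class AllowList {
    // Example from the reply: anything that is not a-z, A-Z or 0-9.
    static final Pattern NOT_ALNUM = Pattern.compile("[^a-zA-Z0-9]");
    // Hypothetical allow-list: letters, digits, '_', '.', ',' and '-'.
    // '-' is placed last so it is read as a literal, not as a range operator.
    static final Pattern NOT_ALLOWED = Pattern.compile("[^a-zA-Z0-9_.,-]");

    static boolean hasDisallowed(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasDisallowed(NOT_ALNUM, "foo_bar"));         // true: '_' is outside the ranges
        System.out.println(hasDisallowed(NOT_ALNUM, "foobar"));          // false
        System.out.println(hasDisallowed(NOT_ALLOWED, "foo-bar_1.2,3")); // false
        System.out.println(hasDisallowed(NOT_ALLOWED, "foo bar"));       // true: space is not allowed
    }
}
```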

You can use an online regex tester to try regular expressions interactively. There are slight differences between regex engines, mostly around edge cases or special flags. If you use a tool that allows you to select the engine (Python, PCRE, etc.), choose Java, because Apigee uses the Java regex engine. For the case you're talking about here, a simple negated character class, all of the major regex engines behave the same way.
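If you would rather test against the Java engine directly than select it in an online tool, a tiny harness like the hypothetical RegexTry class below is enough. It reports whether find() locates the pattern anywhere in the input, which is an assumption about how the policy evaluates patterns, and it prints the engine's error description for patterns that do not compile.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical local tester using java.util.regex (not an Apigee tool).
// Example with the Java 11+ single-file launcher:
//   java RegexTry.java '[^a-z]' 'foo_bar'
final class RegexTry {
    // Returns "match", "no match", or the compile error for an invalid pattern.
    static String check(String regex, String input) {
        try {
            return Pattern.compile(regex).matcher(input).find() ? "match" : "no match";
        } catch (PatternSyntaxException e) {
            return "invalid pattern: " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java RegexTry.java <pattern> <input>");
            return;
        }
        System.out.println(check(args[0], args[1]));
    }
}
```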