Regex Protection Policy (XPath) works only with the first attribute in the payload

Hi everyone (and @dchiesa1)!

I tried to create a Regular Expression Protection policy that rejects any special symbols in the XML payload. It works as expected with values inside tags (like <tag>some-value</tag>). However, the same regex does not trigger on attributes; more precisely, it ignores all attributes except the first one (in alphabetical order).
Here's my policy configuration:

 

<RegularExpressionProtection async="false" continueOnError="false" enabled="true" name="rep_CheckForXPathVulnerabilities">
   <DisplayName>rep_CheckForXPathVulnerabilities</DisplayName>
   <Properties/>
   <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>

   <Source>request</Source>
   <XMLPayload>
      <XPath>
         <Pattern ignoreCase="true">[\'\"/@=\[\]\(\)]</Pattern>
         <Expression>//*/@*</Expression>
         <Type>string</Type>
      </XPath>
      <XPath>
         <Pattern ignoreCase="false">[\'\"/@=\[\]\(\)]</Pattern>
         <Expression>//*</Expression>
         <Type>string</Type>
      </XPath>
   </XMLPayload>
</RegularExpressionProtection>

 

When I try to execute a request with this content:

 

<root action="***" class="***" msgtime="***" ltq="***" exactmatch="***" name="***" orderby="***" page="***" la="some@inacceptable@text" >Inacceptable@text</root>

 

The REP policy raises a fault because of "Inacceptable@text". But if I remove "@" from that text, the request passes, even though the same character is still present in the "la" attribute. Also, if I rename the attribute "la" to "aala" (so that it comes first in alphabetical order), the policy triggers on the "@" in its value, as expected.

Does anybody know what I'm doing wrong? Any help will be appreciated!

P.S. And a general question: we have quite a lot of different rules we need to cover to meet our security requirements (currently we're working on JSONPath and XPath injection protection). According to Apigee best practices, how should we implement this: with this Regex Protection policy, or with custom logic in separate JS files? (I ask because this REP policy has quite limited functionality.)

ACCEPTED SOLUTION

That sure looks like a bug to me. Let me investigate.

I jumped to a conclusion there without looking carefully.  Let me correct myself.

Because you specified "string" as the value for the Type element, you got a single string back from the XPath query. It just so happens that the string you got back is the value of the first attribute, when the attribute names are sorted lexicographically. That is not a defined behavior of Apigee, or of XPath, but that is what you're seeing.  

In any case, if you want to check ALL attributes, and not only the first attribute, then you should use nodeset as the Type.  More generally, in any case in which your xpath is likely to return more than one value (either elements or attributes), you should probably ask for a nodeset in response.

Try this as your configuration:

  <XMLPayload>
      <XPath>
         <Pattern ignoreCase="true">[\'\"/@=\[\]\(\)]</Pattern>
         <Expression>//*/@*</Expression>
         <Type>nodeset</Type> <!-- important -->
      </XPath>
  </XMLPayload>

In my tests, if I use "string", the policy checks only the first attribute. If I use "nodeset", it checks all attributes.
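
To make the difference concrete, here is a rough Python sketch that only mimics the behavior described above; it does not call Apigee or a real XPath engine. The function names (`attribute_values`, `is_threat`) are hypothetical, the payload is a trimmed copy of the request from the question, and sorting attributes by name is an assumption that mirrors the observed order rather than any documented rule.

```python
import re
import xml.etree.ElementTree as ET

# Same character class as the <Pattern> in the policy above.
PATTERN = re.compile(r"""['"/@=\[\]()]""")

# Trimmed version of the request from the question.
PAYLOAD = '<root action="***" class="***" la="some@inacceptable@text">text</root>'


def attribute_values(xml_text):
    """Collect every attribute value, ordered by attribute name within each element.

    Sorting by name is an assumption that mirrors the order observed in this thread.
    """
    root = ET.fromstring(xml_text)
    return [el.attrib[name] for el in root.iter() for name in sorted(el.attrib)]


def is_threat(xml_text, result_type):
    """Return True if the pattern matches any value visible under result_type."""
    values = attribute_values(xml_text)
    if result_type == "string":
        # Model of the observed "string" behavior: only the first value survives.
        values = values[:1]
    return any(PATTERN.search(value) for value in values)


print(is_threat(PAYLOAD, "string"))   # False: only action="***" is inspected
print(is_threat(PAYLOAD, "nodeset"))  # True: the "@" in "la" is found
```

Renaming "la" to "aala" in the payload moves it to the front of the sorted list, so the "string" case starts matching too, which is consistent with what the question reports.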

The Type element is documented, but sadly, the documentation doesn't explain this behavior.  I'll get the documentation updated.

Let me know if this solves the mystery for you.


Regarding your second, general question:

"we have quite a lot of different rules we need to cover to meet our security requirements (currently we're working on JSONPath and XPath injection protection). According to Apigee best practices, how should we implement this: with this Regex Protection policy, or with custom logic in separate JS files? (I ask because this REP policy has quite limited functionality.)"

It's hard for me to give a definitive answer on that without knowing what you mean by "quite a lot of different rules". As you know, you can specify multiple regex patterns for each XPath query. So if your "different rules" can be elegantly expressed as regular expressions, then perhaps the RegularExpressionProtection policy is a good tool for the job.

I am interested in understanding what you mean by "this REP policy has quite limited functionality."  Supposing your payload is XML, the REP policy is designed to: 

  • allow you to specify a set of XPath queries, each one of which may resolve to a nodeset
  • at runtime, for each xpath query
    • evaluate the xpath against the payload and get a nodeset
    • for each node in the set
      • for each pattern
        • check the node for a match against the regex pattern

If any node, from any query, matches any pattern for that query, the policy throws a fault. That may not address your requirements, but it seems like a powerful tool that covers lots of cases. I am curious to understand more about your "quite limited functionality" viewpoint.
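
The nested loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Apigee's implementation: the XPath queries are replaced by two hypothetical selector functions that roughly approximate `//*/@*` and `//*`, and `RegexThreatDetected` is a made-up stand-in for the fault.

```python
import re
import xml.etree.ElementTree as ET


class RegexThreatDetected(Exception):
    """Hypothetical stand-in for the fault the policy raises."""


def attribute_values(root):
    """Rough equivalent of //*/@* : every attribute value in the document."""
    return [value for el in root.iter() for value in el.attrib.values()]


def element_string_values(root):
    """Rough equivalent of //* : the concatenated text of each element."""
    return ["".join(el.itertext()) for el in root.iter()]


SPECIAL_CHARS = re.compile(r"""['"/@=\[\]()]""")

# Each query pairs a node selector with the patterns that apply to it.
QUERIES = [
    (attribute_values, [SPECIAL_CHARS]),
    (element_string_values, [SPECIAL_CHARS]),
]


def protect(xml_text, queries=QUERIES):
    """Raise RegexThreatDetected if any node of any query matches any of its patterns."""
    root = ET.fromstring(xml_text)
    for select_nodes, patterns in queries:      # for each query
        for node_value in select_nodes(root):   # for each node in the set
            for pattern in patterns:            # for each pattern
                if pattern.search(node_value):  # check the node against the regex
                    raise RegexThreatDetected(
                        f"{pattern.pattern!r} matched {node_value!r}"
                    )
```

Calling `protect('<root a="x" b="y@z">fine</root>')` raises the exception because of the second attribute, while a payload without any of the listed characters returns normally.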

REPLIES


Thank you very much for your help!
Regarding the limited functionality, I was probably wrong. I just suspect it doesn't suit us in this case. For now, our main task is to disallow some special characters (like " ' [ ] ( ) \ /) in all requests and all fields except "msgtime" and a few more, and I'm not sure how to implement that with the Regex Protection policy. So far my only idea is to write a complex condition like "//*/@*[local-name() != 'datechanged' and local-name() != 'msgtime' and *other attributes*]". But if I have too many fields to ignore, this condition will become unmanageable. So I'm thinking about using a JavaScript policy instead.
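
For illustration, the "check every attribute except a few" rule could look roughly like the sketch below if it were written as custom logic. It is in Python only to keep the example self-contained; inside Apigee the same idea would have to be ported to a JavaScript policy. The helper names are hypothetical, and the ignore list contains only the two attribute names mentioned above.

```python
import re
import xml.etree.ElementTree as ET

# The characters listed above: " ' [ ] ( ) \ /
SPECIAL_CHARS = re.compile(r"""["'\[\]()\\/]""")

# Attributes that are exempt from the check; extend this set as needed.
IGNORED_ATTRIBUTES = {"msgtime", "datechanged"}


def local_name(name):
    """Strip an ElementTree '{namespace}' prefix, similar to XPath local-name()."""
    return name.rsplit("}", 1)[-1]


def find_offending_attributes(xml_text, ignored=IGNORED_ATTRIBUTES):
    """Return (element, attribute, value) triples whose value contains a special character."""
    root = ET.fromstring(xml_text)
    offending = []
    for el in root.iter():
        for name, value in el.attrib.items():
            if local_name(name) in ignored:
                continue  # exempt field, skip the regex check
            if SPECIAL_CHARS.search(value):
                offending.append((local_name(el.tag), local_name(name), value))
    return offending
```

Keeping the exemptions in a set means a new ignored field is a one-word change rather than another `and local-name() != ...` clause.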

As for those "different rules", our main problem is that we already have a fairly large proxy that receives many different requests with different structures, so it's hard to write general handlers for them.

And I have one more question. If I need to do a similar validation on JSON, how can I check all the values of the JSON parameters? For example, the $.* expression returns an array of all values directly under the root, including objects. But I need to extract all values except objects (in fact, only strings) and validate them. Is there a way to do this?
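
As a sketch of what "only the string values, at any depth" means, the recursive walk below yields every string leaf in a parsed JSON document together with a JSONPath-like location. It is plain Python rather than an Apigee feature, and the function names are hypothetical; the same recursion could be ported to a JavaScript policy.

```python
import json
import re

SPECIAL_CHARS = re.compile(r"""["'\[\]()\\/]""")


def iter_string_values(node, path="$"):
    """Yield (path, value) for every string leaf, skipping objects, numbers, booleans and nulls."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_string_values(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_string_values(value, f"{path}[{index}]")


def find_offending_strings(json_text):
    """Return the (path, value) pairs whose string value contains a special character."""
    document = json.loads(json_text)
    return [
        (path, value)
        for path, value in iter_string_values(document)
        if SPECIAL_CHARS.search(value)
    ]
```

For the input `{"a": "x", "b": {"c": "y(z)", "d": [1, "w"]}, "e": 5}` the walk visits `$.a`, `$.b.c` and `$.b.d[1]`, and only `$.b.c` is reported as offending.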