Inconsistent behavior of the RegularExpressionProtection policy when uploading .xlsx and .pdf files in a POST request

The RegularExpressionProtection policy handles faults inconsistently when .xlsx and .pdf files are uploaded as multipart/form-data. Specifically, the policy behaves differently for the same content in different .xlsx documents.

I have a spreadsheet with 5 pages. The content of one page triggers the regex policy, but if I paste the same content into a brand-new spreadsheet, the regex policy does not complain.

The regex policy is also triggered when I try to upload .pdf files. My understanding is that the regex policy is executed on the actual text content sent in the POST body. In the case of .xlsx and .pdf files, the body is all hexadecimal. Does the policy parse the hex and execute on the actual file content? According to my testing, that is not the case either.

Please let me know how the regex policy handles .xlsx and .pdf files.

2 REPLIES

I don't know how the regex protection policy handles binary files, but I would think you'd want to design your API Proxy to NOT scan binary files with a regex.
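
To make the idea concrete, here is a minimal sketch in Python (not Apigee configuration) of scanning only the text parts of a multipart/form-data body and skipping binary parts such as .xlsx or .pdf uploads. The pattern `THREAT`, the function `flag_text_fields`, the boundary, and the field names are all hypothetical and chosen for illustration; how you would express the same rule inside an API proxy is not shown here.

```python
import re
from email.parser import BytesParser
from email.policy import default

# Hypothetical threat pattern, for illustration only.
THREAT = re.compile(r"<\s*script\b", re.IGNORECASE)


def flag_text_fields(content_type: str, body: bytes) -> list[str]:
    """Return names of text form fields matching THREAT; binary parts are skipped."""
    # Prepend the request's Content-Type so the MIME parser can find the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=default).parsebytes(raw)
    flagged = []
    for part in message.iter_parts():
        if part.get_content_maintype() != "text":
            continue  # e.g. .xlsx or .pdf uploads: leave these to another check
        if THREAT.search(part.get_content()):
            flagged.append(part.get_param("name", header="content-disposition"))
    return flagged


# Made-up request body: one text field and one fake binary upload.
body = (
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="comment"\r\n'
    b"\r\n"
    b"hello <script>alert(1)</script>\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="file"; filename="book.xlsx"\r\n'
    b"Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet\r\n"
    b"\r\n"
    b"PK\x03\x04\x00\xff<script\r\n"
    b"--XyZ--\r\n"
)

flagged = flag_text_fields("multipart/form-data; boundary=XyZ", body)
print(flagged)
```

In this sketch the fake binary part happens to contain the bytes `<script`, but it is never scanned, so it cannot trigger a match by coincidence.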

As for the differing behavior when you paste content from one .xlsx to another: the .xlsx document standard is pretty baroque, and just because you paste the same sheet content from one doc to another does not mean the same .xlsx will result. It's possible the .xlsx has change history, comments, origin, timestamps, and so on. None of that will be the same if you copy/paste.
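
As a rough illustration of that point: an .xlsx file is a ZIP package of XML parts, so the bytes on the wire depend on every part in the package, not only on the sheet you see. The sketch below builds two simplified ZIP archives that mimic that layout (they are not valid workbooks, and the XML content and the `build_package` / `read_sheet` helpers are made up for the example). Both carry identical sheet XML but different document properties.

```python
import io
import zipfile

# Simplified stand-in for a worksheet part; identical in both packages.
SHEET_XML = (
    b"<worksheet><sheetData><row><c t=\"inlineStr\">"
    b"<is><t>same cell text</t></is></c></row></sheetData></worksheet>"
)


def build_package(core_props: bytes) -> bytes:
    """Build a ZIP archive laid out like an .xlsx package (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/worksheets/sheet1.xml", SHEET_XML)
        zf.writestr("docProps/core.xml", core_props)
    return buf.getvalue()


def read_sheet(package: bytes) -> bytes:
    """Extract the worksheet part from a package built above."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read("xl/worksheets/sheet1.xml")


original = build_package(
    b"<coreProperties><creator>alice</creator><revision>42</revision></coreProperties>"
)
pasted = build_package(
    b"<coreProperties><creator>bob</creator><revision>1</revision></coreProperties>"
)

same_sheet = read_sheet(original) == read_sheet(pasted)  # sheet content matches
same_bytes = original == pasted                          # uploaded bytes do not
print(same_sheet, same_bytes)
```

A regex applied to the raw upload sees the whole archive, so two files with the same visible sheet can still present different byte sequences to the policy.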

I think you need to reconsider your plan of using regex against XLSX and PDF, unless you understand in detail the document standards for both of those doc types and can design regexes that handle the possible threats that might lurk in such documents.

Possibly a better approach is to send those documents to a virus scanning service, from Apigee.

Sure, thanks. Let me check for other options.