Edge not applying policies consistently

stark-todd · 04-01-2020 01:43 PM

Hello,

We have a proxy that is behaving oddly. Sometimes policies get executed as we'd expect, and other times they don't. Nothing about the policy or proxy changes between these runs. For example, we're using a JSON threat protection policy. When running the integration tests on this policy, sometimes the ArrayElementCount test fails, and other times it passes. Running a trace on the failed tests shows that all the conditions we'd expect to make the policy raise a fault are present, but it's just not doing so. The JSON threat protection policy isn't the only one with this issue; it's just the most recent example. We can't seem to find a pattern or a reason for this behavior.

Has anyone encountered a problem like this, and if so, how did you resolve it?

Thanks.

dchiesa1

Hi Todd

I'm sorry you're having problems. That sounds frustrating.

I haven't seen anything like that.

That's very confusing. If you're unable to track it down and eliminate that confusing behavior, then I'd suggest opening a support ticket with Apigee to get some assistance. They have access to logs and other diagnostic information which may help clarify the situation.

If in the course of your own investigation, you can produce a test case that reliably reproduces erratic behavior of the sort you've described, I'd like to see it. This would be a proxy, with a standard request, and you send it 10 times and 1 or 2 (or more?) times it behaves one way, and other times it behaves a different way.

That kind of variation can occur when caching is involved, and the effects of caching can be surprising. Simplifying the test case to just the JSON Threat Protection policy, or just the one policy that appears to be behaving inconsistently in your environment, will help in diagnosing the problem.

Let me know if you can produce a test case.
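
A reproduction harness of the kind described above could be sketched as follows. This is only a rough Python sketch, not an Apigee tool: the function names `send_once` and `tally_statuses` are made up for illustration, and the URL, payload, and headers would have to come from your own proxy and test data.

```python
import collections
import urllib.error
import urllib.request


def send_once(url, body, headers):
    """POST body to url and return the HTTP status code (hypothetical helper)."""
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # A fault raised by the proxy arrives as an HTTP error status.
        return error.code


def tally_statuses(send, attempts=10):
    """Call send() repeatedly and count how often each status code came back."""
    return collections.Counter(send() for _ in range(attempts))
```

Calling `tally_statuses(lambda: send_once(url, body, headers))` with identical inputs should give a single status code if the proxy is behaving consistently; more than one key in the result is the erratic behavior in question.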

stark-todd

Thanks for your response, Dino. I'll look into caching on this proxy. If I'm able to reliably reproduce the issue, I'll let you know.

mark_hammelman

Hi Dino, I work with Todd and have submitted support ticket 1480442 for this issue.

Using a Cucumber test, I set Content-Type to 'application/json' and sent the same payload 10 times. The JSON threat protection policy detected a problem 7 times but allowed 3 of the requests through.

It's very strange!

dknezic

You should enable trace on your proxy

mark_hammelman

Alright, I think I have an understanding of why this is happening.

Some context is required to explain:

Our clients get OAuth tokens from an external OAuth provider. Whenever the proxy receives a request, it checks Apigee's token vault to see if the token exists.

If the token does not exist in Apigee's vault, the proxy makes a service callout to our token provider to validate the token.
If the token is valid, it uses the OAuthV2 policy to create a replica of the token in Apigee's token vault, so subsequent requests with the same token do not need the service callout.

There is an undesired "feature" of the OAuthV2 policy when you use it to generate access tokens, which is that it will set the request's "Content-Type" header to "x-www-form-urlencoded". I raised a ticket about this last year.

You can send a "GET" request with no Content-Type header and no payload, but after OAuthV2 generates an Apigee token, you have a "GET" request with a "Content-Type:x-www-form-urlencoded" header. To fix this problem, we added AssignMessage steps that take a backup of the original request object, then generate the token, then restore the original request object.
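
The backup, generate, restore sequence can be pictured with a small Python model. This is not Apigee policy configuration; `generate_token_step` is a made-up stand-in for the token-generation step, the request is modeled as a plain dict, and the header value is the one quoted in this post.

```python
import copy


def generate_token_step(request):
    """Stand-in for token generation; it overwrites Content-Type as a side effect."""
    # Header value as described in the post; the real step is an Apigee policy.
    request["headers"]["Content-Type"] = "x-www-form-urlencoded"
    return "hypothetical-token"


def generate_token_preserving_request(request):
    """Back up the request, run the mutating step, then restore the original fields."""
    backup = copy.deepcopy(request)
    token = generate_token_step(request)
    request.clear()
    request.update(backup)
    return token
```

A deep copy is used so that the nested headers in the backup are not affected when the step modifies the live request.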

So it turns out the reason for our intermittent results goes back to our cucumber test.
It was wrapping the value in quotes, so it was sending 'Content-Type:"application/json" '.

(The quotes don't show up in an Apigee trace though)
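
Since the trace did not reveal the quotes, one option is to check on the client side before the request is sent. A minimal Python sketch, assuming the test's headers are available as a dict of strings (the helper name `find_quoted_header_values` is made up for illustration):

```python
def find_quoted_header_values(headers):
    """Return the names of headers whose value is wrapped in literal quote characters."""
    quoted = []
    for name, value in headers.items():
        stripped = value.strip()
        # Flag values that start and end with the same quote character.
        if len(stripped) >= 2 and stripped[0] == stripped[-1] and stripped[0] in "\"'":
            quoted.append(name)
    return quoted
```

Printing a header with `repr(value)` in the test output is another cheap way to make stray quotes and trailing spaces visible.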

On a new token, the proxy was executing our full TokenValidation process, and it seems that the step that "restores" the request object also removes the quotes from the Content-Type, so the JSON threat protection policy then sees it as a valid content type and enforces the limits.

On cached tokens, the JSON threat protection policy does not enforce the limits because the 'Content-Type:"application/json" ' header is not the mandatory 'Content-Type:application/json'.
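
The two paths can be summarized in a toy Python model. Everything here is an assumption made for illustration: the exact matching rule inside the policy is not shown in this thread (strict equality is a guess), and `restore_request` only mimics the quote-stripping we observed.

```python
def threat_protection_applies(content_type):
    """Toy check: enforce limits only when the header is exactly application/json."""
    return content_type == "application/json"


def restore_request(content_type):
    """Toy stand-in for the restore step, which appeared to drop the stray quotes."""
    return content_type.strip().strip('"')


def limits_enforced(content_type, token_is_cached):
    """Model both paths: only the new-token path runs the restore step."""
    if not token_is_cached:
        content_type = restore_request(content_type)
    return threat_protection_applies(content_type)
```

With the quoted header, the result depends only on whether the token was already cached, which matches the mix of passes and failures we saw for an identical payload.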

kurtkanaskie

Ah,

I looked at the trace sent with the support ticket and noticed that when the token was already valid, the Threat Protection policy DID NOT detect the error. It detected the error only when the token validation shared flow executed, which is the flow that saves and restores the original message.

So some good tidbits here,

OAuthV2 policy behavior: I didn't see that in a similar design which uses client_credentials and Basic Authorization, so I only need to save and restore that header. So in my case it always skipped Threat Protection with "application/json".
Trace behavior: the quotes are not shown, even in the trace.xml download.

Thanks for the detailed followup!

dknezic

I've tried sending thousands of requests to a proxy with a JSON threat protection policy, using a payload that triggers the ArrayElementCount validation, and wasn't able to reproduce what you're seeing. It behaves consistently.

What conditions have you configured in your proxy? Are you always sending the same payload and headers to your api?

stark-todd

That's what's frustrating about this case. We've used this policy on nearly all of our proxies and it's never behaved like this before.

kurtkanaskie

Just a thought, I know this just applies to the JSON threat protection situation, but are you sure the Content-Type header is set to "application/json"?
Are you sure you are sending the same message?
Have the integration tests changed?

Another thing you could do is narrow the situation down to a specific region and/or message processor by capturing "system.region.name", "system.uuid" and "messageid" for each request. You can do this using a "debug" AssignMessage policy with AssignVariable elements.
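
Once those values are captured per request, grouping the outcomes is straightforward. A minimal Python sketch, assuming each test result has been collected into a dict with hypothetical keys "region", "uuid", "messageid" and "status" (how the values get from the proxy to the test client is up to you):

```python
import collections


def group_outcomes_by_processor(results):
    """Count status codes per (region, message processor UUID) pair."""
    grouped = collections.defaultdict(collections.Counter)
    for result in results:
        key = (result["region"], result["uuid"])
        grouped[key][result["status"]] += 1
    return {key: dict(counts) for key, counts in grouped.items()}
```

If one region or message processor accounts for all of the unexpected outcomes, that points at the environment; if the outcomes are mixed on every processor, the cause is more likely in the requests themselves.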

stark-todd

Thanks for your thoughts. We're definitely setting the Content-Type header to application/json. I found an answer to another question where that was causing the issue, so I explicitly checked for it. The integration tests haven't changed either, and I know they send the same message every time, since they just send the contents of a file and that file hasn't changed.

I'll look into the region/message processor variables you mentioned here and see if that narrows it down.

Thanks!