OAuthV2: Intermittent approval of previous access_token after refresh token gets called


Hi, I noticed some erratic behavior with an access_token after the refresh token call. To explain, here is the scenario:

1. I have one proxy, e.g. proxy-a, which has an OAuthV2 VerifyAccessToken policy (a rough sketch of the policies involved follows these steps).

2. I generated an access_token (using the password grant type) and used it with this proxy in the Authorization header, which works as expected. I could see in Trace that the OAuthV2 policy passed successfully.

3. Although the access_token received in step 2 is still valid (i.e., not yet expired), I called the refresh-token proxy to get another access_token by passing the refresh_token received in step 2.

4. Using the new access_token obtained in step 3, I called my proxy again, which works fine.

5. Now I tried to call proxy-a with the access_token I had from step 2 (it shouldn't be expired yet, because I am doing everything here within at most 5 minutes and the access_token is valid for an hour). Here I see erratic behavior: sometimes my old access_token works and I get a response, and sometimes the OAuthV2 policy fails the token. It works and fails intermittently. After I waited almost 10 minutes, all calls started failing at the OAuthV2 policy.
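For reference, here is roughly what the policies from steps 1 and 2 look like. The policy names and the ExpiresIn value are illustrative, not my exact configuration:

```xml
<!-- Token-endpoint proxy (step 2): issues access_token + refresh_token via the password grant -->
<OAuthV2 name="OAuthV2-GenerateAccessToken">
  <Operation>GenerateAccessToken</Operation>
  <SupportedGrantTypes>
    <GrantType>password</GrantType>
  </SupportedGrantTypes>
  <GrantType>request.formparam.grant_type</GrantType>
  <ExpiresIn>3600000</ExpiresIn> <!-- one hour, in milliseconds -->
  <GenerateResponse enabled="true"/>
</OAuthV2>

<!-- proxy-a (step 1): verifies the token presented in the Authorization header -->
<OAuthV2 name="OAuthV2-VerifyAccessToken">
  <Operation>VerifyAccessToken</Operation>
</OAuthV2>
```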

What I fail to understand is: after calling refresh token to get a new access_token, is the old access_token supposed to be revoked? If yes, why does it still work intermittently? What is the expected behavior?

Thanks,

Prabhat


The reason you see intermittent behavior: There is a cache, and it is per-server.

The Edge service is "serverless", from your point of view. You don't configure servers, you don't worry about servers, though there are actual discrete servers running the code behind OAuthV2/VerifyAccessToken.

What happens is this:

In step 2, the server that receives your request with the access_token (let's call it "generation 1" = g1) reads from the token store and caches the result. Because the g1 token is valid and not expired, VerifyAccessToken succeeds. The cache now contains that token.

If you were to present the g1 token again for verification, VerifyAccessToken (VAT) would read directly from the cache and would see that the token is valid and not expired.

In step 3, you use RefreshAccessToken. This delivers a new access token (g2) and invalidates the previous g1 access token in the token store. Important: it does not invalidate the token cache.
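For concreteness, the refresh-token proxy in step 3 typically carries a policy along these lines (the name, parameter locations, and ExpiresIn are illustrative). Each successful execution writes the new g2 token to the token store and marks the g1 token invalid there; the per-server verification cache is not touched:

```xml
<OAuthV2 name="OAuthV2-RefreshAccessToken">
  <Operation>RefreshAccessToken</Operation>
  <!-- grant_type=refresh_token and the refresh_token itself, read from the incoming request -->
  <GrantType>request.formparam.grant_type</GrantType>
  <RefreshToken>request.formparam.refresh_token</RefreshToken>
  <ExpiresIn>3600000</ExpiresIn> <!-- lifetime of the new (g2) access token, in milliseconds -->
  <GenerateResponse enabled="true"/>
</OAuthV2>
```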

Presenting the g2 token to VAT succeeds.

Presenting the g1 token to VAT may succeed or fail, depending on which server handles the request. If a server that has previously seen the token handles the request, it will read from its cache, and if the cache entry is not expired (usually 180 seconds), the token will be treated as valid. If the cache entry has expired (180 s after first sight of the token), the server will read the persistent token store and see that the token is no longer valid. If the server has never before seen the token, its cache is empty and the server will likewise read the token store and return a result indicating that the g1 token is not valid.

Does this make sense to you?

I think it might be nice if RefreshAccessToken, when handled on the same server, invalidated the server-local cache of the existing (g1) token. This would eliminate *some* of the "false approvals" that occur when the token is verified by a server with a stale cache, but it would not eliminate all of them. Imagine there are three servers, and the cache is warm on 2 of them. You invoke a proxy with RefreshAccessToken on one of those 2.

With the current behavior, I believe both of the 2 servers that had previously seen the token will have a stale cache and will validate the g1 token for the next 180s (or less).

With the proposed modified behavior, the server that handles the RefreshAccessToken would have a clean cache and would (correctly) treat the g1 token as invalid. The other of the 2 servers would have a stale cache and would treat the g1 token as valid for the next 180s (or less).

Therefore this imagined change does not solve the problem completely. For that reason I think it's probably not worth making the change.

You might think it would be nice if the cache were completely invisible - that when calling RefreshAccessToken on one of the servers, all of the servers would have their caches invalidated for that particular token. That's a good goal, but for scalability reasons, it hasn't been possible. Imagine not 3 servers but 40. The token invalidation notifications can be messy. In such a scenario the cache will have a window of staleness.

I hope this clarifies.

Thanks, Dino. I will keep my comment in two parts.

Part 1: My comment on your response

Your explanation makes sense.

However, I am wondering whether this could pose a security risk, since an invalidated token could still work for 180 seconds or less in the scenario we discussed above. The expectation was that a client must not be able to use an access token once it is invalidated, but per the explanation, the client could still use it and some of those calls could succeed (if not all, then at least a few within the 180-second window after the token was first presented).

My understanding of this design:

If I understand correctly, the sole purpose of putting the token in the cache is faster retrieval during token verification, and it works in 3-minute cycles. For example, on server 1 (s1): token g1 is presented for the first time and cached for 3 minutes. After 3 minutes, if g1 is presented again on s1, it is not found in the cache, so it is validated against the token DB and put in the cache for another 3 minutes. Is that correct?

If yes, is there any way to opt out and always validate against the token DB rather than the cache? I understand it could add some latency; however, I would choose a few milliseconds of latency over possible false approvals of a token that could be misused by a client.

Part 2: A question arising from your explanation above.

LookupCache and PopulateCache policies

This leads to another question: Does a custom cache (which my proxy has created via PopulateCache and LookupCache policies to keep some cached data) also work in the same way?

Consider a scenario:

I have two proxies, p1 and p2, deployed in the same environment. Proxy p1 contains a PopulateCache policy and p2 contains a LookupCache policy, and both policies point to the same cache resource. The first request is made to proxy p1 (which contains PopulateCache) and, after a successful response from p1, a second request is made to p2 (which has the LookupCache policy), as sketched below.

Is there any possibility that p2's LookupCache may not find the key, even though p1's PopulateCache succeeded in the previous request, because the request to p1 (PopulateCache) landed on server 1 and the request to p2 (LookupCache) landed on server 2?
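For illustration, the two policies would look roughly like this (policy names, cache key, resource name, and TTL are made up; I am assuming Global scope so that p1 and p2 compute the same cache key):

```xml
<!-- In proxy p1: write a value into the shared cache -->
<PopulateCache name="PC-StoreValue">
  <CacheResource>shared-cache</CacheResource>
  <Scope>Global</Scope>
  <CacheKey>
    <KeyFragment ref="request.queryparam.id"/>
  </CacheKey>
  <Source>flowVarWithValueToCache</Source>
  <ExpirySettings>
    <TimeoutInSec>300</TimeoutInSec>
  </ExpirySettings>
</PopulateCache>

<!-- In proxy p2: look up the same key -->
<LookupCache name="LC-ReadValue">
  <CacheResource>shared-cache</CacheResource>
  <Scope>Global</Scope>
  <CacheKey>
    <KeyFragment ref="request.queryparam.id"/>
  </CacheKey>
  <AssignTo>cachedValue</AssignTo>
</LookupCache>
```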

There is no way to "turn off" the cache.

The cache that you manipulate directly with PopulateCache and LookupCache does not behave this way, because there is cache propagation for that kind of cache. You can test that yourself.