Refreshing with OAuth refresh tokens isn't atomic?

mstellinga · 03-23-2018 11:46 AM

We have set up Apigee to manage access tokens and refresh tokens for our apps.

We have an endpoint that generates a new access token and refresh token from an existing refresh token, using the OAuthV2 policy. In the flow, after this policy, a callout policy registers some information in our backend, and then we return the new refresh and access tokens to the app.

Unfortunately, about 0.5% of refreshes fail. This appears to be because the app tries to refresh with an old token.

If an error occurs in the flow after the OAuthV2 policy (for example, in the callout) or a connection error occurs while the response is being written, the new refresh token never reaches the app and its next refresh call fails.

Essentially, because the refresh is done in a policy, the endpoint as a whole isn't atomic, even though a refresh that issues a new refresh token should be.
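The failure mode can be reproduced in a few lines. This is a minimal Python sketch, not Apigee code: `RotatingTokenStore`, `refresh_endpoint` and `dropped_connection` are hypothetical names, and the store models only refresh-token rotation, where the presented token is consumed before the response is delivered.

```python
import secrets


class InvalidGrant(Exception):
    """Raised when a refresh token is unknown or already used."""


class RotatingTokenStore:
    """Hypothetical in-memory stand-in for the gateway's token store.

    Every successful refresh revokes the presented refresh token and issues
    a new access/refresh pair (refresh token rotation).
    """

    def __init__(self):
        self.refresh_tokens = set()

    def issue(self):
        access, refresh = secrets.token_hex(8), secrets.token_hex(8)
        self.refresh_tokens.add(refresh)
        return access, refresh

    def refresh(self, refresh_token):
        if refresh_token not in self.refresh_tokens:
            raise InvalidGrant(refresh_token)
        # The old token is gone as soon as this line runs ...
        self.refresh_tokens.remove(refresh_token)
        # ... whether or not the new pair ever reaches the client.
        return self.issue()


def refresh_endpoint(store, refresh_token, deliver):
    """Rotate first, then hand the new pair to `deliver`, which may fail."""
    new_pair = store.refresh(refresh_token)
    deliver(new_pair)  # e.g. a backend callout or writing the HTTP response
    return new_pair


def dropped_connection(_pair):
    raise ConnectionError("client went away before the response arrived")


store = RotatingTokenStore()
_, client_refresh = store.issue()  # generation N, held by the client

try:
    refresh_endpoint(store, client_refresh, dropped_connection)
except ConnectionError:
    pass  # the client never saw generation N+1

# The client retries with the only token it has: generation N.
try:
    store.refresh(client_refresh)
    retry_failed = False
except InvalidGrant:
    retry_failed = True

print(retry_failed)  # True
```

The store ends up holding only generation N+1, which no client knows about.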

Is there some way to undo the refresh token generation in a fault flow? Or some setting that undoes the refresh if writing the response fails?

We're currently using an on-premise installation.

DChiesa

I think I understand the situation: the endpoint itself is not atomic. The OAuthV2 policy is atomic, but the endpoint as a whole is not. This is a problem in any distributed system built on HTTP.

Apigee Edge is an HTTP proxy, and it's predicated on the idea of a synchronous HTTP connection. If that connection breaks or fails, the response is not delivered, and the two parties, the client and the server (the Apigee token store), have different views of the world. The Apigee token store thinks refresh token "generation N" is used and gone, and that a new access_token and refresh_token (generation N+1) are available for use. The client ... does not know about refresh_token N+1.

First, make sure your connection is as reliable as possible. What failures cause the problem? You mentioned a callout. Maybe you can enqueue that call, or make it asynchronous (so that it does not affect the API proxy) or otherwise increase its independence or reliability.
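Decoupling the callout from the token response might look like the following. This is a generic Python sketch of the enqueue-and-respond pattern, assuming an in-process queue and a worker thread; `register_in_backend` and `handle_refresh` are hypothetical names, and in a real deployment the queue would more likely be an external message broker.

```python
import queue
import threading

callout_queue = queue.Queue()
registered = []  # stands in for the backend that records refresh events


def register_in_backend(event):
    """Hypothetical backend callout; here it just records the event."""
    registered.append(event)


def callout_worker():
    # Drain the queue in the background so a slow or failing backend
    # cannot delay or break the token response.
    while True:
        event = callout_queue.get()
        if event is None:  # sentinel: stop the worker
            callout_queue.task_done()
            break
        try:
            register_in_backend(event)
        except Exception:
            # A real system would retry or dead-letter the event;
            # the token response has already been sent either way.
            pass
        finally:
            callout_queue.task_done()


def handle_refresh(new_pair, client_id):
    """Enqueue the backend registration instead of calling it inline."""
    callout_queue.put({"client_id": client_id, "access_token": new_pair[0]})
    return new_pair  # returned to the client immediately


worker = threading.Thread(target=callout_worker, daemon=True)
worker.start()

response = handle_refresh(("access-N+1", "refresh-N+1"), client_id="app-123")
callout_queue.put(None)
worker.join()
print(response, len(registered))
```

The design choice is that the client response no longer waits on, or fails with, the backend: a failed callout becomes a retry problem for the worker rather than a lost refresh token for the app.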

To answer your question, there is no way to "undo" the Operation = RefreshAccessToken.

However, there is an approach that might mitigate the problem. The "ReuseRefreshToken" element in the OAuthV2 policy tells the policy to keep the existing refresh_token, even when generating a new access token. This makes the refresh idempotent, if not atomic: when the app calls for a refresh, it gets access_token generation N+1, but it retains refresh_token generation 1. Forever.

See the OAuthV2 policy documentation for details on this element.

HT to @Chris Svee.
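To illustrate why keeping the refresh token makes retries safe, here is a hypothetical Python model of a token store with a reuse switch. `TokenStore`, its `reuse_refresh_token` flag and `refresh_after_lost_response` only mimic the idea behind the `ReuseRefreshToken` element; they are not Apigee's implementation.

```python
import secrets


class InvalidGrant(Exception):
    """Raised when a refresh token is unknown or already consumed."""


class TokenStore:
    """Hypothetical store with an optional reuse-refresh-token switch."""

    def __init__(self, reuse_refresh_token):
        self.reuse_refresh_token = reuse_refresh_token
        self.refresh_tokens = set()

    def issue(self):
        refresh = secrets.token_hex(8)
        self.refresh_tokens.add(refresh)
        return secrets.token_hex(8), refresh

    def refresh(self, refresh_token):
        if refresh_token not in self.refresh_tokens:
            raise InvalidGrant(refresh_token)
        new_access = secrets.token_hex(8)
        if self.reuse_refresh_token:
            # The presented refresh token stays valid and is handed back,
            # so repeating the call after a lost response is harmless.
            return new_access, refresh_token
        # Rotation: the presented token is consumed and replaced.
        self.refresh_tokens.remove(refresh_token)
        new_refresh = secrets.token_hex(8)
        self.refresh_tokens.add(new_refresh)
        return new_access, new_refresh


def refresh_after_lost_response(store, token):
    """Refresh once (response lost), then retry with the same token."""
    store.refresh(token)  # the app never sees this result
    try:
        return store.refresh(token)  # the retry
    except InvalidGrant:
        return None


reusing = TokenStore(reuse_refresh_token=True)
_, gen_1 = reusing.issue()
reused_result = refresh_after_lost_response(reusing, gen_1)

rotating = TokenStore(reuse_refresh_token=False)
_, gen_n = rotating.issue()
rotated_result = refresh_after_lost_response(rotating, gen_n)

print(reused_result is not None and reused_result[1] == gen_1)  # True
print(rotated_result)  # None
```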

mstellinga

We're looking at making the callout asynchronous, but our error analytics show that that callout is only part of the problem. I think a large part is phones losing their connection, but I can't be sure - we're still investigating and trying to reproduce it in a test situation.

Reusing the refresh token is also an option, but it has security implications. Currently, a stolen refresh token is only a problem until the next refresh. If the refresh token is valid forever and it's stolen, that's much more problematic. We'd need to add extra security measures (pinning it to the device ID or something similar).
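One way to limit the damage of a long-lived refresh token is to record which device it was issued to and reject refreshes from any other device. The sketch below is hypothetical Python, assuming the app sends a stable device identifier with each refresh; `DeviceBoundStore` is not an Apigee feature. An identifier that the client simply sends can be copied along with the token, so this raises the bar rather than closing the hole.

```python
import hmac
import secrets


class InvalidGrant(Exception):
    """Raised for an unknown token or a device mismatch."""


class DeviceBoundStore:
    """Hypothetical store: each reusable refresh token is bound to one device."""

    def __init__(self):
        self.bindings = {}  # refresh_token -> device_id it was issued to

    def issue(self, device_id):
        refresh = secrets.token_urlsafe(16)
        self.bindings[refresh] = device_id
        return secrets.token_urlsafe(16), refresh

    def refresh(self, refresh_token, device_id):
        bound = self.bindings.get(refresh_token)
        # compare_digest avoids leaking the bound ID through timing.
        if bound is None or not hmac.compare_digest(
            bound.encode(), device_id.encode()
        ):
            raise InvalidGrant("unknown token or wrong device")
        # The refresh token is reused; only the access token changes.
        return secrets.token_urlsafe(16), refresh_token


store = DeviceBoundStore()
_, token = store.issue("device-A")
same_device = store.refresh(token, "device-A")  # legitimate app: succeeds
try:
    store.refresh(token, "device-B")  # stolen token replayed elsewhere
    stolen_ok = True
except InvalidGrant:
    stolen_ok = False
print(same_device[1] == token, stolen_ok)  # True False
```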

So I was hoping there was perhaps a third option, like an undo, but I might be out of luck 🙂

apigee-4

Hi Guys,

We have the same problem and have been in contact with support about it for a couple of months now.

In our case, the problem is the client connection (a mobile phone on a mobile network, which is out of our control).

When the client does not get the new token set, it stops working and the end user must log in again.

We hoped to be able to configure a maximum number of uses for a refresh token (e.g. a single refresh token may be used at most twice), or something that shortens the lifetime of the first refresh token to at most 1 hour (to allow retries) while giving the new refresh token its own, normal lifetime.
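The behaviour requested above (a limited number of uses for the old refresh token and a shortened retry window after rotation) can be modelled as follows. This is a hypothetical Python sketch, not an Apigee setting: `GraceWindowStore`, `grace` and `max_uses` are illustrative names, and time is passed in explicitly so the example is deterministic. Retries within the window return the same successor pair rather than minting another one.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


class InvalidGrant(Exception):
    """Raised for an unknown, expired or over-used refresh token."""


@dataclass
class RefreshRecord:
    expires_at: float
    uses: int = 0
    # The (access, refresh) pair issued the first time this token was used.
    successor: Optional[Tuple[str, str]] = None


class GraceWindowStore:
    """Hypothetical rotating store that tolerates lost refresh responses."""

    def __init__(self, lifetime=30 * 24 * 3600, grace=3600, max_uses=2):
        self.lifetime = lifetime  # normal refresh token lifetime, seconds
        self.grace = grace        # retry window for a rotated-out token
        self.max_uses = max_uses  # e.g. at most 2 uses per refresh token
        self.records: Dict[str, RefreshRecord] = {}

    def issue(self, now):
        access, refresh = secrets.token_hex(8), secrets.token_hex(8)
        self.records[refresh] = RefreshRecord(expires_at=now + self.lifetime)
        return access, refresh

    def refresh(self, token, now):
        rec = self.records.get(token)
        if rec is None or now >= rec.expires_at or rec.uses >= self.max_uses:
            raise InvalidGrant(token)
        rec.uses += 1
        if rec.successor is None:
            # First use: rotate, and cut the old token's remaining life
            # down to the grace window.
            rec.successor = self.issue(now)
            rec.expires_at = min(rec.expires_at, now + self.grace)
        # Any retry inside the window gets the same successor pair back.
        return rec.successor


store = GraceWindowStore()
_, old = store.issue(now=0)
first = store.refresh(old, now=100)  # response lost on a mobile network
retry = store.refresh(old, now=200)  # retry within the hour succeeds
try:
    store.refresh(old, now=300)      # third use exceeds max_uses=2
    third_rejected = False
except InvalidGrant:
    third_rejected = True
print(retry == first, third_rejected)  # True True
```

The trade-off is that, during the grace window, a stolen copy of the old refresh token works as well as the legitimate one, but only for a bounded number of uses and a bounded time.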

So far, we do not have a proper solution to mitigate the issue.

Having the same refresh token for the complete "session" (ReuseRefreshToken) has too many security implications for us.

Regards,

Gijs