Solved: Re: VerifyJWT JWKS Uri Caching

bullenjohnmicha · 03-24-2021 07:35 AM

I'm planning to use the VerifyJWT policy and use the uri for caching of the JWKS. I haven't seen how caching is implemented except for the fact that the cache retains the JWKS for 300 seconds.

How does it behave if it encounters an unknown kid in the JWT header? Would it immediately fetch the JWKS and refresh the cache?

dchiesa1

Nope, the cache management is fairly simple. It does not cache based on kid. It caches the entire JWKS using the JWKS URI as the cache key.

The assumptions behind the JWKS cache are:

JWKS content is small
keys change slowly
new keys get added to the JWKS before any JWT is signed with those keys
old keys get removed only AFTER the expiration of the last JWT signed with that key

If you are introducing a new key, signing JWTs with that new key, and updating the JWKS endpoint all within 5 minutes, then the cache assumptions won't hold.
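To make that failure mode concrete, here is a minimal Python sketch of a cache that behaves the way this answer describes: the whole JWKS is stored under the URI, and an unknown kid does not trigger a refetch. This is a toy model, not Apigee's implementation. The class name, the fetch callback, the example URI, and the exact expiry comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

TTL_SECONDS = 300  # refresh period mentioned in this thread


@dataclass
class JwksCache:
    """Toy model: the entire JWKS is cached per URI; there is no per-kid logic."""
    fetch: Callable[[str], List[dict]]  # hypothetical fetcher for the JWKS endpoint
    entries: Dict[str, Tuple[float, List[dict]]] = field(default_factory=dict)
    fetch_count: int = 0

    def get_keys(self, uri: str, now: float) -> List[dict]:
        cached = self.entries.get(uri)
        if cached is not None and now - cached[0] < TTL_SECONDS:
            return cached[1]  # still fresh: no network call
        self.fetch_count += 1
        keys = self.fetch(uri)
        self.entries[uri] = (now, keys)
        return keys

    def find_key(self, uri: str, kid: str, now: float) -> Optional[dict]:
        # An unknown kid does NOT force a refresh in this model.
        for key in self.get_keys(uri, now):
            if key.get("kid") == kid:
                return key
        return None


published = [{"kid": "key-1"}]                 # what the endpoint currently serves
cache = JwksCache(fetch=lambda uri: list(published))
uri = "https://idp.example.com/jwks"           # hypothetical URI

found_old = cache.find_key(uri, "key-1", now=0)     # first call populates the cache
published.append({"kid": "key-2"})                  # key added at the endpoint
missed_new = cache.find_key(uri, "key-2", now=60)   # cache still holds the old JWKS
found_new = cache.find_key(uri, "key-2", now=301)   # TTL elapsed, JWKS refetched
```

In this model a JWT signed with key-2 fails verification at t=60 even though the endpoint already serves that key, which is the window the assumptions above are meant to rule out.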

To avoid this:

add new keys to the JWKS before you begin using the keys
remove old keys from your JWKS lazily
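The two rules above can be turned into a rough rotation schedule. The following Python sketch assumes the 300-second cache lifetime from this thread. The function name, the safety margin, and the maximum token lifetime are hypothetical parameters you would choose for your own system.

```python
from dataclasses import dataclass

CACHE_TTL_SECONDS = 300  # JWKS cache lifetime cited in this thread


@dataclass
class RotationPlan:
    publish_new_key_at: int   # add the new key to the JWKS
    start_signing_at: int     # first JWT signed with the new key
    remove_old_key_at: int    # drop the old key from the JWKS


def plan_rotation(publish_at: int, max_token_lifetime: int,
                  safety_margin: int = 60) -> RotationPlan:
    # Wait at least one full cache TTL so every cached copy of the JWKS
    # has been refreshed and contains the new key before it is used.
    start_signing = publish_at + CACHE_TTL_SECONDS + safety_margin
    # The last JWT signed with the old key is issued just before
    # start_signing; keep the old key until that JWT has expired.
    remove_old = start_signing + max_token_lifetime + safety_margin
    return RotationPlan(publish_at, start_signing, remove_old)


# Example: tokens live at most one hour (an assumed value).
plan = plan_rotation(publish_at=0, max_token_lifetime=3600)
print(plan)
```

All times are in seconds relative to the moment the new key is published. The margin is there only to absorb clock skew and deployment delays.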

There is an outstanding feature request to allow the caching to respect the HTTP caching headers. That seems reasonable. The practical effect of that will most likely be to have a LONGER TTL for the cached JWKS.

bullenjohnmicha

Thank you @dchiesa1 !

Mohit_Baveja

@dchiesa1 There is an outstanding feature request to allow the caching to respect the HTTP caching headers. That seems reasonable. The practical effect of that will most likely be to have a LONGER TTL for the cached JWKS.

Is there a ticket for this, or any idea whether it has been implemented? We are looking for a longer TTL for the cached JWKS.

dchiesa1

Is there a ticket for this, or any idea whether it has been implemented?

I don't believe it's been implemented. The internal bug reference is b/178045481

We are looking for a longer TTL for the cached JWKS.

Why? Why does it matter to you? Can you explain the impact?

Mohit_Baveja

We are using an external JWKS reference in the VerifyJWT policy. Sometimes the policy execution takes longer than 200ms, so we are looking for options to optimize the performance.

Customizing the JWKS URI caching time would help us achieve that.

dchiesa1

I see. If the 200ms elapsed time includes JWKS retrieval, I can see why you would want to extend the TTL.

Aside from the ticket tracking the desire to extend the TTL, there's a separate bug to de-couple the JWKS retrieval from the policy execution. In theory that will allow you to just _avoid_ the latency associated with retrieval, since it would happen independently of the policy execution. But I am not sure if that will be implemented.

Another way to work around this long-latency problem is to wrap the JWKS endpoint in a separate API proxy, which simply employs the response cache. Then the proxy calling VerifyJWT would use a separate Apigee-managed endpoint for the JWKS source, and THAT proxy would cache for as long as you like. 24h or whatever. Presumably that JWKS-wrapping proxy would return much more quickly than 200ms. That would help you achieve the optimization you are seeking.

Mohit_Baveja

Thanks @dchiesa1. I will definitely try this out.

If we avoid the latency associated with the retrieval, how would the token be validated? The JWKS is required first, to generate the public key that validates the token, correct? Is there a gap in my understanding?

Could we also store the JWKS in a property set and refer to it from the policy instead of calling the JWKS URI endpoint?

dchiesa1

If we avoid the latency associated with the retrieval, how would the token be validated? The JWKS is required first, to generate the public key that validates the token, correct? Is there a gap in my understanding?

The built-in refresh period is 300 seconds in the JWT policy. If the roundtrip to the JWKS endpoint from the Apigee runtime consumes a substantial portion of the 200ms latency you said you observed, then one way to mitigate that is to connect to the JWKS endpoint less often.

A good way to do that is to introduce a caching proxy between the Apigee runtime and the JWKS endpoint. You can do that with an API Proxy that connects to the actual JWKS endpoint, and uses the ResponseCache policy to cache for 24 hours or whatever. The VerifyJWT policy then points to the Apigee proxy endpoint. VerifyJWT will still poll the JWKS endpoint every 300 seconds, but now the JWKS endpoint is an API Proxy. We presume that the API Proxy will respond more quickly than 200ms, because the data is cached and the network hop is small.

So you still have the JWKS and public key. It's just that you're reaching out over the network less often.
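To illustrate the effect, here is a small Python simulation of the two cache layers: a 300-second layer standing in for VerifyJWT, in front of a 24-hour layer standing in for the caching proxy. It only counts how often each layer has to go upstream. It is a sketch of the idea, not Apigee code, and the class name, the request rate, and the origin function are assumptions.

```python
VERIFY_JWT_TTL = 300          # refresh period cited in this thread
PROXY_CACHE_TTL = 24 * 3600   # "24 hours or whatever"


class TtlCache:
    """Caches a single value and refetches from `source` once the TTL elapses."""

    def __init__(self, ttl, source):
        self.ttl = ttl
        self.source = source
        self.fetched_at = None
        self.value = None
        self.misses = 0

    def get(self, now):
        if self.fetched_at is None or now - self.fetched_at >= self.ttl:
            self.misses += 1
            self.value = self.source(now)
            self.fetched_at = now
        return self.value


origin = {"calls": 0}


def origin_jwks(now):
    # Stands in for the slow, remote JWKS endpoint (hypothetical content).
    origin["calls"] += 1
    return {"keys": [{"kid": "key-1"}]}


proxy = TtlCache(PROXY_CACHE_TTL, origin_jwks)   # the JWKS-wrapping API proxy
policy = TtlCache(VERIFY_JWT_TTL, proxy.get)     # the VerifyJWT-side cache

# One incoming request every 10 seconds for a day (an assumed load).
for now in range(0, 24 * 3600, 10):
    policy.get(now)

print(policy.misses, proxy.misses, origin["calls"])
```

Under these assumptions the policy-side cache still refreshes 288 times in the day, but only the first of those refreshes reaches the real JWKS endpoint. The rest are served from the proxy cache, which is where the latency saving would come from.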

Could we also store the JWKS in a property set and refer to it from the policy instead of calling the JWKS URI endpoint?

You COULD do that, yes. The problem is the JWKS is liable to change, and you don't want to have to undeploy-modify-then-redeploy the proxy when that change occurs. Hosting the JWKS on an https-accessible endpoint, and referencing the endpoint from your verifying logic (VerifyJWT policy in Apigee in this case) allows your system to adapt to arbitrary changes in the keyset.