Key Value Maps and Caches

Asking this question since I could not find documentation covering the details I was looking for.

I faced an issue where I got a stale value while reading from a KVM after the KVM had been updated. Looking further, I came to know that this could be related to caching. I then updated <ExpiryTimeInSecs> to 1, which solved the issue, but I read that this has an impact on performance. I have a few doubts, as described below.

1. Suppose I have two policies that read from the same KVM. The two policies are in two different proxies and have different names and different <ExpiryTimeInSecs> values. Will two caches be created since the expiry values differ, or is there just one cache per KVM?

2. Let's assume there is only one cache. I saw in the documentation that a PUT operation from the KeyValueMapOperations policy will refresh the cache. In a scenario where the KVM is updated regularly, refreshing the cache with a GET operation is not required, right? Because the PUT operation itself will refresh it promptly.

3. If multiple caches are created, will all of them be refreshed when a PUT operation happens? (Question valid only if there are multiple caches.)

Thanks in advance. I am still going through the older answers and whatever details I could find, but none of them gives a concrete idea of how the cache is managed by Apigee and all the possible options to control it. @dchiesa1 @API-Evangelist

ACCEPTED SOLUTION

There are two ways to update and read a KVM: via the administrative API, or with the policy. If you read or write the KVM with the KeyValueMapOperations policy, the runtime updates an in-memory cache local to the node. The cached entry remains in the cache until the <ExpiryTimeInSecs> specified in the policy elapses, OR until you overwrite the cached value with a PUT operation in the policy.
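
For reference, here is a minimal sketch of a GET via the KeyValueMapOperations policy; the map name, key, and variable names below are placeholders for illustration only:

  <KeyValueMapOperations name="KVM-Get-Token" mapIdentifier="settings">
    <Scope>environment</Scope>
    <!-- the value read below is cached on this node for 300 seconds -->
    <ExpiryTimeInSecs>300</ExpiryTimeInSecs>
    <Get assignTo="private.token">
      <Key>
        <Parameter>token</Parameter>
      </Key>
    </Get>
  </KeyValueMapOperations>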

If you write or update the KVM with the administrative API, that does not affect any runtime cache. Updating or overwriting a value via the administrative API will not invalidate any values already present in the runtime cache. The values in the runtime nodes get invalidated after the time-to-live elapses, that is, after the number of seconds specified in the <ExpiryTimeInSecs> of the KeyValueMapOperations policy that populated the cache, OR when you use a PUT operation in a KeyValueMapOperations policy.

1. Suppose I have two policies that read from the same KVM. The two policies are in two different proxies and have different names and different <ExpiryTimeInSecs> values. Will two caches be created since the expiry values differ, or is there just one cache per KVM?

The cache is kept in memory in the node that services the request. There may be many distributed nodes in general, serving requests. The cache is not distributed across all of those nodes. So: yes, each different node has a different cache, with potentially different entries. Suppose you have this sequence of events:

  1. write entry e1 with value v1 via admin API. 
  2. within the context of an API request, read entry e1 via KeyValueMapOperations policy. Suppose <ExpiryTimeInSecs> is 300, and suppose this API request is handled by runtime node 1.  The value read and cached is v1. 
  3. write entry e1 with value v2 via admin API.
  4. within the context of a second API request, read entry e1 via KeyValueMapOperations policy. Suppose again <ExpiryTimeInSecs> is 300, and suppose this API request is handled by runtime node 2.  The value read and cached is v2. 
  5. within the context of a third API request, read entry e1 via KeyValueMapOperations policy. Suppose again <ExpiryTimeInSecs> is 300, and suppose this API request is handled by runtime node 1. If this event occurs less than 300 seconds after event #2, then the value read (from the node-local cache) is still v1. 

I saw in the documentation that a PUT operation from the KeyValueMapOperations policy will refresh the cache. In a scenario where the KVM is updated regularly, refreshing the cache with a GET operation is not required, right? Because the PUT operation itself will refresh it promptly.

Yes. A PUT operation from within the KeyValueMapOperations policy also honors the <ExpiryTimeInSecs> element. It writes to the persistent store and also writes an entry to the node-local cache, with the given time-to-live (TTL).
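
As a companion to the GET sketch above, a minimal PUT sketch (again, the map, key, and variable names are placeholders):

  <KeyValueMapOperations name="KVM-Put-Token" mapIdentifier="settings">
    <Scope>environment</Scope>
    <!-- the written value is also placed in this node's cache for 300 seconds -->
    <ExpiryTimeInSecs>300</ExpiryTimeInSecs>
    <Put override="true">
      <Key>
        <Parameter>token</Parameter>
      </Key>
      <Value ref="flow.new_token"/>
    </Put>
  </KeyValueMapOperations>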

If multiple caches are created, will all of them be refreshed when a PUT operation happens? (Question valid only if there are multiple caches.)

No, each cache is local to the runtime node. There is no distributed maintenance of the cache.


Thanks for the clarification. I guess this was causing the issue in my proxies.

I have a KVM with a value named 'Token'. This value is updated every 5 minutes. I was intermittently getting the previous value stored in the KVM for a short time right after the KVM was updated.

If one of the nodes, say node1, reads the value using a GET and saves it to cache1, and immediately (or a little later) the KVM gets updated by a PUT operation in a KeyValueMapOperations policy on another node, then proxies reading from cache1 are still served the old value.

I have set <ExpiryTimeInSecs> to 1 second to reduce the time for which the cache keeps the old value. However, this is not a real solution, and it also creates a lot of performance overhead.

1. Is there any way we can sync the caches across multiple nodes without a performance impact?

2. Is there any way we can make sure that only one cache gets created?

The dilemma you describe is an age-old problem. When using a cache, there is a tradeoff between performance and accuracy. In a distributed system the accuracy challenge compounds, because there are multiple independent caches, each maintaining its own cached values.

You will have to evaluate how you want to balance that tradeoff. A cache expiry of 1 second leads to a 1-second "maximum window of inaccuracy" for any given cache. The respective caches may still be inaccurate, but for no more than 1 second at a time. The downside is that the system incurs greater cost in fetching or computing values, because the cache will often be cold. Could you tolerate a wider window of inaccuracy? 10 seconds? 30 seconds? What happens when the token is stale? Could you have Apigee retry the upstream call, refreshing its cache before doing so? Could you have the client retry?
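
To illustrate the "refresh only when the token turns out to be stale" idea, here is a hedged sketch of a conditional step in a proxy's response flow. The status code, the policy name, and what that policy does are assumptions for illustration, not a prescription:

  <Response>
    <Step>
      <!-- runs only when the upstream rejects the (possibly stale) token -->
      <Condition>response.status.code = 401</Condition>
      <!-- an illustrative policy that re-obtains the token and re-caches it,
           for example a KeyValueMapOperations PUT like the one above -->
      <Name>KVM-Refresh-Token</Name>
    </Step>
  </Response>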

In Apigee, there is no way to "join" the disparate caches across all the nodes and synchronize them. 

My suggestion to you is to try different approaches and measure the behavior for correctness and performance.  

Tony Hoare and Donald Knuth have been known to say, "Premature optimization is the root of all evil." By invoking this aphorism, I am NOT suggesting that the performance of the various options is irrelevant. Rather, I am suggesting that you MEASURE the effects, rather than assuming you know what the relative performance impacts will be. You wrote, "this creates a lot of performance overhead." How much? Compared to what? Are you sure? Have you measured?

Some other things to consider:

  • You said the KVM gets updated every 5 minutes. Is it possible to design the system so that when you populate the KVM, you ALSO PUT the value into a cache? The KVM GET operation has an implicit cache, but you could also wrap the Apigee Cache explicitly around the KVM. With the explicit Apigee Cache you have more capability: you can invalidate an entry or populate an entry explicitly, whenever you want, and you can scope the cache to an API proxy or to some wider scope. (See the sketch after this list.)
  • Is it possible to reduce the frequency of the KVM update? 
  • Is it possible to introduce some retry logic so that you TRY the token periodically and only refresh when the token is bad?
  • Is there some other way to work around the problem?
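
On the first bullet above, here is a rough sketch of wrapping the explicit cache policies around the token. The cache resource, key fragment, and variable names are invented for illustration, and some element names are spelled slightly differently between Apigee Edge and Apigee X (for example the expiry timeout), so check the reference for your platform. A PopulateCache policy, run wherever you obtain or refresh the token:

  <PopulateCache name="PC-Token">
    <CacheResource>token-cache</CacheResource>
    <Scope>Exclusive</Scope>
    <CacheKey>
      <KeyFragment>upstream_token</KeyFragment>
    </CacheKey>
    <ExpirySettings>
      <TimeoutInSec>300</TimeoutInSec>
    </ExpirySettings>
    <!-- the flow variable holding the freshly obtained token -->
    <Source>flow.new_token</Source>
  </PopulateCache>

And a LookupCache policy, run on each request that needs the token:

  <LookupCache name="LC-Token">
    <CacheResource>token-cache</CacheResource>
    <Scope>Exclusive</Scope>
    <CacheKey>
      <KeyFragment>upstream_token</KeyFragment>
    </CacheKey>
    <AssignTo>private.token</AssignTo>
  </LookupCache>

There is also an InvalidateCache policy if you want to evict the entry explicitly.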

You might think, "By taking any of these steps, I'm just forcing the Apigee policies and services to behave in a way that is not harmonious with their design. I should just introduce another element, something like a Redis cache, that is designed for caching. That will solve my cache coherence problem more elegantly." But that's probably not valid. Introducing an external cache, like Redis or GCP Memorystore, WILL give you a cache. But you will have the same issue, the same tradeoff of performance vs. accuracy. You wouldn't be solving anything; you'd just be shifting the problem to a different element in the system. You'd still have to measure and experiment.

Thanks for the reply @dchiesa1 

I have not measured the performance impact. Our application is still under development, and the same organization hosts production proxies too. I don't have the user base to test the performance impact right now, and if I automate the test, the production proxies might get affected.

My problem was specific. There is a token stored in the Apigee KVM which gets updated every 5 minutes, and the old token becomes invalid after this update. That results in the failure of all requests that read this value from caches which still hold the old value.

We did a workaround to keep the old token valid for a longer time, a period greater than the cache expiry. This way we can make sure that the token in the cache is still valid even when it does not match the latest token in the KVM.

This was the easiest solution that I could come up with. Thanks for all the suggestions you gave.