Hello there,
Case: I have an external API endpoint from which I'm only allowed to retrieve data once per minute. The Apigee endpoint receives quite a bit of traffic, roughly 2,000 requests per minute.
Solution: Since Apigee doesn't support any kind of cache-warming mechanism where it fetches the data and stores it in the cache by itself, I have used JavaScript policies and key value maps (KVMs) to allow only one of the calls to go through to the backend and fill the response cache. All other calls always receive the data from the response cache. This works, and the logs show that indeed only one of the calls goes through to the backend every minute. This policy is on the target endpoint.
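To give an idea of the mechanism (an illustrative sketch, not my actual code; the variable and map names are made up): a KVM lookup earlier in the flow provides the timestamp of the last backend refresh, and the JavaScript policy decides whether the current call is the one that's allowed through:

```javascript
// Sketch of the gate logic in the JavaScript policy (names are illustrative).
// A KeyValueMapOperations "Get" step earlier in the flow is assumed to have
// populated 'kvm.lastRefresh' with the epoch millis of the last backend call.
var lastRefresh = parseInt(context.getVariable('kvm.lastRefresh') || '0', 10);
var now = Date.now();
var REFRESH_INTERVAL_MS = 60 * 1000; // the backend may only be hit once per minute

if (now - lastRefresh >= REFRESH_INTERVAL_MS) {
  // This call wins the race: let it through to the backend and record the
  // refresh time so a KVM "Put" step can persist it for the other calls.
  context.setVariable('refresh.backend', 'true');
  context.setVariable('kvm.newRefresh', String(now));
} else {
  // Everyone else keeps being served from the response cache.
  context.setVariable('refresh.backend', 'false');
}
```

The `refresh.backend` variable is then used in step conditions so that only the winning call skips the cache lookup and repopulates it on the way back.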
I've also added an additional response cache policy on the proxy endpoint to cache the different query parameter combinations, so the policies on the target endpoint don't have to run for every single call. That policy simply works with a 60-second timeout.
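For reference, the proxy-endpoint policy is an ordinary ResponseCache along these lines (a minimal sketch, not the exact policy; the resource and policy names are placeholders), keyed on the query string and expiring after 60 seconds:

```xml
<!-- Minimal sketch of the proxy-endpoint response cache; names are placeholders. -->
<ResponseCache name="RC-ProxyEndpoint">
  <CacheResource>my-cache</CacheResource>
  <CacheKey>
    <!-- Cache each query parameter combination separately -->
    <KeyFragment ref="request.querystring"/>
  </CacheKey>
  <ExpirySettings>
    <TimeoutInSec>60</TimeoutInSec>
  </ExpirySettings>
</ResponseCache>
```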
Problem: What we've been noticing is that when the platform is under heavy load, the API endpoint occasionally returns old cached data. For example, it has been running fine and updating the cached data every minute, then at 18:00 it suddenly returns data from 16:31, and a couple of minutes later it starts running fine again without intervention. Our logging shows that the external endpoint was hit every minute during that timespan and kept returning fresh data. The updated data in between also suggests that the policies are doing their job of filling and invalidating the cache correctly.
We're running an on-premises install of Apigee Edge, version 4.16.05.00.
Hopefully one of you bright minds is able to let his or her light shine on this case. 🙂
Solved! Go to Solution.
@Bart Waardenburg, you have mentioned a JavaScript policy & KVM in the issue above. Any reason you are using them? Why not the out-of-the-box response cache policy with a 60-second cache expiry value? Can you add more details on the JavaScript policy & KVM?
@Anil Sagar Yes, that's what I used at first, but the way the cache policy works is that it keeps the response in the cache for 60 seconds. After those 60 seconds it invalidates it, and all subsequent calls go through to the backend until one of them returns and repopulates the cache. That resulted in more calls hitting the backend whenever the backend needed more time to process them, which in turn resulted in even longer processing times and thus again more calls, effectively making sure the backend wasn't able to respond to any calls at all.
With the JavaScript and KVM policies I can make sure that when the first call goes through to the backend, all subsequent calls are served from the 'old' cache.
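The KVM part is just a lookup that feeds the JavaScript policy; something along these lines (an illustrative sketch, with made-up map and variable names):

```xml
<!-- Illustrative KVM lookup that populates the variable the JavaScript gate reads. -->
<KeyValueMapOperations name="KVM-GetLastRefresh" mapIdentifier="cache-refresh-map">
  <Scope>environment</Scope>
  <Get assignTo="kvm.lastRefresh">
    <Key>
      <Parameter>lastRefresh</Parameter>
    </Key>
  </Get>
</KeyValueMapOperations>
```

A matching Put step, executed only for the call that wins the race, writes the new timestamp back to the map.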
Could you provide a code snippet of the response cache policies, along with the corresponding steps configured as part of the endpoints? Please provide details of the additional response cache policy as well.
Interesting problem.
You wrote
For example, it has been running fine and updating the cached data every minute, then at 18:00 it suddenly returns data from 16:31, and a couple of minutes later it starts running fine again without intervention.
A couple questions:
We solved (or rather evaded) the problem eventually. The problem was not related to the heavy load but to the size of the response that had to be cached (size and load always increased simultaneously due to the nature of the data).
Eventual solution from support:
Regarding the caching issue - I've spent quite a lot of time looking into this and talking to the Engineering team, and I believe we may have found a solution. Basically, at this point, if the payload is bigger than 512 KB, the message processor may or may not cache it and properly propagate it across the cluster (inconsistent behaviour, and this is a bug). However, setting the parameter "skipCacheIfElementSizeInKBExceeds" to a value higher than 512 KB seems to fix the problem (I tested this in my dev env with 2 message processors).
Setting the property "skipCacheIfElementSizeInKBExceeds" to 2000, which allows the message processor to cache payloads of up to 2000 KB in memory instead of skipping anything over the 512 KB default, solved our issue.
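In case it helps anyone else: the property can be set on the environment-scoped cache resource, for example through the management API, along these lines (a sketch; the host, organization, environment and cache names are placeholders, and your install may differ):

```
# Sketch: raise the per-element size limit on the cache resource used by the
# ResponseCache policies. All names below are placeholders.
curl -u sysadmin -X PUT \
  "http://MGMT_SERVER:8080/v1/organizations/myorg/environments/prod/caches/my-cache" \
  -H "Content-Type: application/json" \
  -d '{ "name": "my-cache", "skipCacheIfElementSizeInKBExceeds": "2000" }'
```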