Apigee Cache explained: Eventually Consistent

Background:

When building proxies that retrieve payloads from backend targets whose responses change infrequently, it can be very beneficial to cache those responses in Apigee. This avoids unnecessary backend calls, which improves both proxy and backend performance. However, proxy designers are not restricted to this use case, and it is possible to use caching in ways that produce unintended consequences. This post examines cache implementations and provides recommendations.
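
For reference, the canonical version of this response-caching pattern is the ResponseCache policy, attached to both the request (lookup) and response (populate) paths of a proxy. Below is a minimal sketch; the cache resource name, key fragment, and expiry value are illustrative placeholders:

<ResponseCache async="false" continueOnError="false" enabled="true" name="Response-Cache-1">
	<!-- Entries are keyed on the request URI; a repeat request inside the
	     expiry window is served from the cache without a backend call. -->
	<CacheResource>my_cache</CacheResource>
	<Scope>Exclusive</Scope>
	<CacheKey>
		<KeyFragment ref="request.uri"/>
	</CacheKey>
	<ExpirySettings>
		<TimeoutInSec>3600</TimeoutInSec>
	</ExpirySettings>
</ResponseCache>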

Prerequisite knowledge:

This post focuses on the cache and will reference the cache-internals guide.

Consistency:

Apigee's cache system is eventually consistent: a value written to the cache is not guaranteed to be immediately visible to subsequent reads, especially reads served by a different Message Processor. Using the cache as if it were strictly consistent can therefore lead to unexpected outcomes.

Antipattern:

It is an antipattern to design the logic of a proxy to use the cache as if it were strictly consistent. The example proxy below demonstrates how relying on that assumption can cause unforeseen issues.

We will first create a few policies:

Policies:

  • First, use Assign Message to create the variable we want to cache. In this example it simply writes the client-received timestamp to the variable flowVar_token:
<AssignMessage async="false" continueOnError="false" enabled="true" name="Assign-Message-1">
	<DisplayName>Assign Message-1</DisplayName>
	<AssignVariable>
		<Name>flowVar_token</Name>
		<Value/>
		<Template>time: {client.received.start.timestamp}</Template>
	</AssignVariable>
</AssignMessage>
  • LookupCache is used to read the cached value stored under the key ‘token’ and assign it to the variable flowVar_token:
<LookupCache async="false" continueOnError="false" enabled="true" name="Lookup-Cache-1">
	<DisplayName>Lookup Cache-1</DisplayName>
	<Properties/>
	<CacheResource>my_cache</CacheResource>
	<Scope>Exclusive</Scope>
	<AssignTo>flowVar_token</AssignTo>
	<CacheKey>
		<KeyFragment>token</KeyFragment>
	</CacheKey>
</LookupCache>
  • PopulateCache is used to write the value of the variable flowVar_token to the cache under the key ‘token’, with a 60-second expiry:
<PopulateCache async="false" continueOnError="false" enabled="true" name="Populate-Cache-1">
	<DisplayName>Populate Cache-1</DisplayName>
	<Properties/>
	<CacheResource>my_cache</CacheResource>
	<Scope>Exclusive</Scope>
	<CacheKey>
		<KeyFragment>token</KeyFragment>
	</CacheKey>
	<ExpirySettings>
		<TimeoutInSec>60</TimeoutInSec>
	</ExpirySettings>
	<Source>flowVar_token</Source>
</PopulateCache>

Flow:

Using these policies, the request flow is constructed as below. The cachehit conditions implement the read path: on a cache miss, Assign-Message-1 generates the value and Populate-Cache-1 writes it to the cache. The request.uri conditions let us forcefully write a new value to the cache by appending /write to the request path of an HTTP call.

<Request> 
	<Step> 
		<Name>Lookup-Cache-1</Name> 
	</Step> 
	<Step> 
		<Condition>(lookupcache.Lookup-Cache-1.cachehit == false)</Condition> 
		<Name>Assign-Message-1</Name> 
	</Step> 
	<Step> 
		<Condition>(lookupcache.Lookup-Cache-1.cachehit == false)</Condition> 
		<Name>Populate-Cache-1</Name> 
	</Step> 
	<Step> 
		<Condition>(request.uri == "/proxy/write")</Condition> 
		<Name>Assign-Message-1</Name> 
	</Step> 
	<Step> 
		<Condition>(request.uri == "/proxy/write")</Condition> 
		<Name>Populate-Cache-1</Name> 
	</Step> 
</Request>
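
For instance, the two paths can be exercised directly (using the same placeholder host as the load test below):

# Read path: Lookup-Cache-1 runs; Assign-Message-1 and Populate-Cache-1
# fire only when the lookup reports a cache miss.
curl http://my_url.com/proxy

# Forced-write path: Assign-Message-1 and Populate-Cache-1 run regardless,
# overwriting the cached value stored under the key 'token'.
curl http://my_url.com/proxy/write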

Load Test:

The next step in the demonstration is to load test the proxy to simulate traffic; specifically, we must create a scenario where a read/lookup request executes after a write/populate but before the value has propagated to the other Message Processors' (MPs') L1 caches, or even to the L2 Cassandra layer of the cache.

Since each MP has its own local L1 cache, an org/env deployed with at least two MPs lets us observe the eventual consistency: a read/lookup request may be served by a different MP than the one that wrote/populated the cache.

The test setup follows this guide, which uses the GNU parallel and Apache Bench (ab) tools.

$ (echo "ab -n 10 -c 10 http://my_url.com/proxy"; echo "ab -n 4 -c 4 http://my_url.com/proxy/write") | parallel '{}'

This command runs two Apache Bench jobs in parallel: 10 concurrent read requests against /proxy and 4 concurrent write requests against /proxy/write.

Results:

Below is an excerpt from the trace captured during this concurrent load test. The requests are ordered by the execution timestamps of the relevant policies. The cache's eventual consistency is clear for two reasons:

  • Although read request 1 occurs at the same time as write request 2 (07-06-21 16:50:12:167), it retrieves the previous value written by write request 1 (value="time: 1623084611997") instead of the value written by write request 2 (value="time: 1623084612166").
  • Read request 2 occurs at 07-06-21 16:50:12:168, after write request 2 (07-06-21 16:50:12:167), but still serves the “stale” value written by write request 1 (value="time: 1623084611997").
<!-- write request 1 -->
<Timestamp>07-06-21 16:50:11:998</Timestamp>
<VariableAccess>
	<Get name="target"/>
	<Get value="time: 1623084611997" name="flowVar_token"/>
	<Set success="true" value="188105" name="apigee.metrics.policy.Populate-Cache-1.timeTaken"/>
</VariableAccess>

<!-- write request 2 -->
<Timestamp>07-06-21 16:50:12:167</Timestamp>
<VariableAccess>
	<Get name="target"/>
	<Get value="time: 1623084612166" name="flowVar_token"/>
	<Set success="true" value="212978" name="apigee.metrics.policy.Populate-Cache-1.timeTaken"/>
</VariableAccess>

<!-- read request 1 -->
<Timestamp>07-06-21 16:50:12:167</Timestamp>
<VariableAccess>
	<Get name="lookupcache.Lookup-Cache-1"/>
	<Get name="target"/>
	<Set success="true" value="time: 1623084611997" name="flowVar_token"/>
	<Set success="true" value="93976" name="apigee.metrics.policy.Lookup-Cache-1.timeTaken"/>
	<Get value="true" name="lookupcache.Lookup-Cache-1.cachehit"/>
</VariableAccess>

<!-- read request 2 -->
<Timestamp>07-06-21 16:50:12:168</Timestamp>
<VariableAccess>
	<Get name="lookupcache.Lookup-Cache-1"/>
	<Get name="target"/>
	<Set success="true" value="time: 1623084611997" name="flowVar_token"/>
	<Set success="true" value="77905" name="apigee.metrics.policy.Lookup-Cache-1.timeTaken"/>
	<Get value="true" name="lookupcache.Lookup-Cache-1.cachehit"/>
</VariableAccess>

<!-- read request 3 -->
<Timestamp>07-06-21 16:50:12:173</Timestamp>
<VariableAccess>
	<Get name="lookupcache.Lookup-Cache-1"/>
	<Get name="target"/>
	<Set success="true" value="79558" name="apigee.metrics.policy.Lookup-Cache-1.timeTaken"/>
	<Set success="true" value="time: 1623084612166" name="flowVar_token"/>
	<Get value="true" name="lookupcache.Lookup-Cache-1.cachehit"/>
</VariableAccess>

As this series of requests shows, if the proxy were designed to depend on reads returning the most recently written value, read request 2 would have an unforeseen consequence: it serves a “stale” value despite occurring after the new value was written. Read request 3, only a few milliseconds later, does retrieve the new value (value="time: 1623084612166"), showing that the cache does converge eventually.

Conclusion:

The cache system used by Apigee is eventually consistent: there is no guarantee of how long it will take for values to synchronize across the L1 and L2 layers. Proxies whose logic depends on strictly consistent reads and writes to the cache may therefore experience unforeseen issues. The major takeaway is that proxies that utilize the cache should be designed with this eventual consistency in mind to avoid the antipattern.
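
One way to design for this (a sketch, not part of the original proxy; the X-Require-Fresh header is hypothetical) is to treat a cache hit as best-effort and regenerate the value whenever the caller genuinely needs the latest write:

<Request>
	<Step>
		<Name>Lookup-Cache-1</Name>
	</Step>
	<!-- Regenerate on a cache miss OR when the caller explicitly demands
	     freshness; never assume a cache hit reflects the most recent write. -->
	<Step>
		<Condition>(lookupcache.Lookup-Cache-1.cachehit == false) or (request.header.X-Require-Fresh == "true")</Condition>
		<Name>Assign-Message-1</Name>
	</Step>
	<Step>
		<Condition>(lookupcache.Lookup-Cache-1.cachehit == false) or (request.header.X-Require-Fresh == "true")</Condition>
		<Name>Populate-Cache-1</Name>
	</Step>
</Request>

With this shape, callers that can tolerate staleness enjoy the cache's performance benefit, while callers that cannot bypass the possibly-stale entry.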

Lastly, there are additional factors that influence the synchronization timing, such as the number of MPs serving the org/environment and the size of the cached value. Because each additional MP adds another L1 cache to synchronize, scaling the infrastructure vertically and/or horizontally to mitigate such occurrences may have little to no effect, or may even worsen the issue.
