Edge Datastore Sizing and Architecture for a High-Volume Gateway Implementation

yazinthufael
Participant II

Hello,

I'm building a highly available, scalable DR architecture for our gateway implementation.

We're expecting TPS to be anywhere between 1k and 10k, and I would appreciate some input if you have come across a similar requirement.

I wanted to clarify the datastore component. Assuming some policy is applied to every API hit, does a read/write happen on the datastore for every hit? Is there a rough estimate of the volume a single datastore node can handle?

What would be the recommended sizing for this requirement?

This way, I could be spot on with Cassandra/ZooKeeper sizing. Appreciate your prompt response.

Thanks

Thufael


sarthak
Participant V

@Ahammed Abdulla Thufael

Let's start by answering the easy part: no, it may not communicate with the datastore for every request/response. It depends on the use case. If the proxy is a passthrough, then it does not need to communicate with the datastore at all. But if the APIs are protected by OAuth, then it has to reach out to the datastore to validate access tokens. Even with caching, a busy system may still generate very high datastore reads/writes. So the sizing really depends on the use cases.

I would recommend working with your Apigee technical contact to get further help on this.

Dino

Yes - you will need to measure. We can provide estimates, but you will need to measure. Start with a 3-node Cassandra ring, using medium-sized nodes with an appropriate network (probably 1 Gbps) in each datacenter.

The Edge gateway will cache the result of reads from the datastore, for 30 seconds or more. Let's suppose you use OAuth token validation in the API proxy. If the proxy handles 1000 requests per second, but they all use the same token, then the datastore will be mostly quiet and unused. It needs to satisfy roughly one read every 30 seconds (one read per gateway node, since each node keeps its own cache). If you have 1000 tps and there are 1000 different tokens, then there will be about 1000 reads per 30 seconds. If you have 100,000 tokens and every request uses a different token, then the datastore will be much busier.
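To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. This is not an official Apigee sizing tool; the 30-second cache TTL, the per-node caching behavior, and the traffic figures are just the assumptions from the paragraph above.

```python
def datastore_reads_per_second(tps, distinct_tokens, cache_ttl_s=30, gateway_nodes=1):
    """Rough upper bound on datastore reads/sec for cached token lookups.

    Assumes each gateway node keeps its own cache, so each distinct token
    costs at most one datastore read per node per TTL window, and never
    more reads than there are requests in that window.
    """
    requests_per_window = tps * cache_ttl_s
    misses_per_window = min(distinct_tokens * gateway_nodes, requests_per_window)
    return misses_per_window / cache_ttl_s

# The scenarios from the paragraph above, with a single gateway node:
print(datastore_reads_per_second(1000, 1))        # ~0.03 reads/s: one token, nearly silent
print(datastore_reads_per_second(1000, 1000))     # ~33 reads/s: 1000 reads per 30 seconds
print(datastore_reads_per_second(1000, 100_000))  # 1000 reads/s: every request misses the cache
```

The same estimate applies to API key verification: substitute the number of distinct keys for distinct_tokens and the key cache TTL for cache_ttl_s.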

API key verification works the same way, except that keys typically do not expire as often as tokens. Because of the caching in the gateway, the datastore nodes tend to be lightly loaded when a smaller number of API keys is in the mix.

What else are you doing with the datastore? Are you reading and writing KVM? Are you performing 3-legged OAuth and reading and writing authorization codes? These will also affect the load on the datastore.

In my experience, the datastore is not the constraining factor. Instead, it's the network. The gateway nodes can saturate the network link to the backend. If you have a 10 Gbps network, Apigee Edge can fill all of it, provided the backend exhibits no contention and has low latency. That could represent 1000 tps or 8000 tps, depending on the size of each request and response. If you need 10,000 requests per second, then scale out the number of gateway nodes to attain that, along with your desired margin of safety. For example, if you measure 1500 tps per node, your target is 10k tps, and you want a 50% margin of safety, run 10 gateway nodes. This will give you 15k tps in aggregate.
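As a sketch of that sizing rule, here is a small Python calculator. The per-node throughput, the margin, and the payload sizes are just the illustrative numbers above, not measurements from any real deployment.

```python
import math

def gateway_nodes_needed(target_tps, measured_tps_per_node, safety_margin=0.5):
    """Number of gateway nodes to hit target_tps plus a margin of safety."""
    return math.ceil(target_tps * (1 + safety_margin) / measured_tps_per_node)

def link_saturation_tps(link_gbps, avg_payload_kb):
    """Rough TPS at which a network link saturates, given the average
    combined request+response size. Ignores protocol overhead."""
    link_bytes_per_s = link_gbps * 1e9 / 8
    return link_bytes_per_s / (avg_payload_kb * 1024)

# The example above: 10k tps target, 1500 tps measured per node, 50% margin.
print(gateway_nodes_needed(10_000, 1500))    # -> 10 nodes (~15k tps aggregate)

# Why the network matters: a 10 Gbps link tops out around these rates.
print(round(link_saturation_tps(10, 128)))   # ~9500 tps at 128 KB per exchange
print(round(link_saturation_tps(10, 1024)))  # ~1200 tps at 1 MB per exchange
```

This is why you measure first: the per-node throughput depends on payload sizes, policies, and backend latency, and the node count follows from that measurement rather than from any fixed formula.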

yazinthufael
Participant II

Thanks @Dino, that was really helpful.

To answer your query: yes, we're doing KVM and target server reads. I assume these are stored in the datastore. I also assume they don't change as frequently as the API keys and tokens, so as long as they are cached, they should not add much to the Cassandra read load.