Determining Maximum TPS of APIGEE Edge Private Cloud

I am new to Apigee, and this question might sound vague, but is there a way to determine the maximum TPS of a 15-node Apigee Edge deployment?

6 nodes with Message Processor and Routers

5 nodes with ZooKeeper and Cassandra

1 node with ZooKeeper, Cassandra and OpenLDAP

1 node with Qpid Server and Postgres Server

1 node with Qpid Server

1 node with Management Server and Edge UI

 

Thank you


is there a way to determine the maximum TPS of a 15-node Apigee Edge deployment?

Yes. The way to determine the maximum throughput is to test it.

You need to:

  • Design and implement an upstream ("backend") system for the API proxies. This upstream system should exhibit varying response latency, corresponding to the variation expected in a real-world system. It should accept transactions of "about the same size" as the expected production workload, in terms of payload size, number of headers, and so on. It should run separately from the Apigee Message Processor (MP), on a network that is as similar as possible to the one you expect to use in production. (Many distributed systems are limited in their performance by the capacity of the network.)
  • Design and implement your API proxies, including all the capability you expect in that layer: token validation, logging, conditional routing, ServiceCallouts, and so on.
  • Design and implement a client that generates a mix of transactions reflecting the load you expect on your system.
  • Run tests with that client over "longer periods", like 30-60 minutes. Treat the first 5-8 minutes of load as "warmup", and begin measuring only after the warmup period.
  • Monitor the system closely during the performance test to ensure that all of its components are stable. CPU, I/O, and memory load on all critical systems should fluctuate, but only within a narrow band. If you see a CPU or memory chart that shows uncontrolled variation, it's a sign of an unstable system, and the performance results will not be a reliable indicator of real-world performance. In that case you need to diagnose the source of the instability.
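
To make the "discard the warmup, then check for stability" steps concrete, here is a minimal Python sketch of how the client's results could be summarized. It is not tied to any particular load tool: the `Sample` record, the `summarize` function, and the one-minute bucket size are all hypothetical names and choices for illustration. It assumes your client can log a completion time, a latency, and a success flag for each request.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Sample:
    """One completed request as logged by the load client (hypothetical shape)."""
    t: float           # seconds since the start of the test
    latency_ms: float  # end-to-end latency seen by the client
    ok: bool           # True if the response counted as a success


def summarize(samples, warmup_s, end_s, bucket_s=60.0):
    """Summarize steady-state results, ignoring everything before warmup_s."""
    steady = [s for s in samples if warmup_s <= s.t < end_s]
    n_buckets = int((end_s - warmup_s) // bucket_s)
    if not steady or n_buckets < 1:
        raise ValueError("need at least one full bucket of post-warmup samples")

    # Count successful transactions per time bucket.
    counts = [0] * n_buckets
    for s in steady:
        idx = int((s.t - warmup_s) // bucket_s)
        if s.ok and idx < n_buckets:
            counts[idx] += 1
    tps = [c / bucket_s for c in counts]

    mean_tps = statistics.mean(tps)
    # Coefficient of variation of per-bucket TPS: a crude "narrow band" check.
    cv = statistics.pstdev(tps) / mean_tps if mean_tps > 0 else float("inf")

    # Nearest-rank 95th percentile of latency over all post-warmup requests.
    latencies = sorted(s.latency_ms for s in steady)
    p95 = latencies[max(0, math.ceil(0.95 * len(latencies)) - 1)]

    error_rate = sum(1 for s in steady if not s.ok) / len(steady)
    return {
        "mean_tps": mean_tps,
        "cv": cv,
        "p95_latency_ms": p95,
        "error_rate": error_rate,
    }
```

For a 60-minute run with an 8-minute warmup you would call something like `summarize(samples, warmup_s=480, end_s=3600)`. A small `cv` suggests throughput stayed within a narrow band; what counts as "small" is a judgment call for your system. This is only the client-side view, so it does not replace watching CPU, I/O, and memory on the nodes themselves.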

There are so many variables involved in a distributed system - complexity of proxies, size of payload, think-time of the upstream system, size of the VMs running the MP, capacity of the network, and so on - that it is impossible for anyone to give you a firm estimate of the expected performance of a particular Apigee instance. That may not be what you want to hear, but it's true.

A general guess is that a single MP node on a reasonable VM that is not network-constrained "should be" able to carry 1000 TPS. But that is a SWAG, not a reliable, guaranteed figure.
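
Applied to the topology in the question (6 nodes running Message Processors), that guess works out as the back-of-envelope arithmetic below. This is a sketch under a big assumption: that throughput scales linearly with the number of MP nodes and that nothing else (routers, network, backend) becomes the bottleneck first. The function name is hypothetical, and the result is a starting point for choosing test load levels, not a prediction.

```python
def rough_ceiling_tps(mp_nodes, per_mp_tps=1000):
    """Naive upper-bound guess: assumes perfectly linear scaling across MP nodes."""
    return mp_nodes * per_mp_tps


# 6 MP nodes at the ~1000 TPS-per-MP guess from the answer above.
print(rough_ceiling_tps(6))
```

Only a load test against your own proxies and backend will tell you how far the real figure is from this one.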