How much can I improve my TPS?


Hi,

I just started working with Apigee and I want to know how to calculate how much infrastructure I need to reach a given TPS. I know that with the basic setup described in the documentation, 1 planet with 2 regions, I can expect at least 2k TPS.

Thanks!

Solved
1 ACCEPTED SOLUTION

Many factors affect the throughput of a system. It's difficult to give you a simple answer for how much infrastructure you need.

  • The size of the message payload affects maximum throughput. There is only so much network capacity on a single machine. Suppose your NIC provides 10 Gbit/s; that is a reasonably fat pipe. Converting to megabytes, that's 1,250 MB/s. If each individual HTTP request carries 10 MB and each response is 2.5 MB, for a total of 12.5 MB per request/response pair, then you will be able to carry 100 TPS as a theoretical maximum. If you want 200 TPS, you will need another NIC rated at 10 Gbit/s as dedicated capacity. This is a theoretical maximum, though; HTTP incurs some overhead on top of TCP. You can estimate that at 15-25% of the total network capacity, so you will actually see around 80 TPS for a single 10 Gbit NIC. Now suppose your request+response pair is just 1.25 KB in total. Without considering the HTTP overhead, that's about 1 million requests/second. But the HTTP overhead will be significant at that rate, maybe 30-40%, so figure 600k requests/second. Will the system really be able to serve all of that? No, because there are limitations in I/O and memory buffer management. (The sketch after this list runs through this kind of arithmetic.)
  • The capacity of the backend affects maximum throughput. Apigee Edge is used as a proxy: when a request flows into Apigee Edge, there is something else that Apigee Edge connects to. The maximum theoretical throughput of the overall system is limited by its most constrained component. If the backend can deliver 175 TPS, then your system *with* Apigee Edge will not deliver more than that (caching excepted).
  • The latency of the backend affects maximum throughput. If the backend can perform 175 TPS, but the average latency is 30 seconds per request, then Apigee Edge must maintain context for 175 × 30 = 5,250 requests in memory at one time. That can involve lots of memory swapping and garbage collection, all of which can slow down a system. If the latency of the backend averages 1 second, then Apigee Edge needs to maintain only 175 message contexts in memory, which puts much less stress on the system.
  • The Apigee Edge proxy itself does work. Sometimes that work is in-memory work, like an XMLToJSON or an XSLT policy. Sometimes it involves CPU-intensive work like computing or verifying message signatures. Sometimes the work performed by an API Proxy involves I/O, as when generating an OAuth token, verifying a token (less so, because this is cached), or refreshing an OAuth token. All of this can affect the load on the Apigee system and therefore its performance.
  • The number of nodes you dedicate to handling API requests affects the performance of the system. You cannot assume that adding nodes will add throughput linearly; scaling is sub-linear, so expect around 60-70% of linear scaling, depending on other characteristics of the system. This also assumes the network itself is not a bottleneck (lots of ports on your Cisco router!).
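
Putting a few of those numbers together, here is a minimal back-of-envelope sketch in Python. Every input (NIC speed, payload sizes, HTTP overhead, backend latency, node count, scaling efficiency) is an illustrative assumption taken from the figures above, not a measurement of any real deployment.

```python
# Rough capacity estimate for a proxy tier. All constants are assumptions
# for illustration; substitute numbers from your own environment.

NIC_GBIT_PER_SEC = 10          # a single 10 Gbit/s NIC
REQUEST_MB = 10.0              # assumed request payload size
RESPONSE_MB = 2.5              # assumed response payload size
HTTP_OVERHEAD = 0.20           # assume ~15-25% protocol overhead on top of TCP
BACKEND_LATENCY_SEC = 30       # assumed average backend latency
NODES = 4                      # assumed number of message-processing nodes
SCALING_EFFICIENCY = 0.65      # assume ~60-70% of linear scaling when adding nodes

# 1. Network ceiling: bytes per second divided by bytes per request/response pair.
nic_mb_per_sec = NIC_GBIT_PER_SEC * 1000 / 8            # ~1,250 MB/s
pair_mb = REQUEST_MB + RESPONSE_MB                       # 12.5 MB per transaction
theoretical_tps = nic_mb_per_sec / pair_mb               # ~100 TPS
effective_tps = theoretical_tps * (1 - HTTP_OVERHEAD)    # ~80 TPS

# 2. In-flight message contexts the proxy must hold
#    (Little's law: concurrency = throughput x latency).
in_flight_contexts = effective_tps * BACKEND_LATENCY_SEC

# 3. Cluster estimate with sub-linear scaling across nodes.
cluster_tps = effective_tps * NODES * SCALING_EFFICIENCY

print(f"Single-NIC ceiling: {theoretical_tps:.0f} TPS theoretical, "
      f"{effective_tps:.0f} TPS after HTTP overhead")
print(f"In-flight contexts at {BACKEND_LATENCY_SEC}s backend latency: "
      f"{in_flight_contexts:.0f}")
print(f"Estimated {NODES}-node throughput: {cluster_tps:.0f} TPS "
      f"at {SCALING_EFFICIENCY:.0%} scaling efficiency")
```

None of this replaces a load test; it only tells you which of the numbers above to go measure first.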

The only way to know how a system will perform is to test it. Ideally you will do this iteratively.
