How to get a performance benchmark for a specific Apigee on-prem installation

Hello,

I have a 9-node Apigee architecture installed, with 2 Routers and 2 Message Processors. I would like to do performance testing, so I would like to know the standard performance benchmark for my Apigee solution, such as a maximum TPS.

Please help me understand how I can get this.

Thanks

Solved

1 ACCEPTED SOLUTION

I would like to know the standard performance benchmark for my apigee solution like maximum TPS or something like that.

Performance benchmarking is a broad topic! 

There is no "standard benchmark" for Apigee. If you want to evaluate the performance of a system, any system, it's more or less a customized effort. Suppose you have a vehicle and you want to evaluate the performance of that vehicle. Exactly what aspect of performance will you evaluate? One possible metric to evaluate is the top speed of the vehicle. The way to test that is to press the accelerator to the floor and keep it there, and then wait. And the vehicle will gain speed, quickly at first, then more slowly, and after a while the vehicle will stop gaining more speed. It will have attained top speed. There's the answer. And then you run multiple trials, say 10-12 trials, and average the results.

But top speed is generally not a practical evaluation metric. How often do you drive your vehicle at top speed?  Typically you want to evaluate a vehicle on some more practical metric, like... maybe payload capacity.  How much weight can the vehicle carry up a hill at a minimum speed of 15mph? To measure that, you would design a totally different test.  You'd have to get a specific height/length and grade of a hill, and then a way to add payload in steps. And you'd iterate on those trials, and gather all the data. And then you'd have your answer. 

Or maybe you want to evaluate how many times the vehicle can stop and start, for a given fuel load.  That would be a different test.  Or maybe you want to evaluate some combination of those things. For each different set of evaluation criteria, you would design a specific benchmark test. 

In your API-based system, what do you want to measure? "Maximum TPS" by itself is not a practical metric. It's possible to have a maximum throughput of 1000 requests per second, with each request being fulfilled in 5 seconds. Is that acceptable? In most systems I have seen, a 5s response time is not acceptable. So you need to establish FIRST a maximum acceptable latency, and then measure the maximum TPS such that response latency does not exceed that limit.

This is typical for distributed request/response systems: the performance metric of interest is the maximum measured requests-per-second at a given maximum response latency. For example, the goal might be to measure max TPS with a maximum 99th-percentile (aka TP99) response time of 650ms. If the system can deliver 1500 TPS but the latency is 1000ms, that's not a valid result. Some people have no idea what the maximum TPS should be, or needs to be. But most people CAN establish a maximum response time, based on the desired responsiveness of the client or UI. So they just state the goal response time as 500ms, or 650ms, or 700ms, and conduct tests that way. To realize this kind of test you need:

  • a client that can drive load, like Gatling, JMeter, or similar.
  • a configuration in the API proxy that reflects what your "real world" API proxies will do: authentication, caching, KVM lookups, JWT validation, etc. This test proxy should be as close as possible to the actual production proxy.
  • an "upstream" server that the API proxy can connect to. This upstream system need not be the "real" target system, but it should have the performance characteristics of a real system. In other words, it should have a configurable and randomized delay for responses. If the real-world system responds to requests in an average of 120ms, with a TP99 of 380ms, then the benchmark upstream should have response times like that. (Remember, overall latency is the latency incurred by data transmission time - client to Apigee, and Apigee to upstream - as well as processing time, in Apigee and in the upstream. The upstream latency is just one fraction of overall latency.) There are mock servers out there that have "fixed delays" of, let's say, 100ms for every request. But that's not realistic, and the benchmark results you get from such a system will not reflect reality. You want a system that gives a delay that varies across the aggregate number of requests, according to a gaussian curve, or a chi-squared curve - something that looks like reality. (See the sketch just after this list for one way to build such a mock.)

When you do the tests, the client should use a warm-up period, maybe 5 minutes or so, during which the client is sending requests in, but you're not measuring results. At the end of the warm-up, begin measuring (counting requests and latency), and then let the client drive load for the period of the test, maybe 20-60 minutes. Your client requests should not be "all the same" - again, they should vary, so as to reflect as accurately as possible the variety and distribution of requests from the real client pool. Some smaller requests, some larger ones. A mix of GET and POST. Some with authentication, some with invalid tokens, and so on. A variety that simulates what the real load would look like. Run that mix of transactions, with multiple trials, at various levels of concurrency, and when the observed TP99 response time exceeds 650ms, stop the test. THAT is your observed max throughput for your chosen maximum TP99 response time. Run many trials, then average the throughput results across those trials.
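
Gatling or JMeter would normally handle ramp-up, pacing, and reporting for you; purely to make the procedure concrete, here is a rough Python sketch of the control loop: warm up, measure for a window, compute TP99, and step up concurrency until the latency budget is exceeded. The endpoint URL, window durations, and concurrency steps are placeholder assumptions, the windows are far shorter than the 20-60 minute runs described above, and a real test would mix request types rather than send one uniform GET.

```python
# loadtest_sketch.py - illustrative control loop; not a substitute for Gatling/JMeter.
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET = "http://apigee-router.example.com/test-proxy/ping"  # hypothetical endpoint
TP99_LIMIT_S = 0.650   # the latency budget chosen up front
WARMUP_S = 30          # use something like 5 minutes in a real run
MEASURE_S = 60         # use 20-60 minutes in a real run

def one_request():
    start = time.perf_counter()
    with urllib.request.urlopen(TARGET, timeout=10) as resp:
        resp.read()
    return time.perf_counter() - start

def run_window(concurrency, duration_s, record):
    """Drive requests at fixed concurrency; record latencies unless record is None."""
    deadline = time.monotonic() + duration_s
    def worker():
        while time.monotonic() < deadline:
            latency = one_request()  # error handling omitted for brevity
            if record is not None:
                record.append(latency)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(concurrency):
            pool.submit(worker)

for concurrency in (10, 20, 40, 80, 160):
    run_window(concurrency, WARMUP_S, record=None)        # warm-up: discard results
    latencies = []
    run_window(concurrency, MEASURE_S, record=latencies)  # measurement window
    tp99 = statistics.quantiles(latencies, n=100)[98]     # 99th percentile
    tps = len(latencies) / MEASURE_S                      # approximate throughput
    print(f"concurrency={concurrency} tps={tps:.0f} tp99={tp99 * 1000:.0f}ms")
    if tp99 > TP99_LIMIT_S:
        break  # budget exceeded; the previous level is your observed maximum
```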

If your results are not stable over multiple trials - in other words, if your results vary by 20% from trial to trial, or even 5% - that is a sign that something is not configured properly in the benchmark setup. If throughput drops steadily over the duration of the test, again, that is a sign of instability. These results are not reliable; in other words, you cannot be assured that you can achieve the same results in a real-world system. Only results that are stable within the trial and over multiple trials will be reliable predictors of real-world performance. If you have unstable results, then you need to find the source of that instability and mitigate it or remove it. Your goal is to have stable, repeatable test results. Sometimes a problem on the upstream system can lead to unstable test results for the overall system.
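
As a quick way to put a number on trial-to-trial stability, you can compute the coefficient of variation of throughput across trials. The trial figures and the 5% threshold below are invented for illustration:

```python
import statistics

# Throughput (TPS) observed in each trial at the same concurrency level
# (made-up numbers, for illustration only).
trial_tps = [1480, 1510, 1495, 1470, 1505]

mean = statistics.mean(trial_tps)
cv = statistics.stdev(trial_tps) / mean  # coefficient of variation
print(f"mean={mean:.0f} TPS, trial-to-trial variation={cv:.1%}")
if cv > 0.05:  # assumed threshold; tighten or relax to suit your tolerance
    print("Unstable - find and fix the source of variation before trusting results.")
```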

Performance testing is not a simple effort.  If you haven't done it before, I'd suggest you study up: get some books, even older books are good because the principles haven't changed. Or, obtain some outside expertise to assist. There are lots of articles on the web that talk about general performance testing.  Those also will be helpful for you.  Expect it to be a journey of learning for you and the team. 

 


4 REPLIES


Hello @dchiesa1, after reading your answer, I realized that I have little knowledge about performance testing. Thank you very much for your detailed and well-explained answer.

And I have one more question. So, then, is the goal of performance testing to determine the performance capability of a given Apigee solution - in other words, to create a performance benchmark for the Apigee solution?

I am new to performance testing. I am sorry if the question is too basic.

Thanks

Yes, that is usually the goal of performance testing.  In general there are two sub-cases:

  1. People have an idea of the required performance of a system, and want to gain some idea of which system configuration (number of Routers, MPs, etc.) will meet the performance objective with an appropriate margin of safety. For example, I might have a good estimate that the total API load driven by clients will be 1750 TPS at peak. Suppose I can measure that the max performance of the 9-node system is 2000 TPS, at my given max TP99 latency. I might decide that I need more nodes in the system in order to allow a larger margin of safety - to absorb load spikes, or to "lose" a node (for example, for maintenance) without incurring a service disruption. So based on this I might configure the system to use 11 nodes, rather than 9. That system might give 3000 TPS, but it's the right margin of safety for me.

    OR

  2. They have a system and want to estimate its peak performance, so that they can plan for future upgrades should API volume and load increase at some later point in time. For example, I have a 9-node system, and a new team within my company wants to host their APIs on it, which will bring 120 TPS into the system. Do I have confidence that the 9-node system, with the existing API volume, can also accommodate the new volume? With a clear idea of the capacity of the existing system, you can make that decision confidently. (The arithmetic sketch below makes both sub-cases concrete.)
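
Here is a small worked sketch using the numbers from the two examples above. The 85% utilization ceiling is an assumed planning threshold, not an Apigee standard:

```python
# Headroom arithmetic for the two sub-cases above (example numbers only).
MEASURED_CAPACITY = 2000    # TPS measured on the 9-node system at the TP99 limit
PEAK_LOAD = 1750            # estimated peak client load, TPS
UTILIZATION_CEILING = 0.85  # assumed: leave room for spikes and node maintenance

def fits(load_tps, capacity_tps):
    """True if the load stays under the planning ceiling of measured capacity."""
    return load_tps <= capacity_tps * UTILIZATION_CEILING

# Sub-case 1: margin of safety on the existing system.
margin = (MEASURED_CAPACITY - PEAK_LOAD) / PEAK_LOAD
print(f"headroom at peak: {margin:.0%}")                        # ~14% - thin
print(f"within ceiling: {fits(PEAK_LOAD, MEASURED_CAPACITY)}")  # False -> add nodes

# Sub-case 2: can the system also absorb a new team's 120 TPS?
print(f"with new APIs: {fits(PEAK_LOAD + 120, MEASURED_CAPACITY)}")  # False
```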

@dchiesa1 I get your point, sir. Thank you very much.