EMG load Test

ankurjain1 · 12-03-2018 09:54 PM

I am trying to load test EMG. I am sending 3K requests/sec, but it handles a maximum of 900 req/sec. There are no failures, but the response time keeps increasing. I also set the “nodelay” property to true to avoid buffering. CPU usage stays at 60-70% throughout the test. Can someone explain what is happening here? How can I stop requests from being buffered?

@Dino @Anil Sagar @ Google

dchiesa1

I dunno, but benchmarking is hard. Here's what I suggest.

First, how's the network? 900 req/s might be all you can get if your network is saturated. It depends on the network interface you use, the driver, and the size of each request and response. Suppose you have a 1 gigabit NIC, and it's not virtual (not running on a VM). "1 gigabit" really means 1 gigabit per second, which is a nominal value. That equates to 125 megabytes per second (8 bits per byte). Now if you are sending 1 MB requests with zero-size responses, you should expect no more than 125 requests per second. BUT, that's not really possible because of HTTP overhead. If I send an HTTP request, there are headers and TCP framing that also get sent. In my experience, HTTP API requests can reach up to around 70-75% of nominal network capacity. So that means instead of 125 megabytes per second, you should expect around 94 MB per second. And that is IN and OUT. Be sure to count requests and responses. If your request + response total about 106 KB, then 900 req/s is all you will possibly get through ANY software handling HTTP requests. It's the full practical capacity of the NIC. For YOUR request + response, running through a 1 gigabit NIC, compute the theoretical max throughput like this: (94 MB/s) * (1024 KB/MB) / (size of req+resp in KB) = theoretical throughput in requests per second. You might want to vary the request and response size to see the results.
If your NIC is virtual, then you have to figure out how much of the 1 Gb hardware capacity your VM is being allowed.
If your NIC is 10 gigabit, then the theoretical throughput is 10x higher.
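As a rough sketch of the arithmetic above: the function name and parameters below are made up for illustration, the 0.75 efficiency factor is the rule of thumb from this post rather than a measured value, and the mix of 1000-based and 1024-based units simply follows the formula as written.

```python
def max_requests_per_second(nic_gbps: float,
                            req_plus_resp_kb: float,
                            efficiency: float = 0.75) -> float:
    """Estimate the ceiling on req/s imposed by the NIC alone.

    nic_gbps          nominal NIC speed in gigabits per second
    req_plus_resp_kb  combined size of one request and its response, in KB
    efficiency        assumed usable fraction of nominal capacity (0.70-0.75)
    """
    nominal_mb_per_s = nic_gbps * 1000 / 8          # 1 Gb/s -> 125 MB/s
    usable_mb_per_s = nominal_mb_per_s * efficiency  # ~94 MB/s for 1 Gb/s
    return usable_mb_per_s * 1024 / req_plus_resp_kb


if __name__ == "__main__":
    # 106 KB per request+response on a 1 gigabit NIC: roughly 900 req/s.
    print(round(max_requests_per_second(1, 106)))
    # Same payload on a 10 gigabit NIC: ten times that.
    print(round(max_requests_per_second(10, 106)))
```

Plug in your own measured request and response sizes; if the result is close to the 900 req/s you observe, the NIC is the likely bottleneck.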
This is all theoretical max. What happens in the backend? If you have a backend that is slow to respond, let's say it takes 15s to respond to a request, that means the EMG has to buffer (request rate) * 15 requests. If it's doing 900 req/s, that means EMG must buffer 900 * 15 = 13500 requests. If those requests each include 10 KB responses, that's a lot of memory being managed. All of that memory management costs CPU, and THAT could be driving your CPU to 60-70% or more. To fix this you need to make the backend faster.
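The buffering estimate above is just request rate times backend latency. A minimal sketch, with hypothetical function names and the 15 s and 10 KB figures taken from the example in this post:

```python
def in_flight_requests(rate_per_s: float, backend_latency_s: float) -> float:
    """Requests the gateway must hold open while waiting on the backend."""
    return rate_per_s * backend_latency_s


def buffered_memory_mb(in_flight: float, payload_kb: float) -> float:
    """Payload memory held for those requests, ignoring per-connection overhead."""
    return in_flight * payload_kb / 1024


if __name__ == "__main__":
    pending = in_flight_requests(900, 15)   # 900 req/s, 15 s backend latency
    print(pending)                          # 13500
    print(buffered_memory_mb(pending, 10))  # about 132 MB of payload alone
```

Halving the backend latency halves both numbers, which is why a faster backend is the fix.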
A CPU at 60-70% is not bad. When you conduct a benchmark, you'd like to see the resources that take stress, like the CPU and the NIC, at or as close to saturation as possible. As close as possible to 100% usage. In practice it's hard to do because not every workload can balance out that way. Some workloads put more stress on the NIC, some put more stress on the CPU. A CPU of 60-70% says to me you're driving it pretty well. BUT, the question is, is the CPU doing USEFUL work? If the NIC is saturated the CPU usage will go up, because the CPU spends cycles managing memory and buffers while waiting for the NIC. The way to figure it out is to drive various loads and measure the CPU at each one: 100 req/s, 200 req/s, 300 req/s, and so on. When you see the CPU ramp up non-linearly with respect to the request volume, that's when you know the CPU is doing busywork, compensating for a saturated resource elsewhere.
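One possible way to spot that non-linear ramp from a load sweep is to compare the marginal CPU cost per extra request between steps. This is only a sketch: `find_cpu_knee` and the 1.5x threshold are arbitrary choices for illustration, and the sample numbers are invented, not measurements from EMG.

```python
from typing import List, Optional, Tuple


def find_cpu_knee(samples: List[Tuple[float, float]],
                  factor: float = 1.5) -> Optional[float]:
    """Return the first load level where CPU grows faster than linearly.

    samples  (req_per_s, cpu_percent) pairs at distinct load levels
    factor   how much steeper than the first step counts as non-linear
    """
    ordered = sorted(samples)
    if len(ordered) < 3:
        return None
    (r0, c0), (r1, c1) = ordered[0], ordered[1]
    baseline = (c1 - c0) / (r1 - r0)  # CPU % per extra req/s at low load
    if baseline <= 0:
        return None
    for (ra, ca), (rb, cb) in zip(ordered[1:], ordered[2:]):
        slope = (cb - ca) / (rb - ra)
        if slope > factor * baseline:
            return rb
    return None


if __name__ == "__main__":
    # Invented numbers: CPU rises 5 points per 100 req/s, then jumps.
    sweep = [(100, 5), (200, 10), (300, 15), (400, 30)]
    print(find_cpu_knee(sweep))  # 400
```

Feed it the CPU readings from your own sweep; the load just below the returned value is a reasonable guess at where some other resource started to saturate.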
TLS and HTTP keepalives will also affect CPU usage. Test various scenarios to figure out where the bottlenecks are.
Lots and lots of iteration! Keep turning dials and running tests, and the picture will become clearer.

Former Community Member

Can I please get some more information?

What is the size/capacity of the machine?
What tool was used to generate the requests?
Can we see a copy of the org-env-config.yaml file?

ankurjain1

Hi Srinandas,

Answers to your questions:

1. I am using AWS EC2 instances of type t2.micro and t2.medium. I have increased the ulimit of the instances to 4k.

2. I am using "Locust" software to generate the load.

3.

Former Community Member

Can I please see the OAuth stanza? I want to make sure the token is cached. That will improve the performance.

t2.micro is rather small (1 vCPU and 1 GB RAM).

Lastly, how many clients/consumers are you starting? What does the output of netstat say? I would like to know the number of TIME_WAIT and LISTEN TCP connections.
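To get those counts quickly, something like the following could work. It assumes the Linux net-tools column layout of `netstat -ant` (state in the sixth column); the helper names are hypothetical, and other platforms format the output differently.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connection states from `netstat -ant` style text."""
    states: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 local:port remote:port STATE
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


def current_tcp_states() -> Counter:
    """Run netstat on this machine and tally the states."""
    result = subprocess.run(["netstat", "-ant"],
                            capture_output=True, text=True, check=True)
    return count_tcp_states(result.stdout)


if __name__ == "__main__":
    counts = current_tcp_states()
    print("TIME_WAIT:", counts["TIME_WAIT"])
    print("LISTEN:", counts["LISTEN"])
```

Run it on the EMG host during the test. A large and growing TIME_WAIT count usually means the load generator is opening a new connection per request instead of reusing keepalive connections.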