I've been trying to run load tests against a node.js
web app that I'm working on. I've hacked up my app to log (every 5 seconds) the number of requests per second that it handles and the amount of memory that it uses. I saw some strange numbers. I should also mention that I was using ab
to throw load at the server. Perhaps not the best load tester around, but it served my purpose for this particular test.
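The logging hack itself was nothing elaborate: count requests in the handler and dump the rate and process.memoryUsage() on a timer. A simplified sketch of the idea (not the actual app code; port 8080 is assumed, to match the http-alt filter in the netstat script below):

// Count requests and, every 5 seconds, log the request rate and memory use.
var http = require('http');

var requests = 0;

var server = http.createServer(function (req, res) {
    requests++;
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('hello\n');
});

setInterval(function () {
    var mem = process.memoryUsage();
    console.log((requests / 5) + ' req/s, rss: ' + mem.rss +
                ' bytes, heapUsed: ' + mem.heapUsed + ' bytes');
    requests = 0;    // reset the counter for the next window
}, 5000);

server.listen(8080);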
My app started out well, handling about 1500 requests per second, but then all of a sudden it would stop serving requests. With repeated runs, the point where it would stop was always around 16K connections... in fact, it always hovered around 16384, mostly staying below, but sometimes going above it a bit (e.g. 16400).
Ephemeral Ports
To those of you familiar with TCP networking, that value will ring a bell: 16384 is the size of the IANA-specified ephemeral port range (49152 to 65535).
I started to log the state of all TCP connections every second using the following shell script (split onto multiple lines for readability):
while true; do
    echo -n "$( date +%s ) ";
    netstat | \
        awk '$4~/\.http-alt/ { conn[$6]++ }
             END { for(c in conn) { printf("%s: %d\t", c, conn[c]); } printf("\n"); }';
    sleep 1;
done

This is the output it returned:
1315341300 TIME_WAIT: 209 ESTABLISHED: 100
1315341301 FIN_WAIT_1: 4 FIN_WAIT_2: 4 TIME_WAIT: 1892 ESTABLISHED: 92
1315341302 FIN_WAIT_1: 1 FIN_WAIT_2: 2 TIME_WAIT: 3725 ESTABLISHED: 97
1315341303 FIN_WAIT_2: 2 TIME_WAIT: 5426 ESTABLISHED: 97
1315341304 FIN_WAIT_1: 1 FIN_WAIT_2: 5 TIME_WAIT: 7017 ESTABLISHED: 94
1315341305 TIME_WAIT: 8722 ESTABLISHED: 100
1315341306 FIN_WAIT_1: 2 TIME_WAIT: 10459 ESTABLISHED: 98
1315341308 FIN_WAIT_1: 3 FIN_WAIT_2: 3 TIME_WAIT: 12246 ESTABLISHED: 94
1315341309 FIN_WAIT_1: 7 TIME_WAIT: 14031 ESTABLISHED: 93
1315341310 FIN_WAIT_1: 3 TIME_WAIT: 15937 ESTABLISHED: 97
1315341311 TIME_WAIT: 16363
1315341312 TIME_WAIT: 16363
1315341314 TIME_WAIT: 16363
1315341315 TIME_WAIT: 16363
1315341316 TIME_WAIT: 16363
1315341317 TIME_WAIT: 16363
1315341318 TIME_WAIT: 16363
1315341319 TIME_WAIT: 16363
1315341321 TIME_WAIT: 16363
1315341322 TIME_WAIT: 16363
1315341323 TIME_WAIT: 16363
1315341324 TIME_WAIT: 16363
1315341325 TIME_WAIT: 16363
1315341326 TIME_WAIT: 16363
1315341328 TIME_WAIT: 16363
1315341329 TIME_WAIT: 16363
1315341330 TIME_WAIT: 16363
1315341331 TIME_WAIT: 14321
1315341332 TIME_WAIT: 12641
1315341333 TIME_WAIT: 11024
1315341334 TIME_WAIT: 9621
1315341336 TIME_WAIT: 7516
1315341337 TIME_WAIT: 5920
1315341338 TIME_WAIT: 4227
1315341339 TIME_WAIT: 2693
1315341340 TIME_WAIT: 1108
1315341341 TIME_WAIT: 23

There are a few things to note here:
- I ran ab with concurrency set to 100, so the number of connections in the ESTABLISHED and FIN_WAIT_{1,2} states should be about 100 for the most part
- I was polling netstat once per second plus the amount of time it took netstat to return, so you might notice missing seconds in between
- Requests were being handled at over 1000 per second, so many of the connections would have gone from the SYN_SENT state to TIME_WAIT in less than 1 second
It's pretty clear here that everything just stops once the number of connections reaches 16363, and my app's logs confirm that. ab ends up timing out somewhere near the end of this report (which should probably be the subject of a different blog post).
Now if we look closely, we notice that it takes 30 seconds from the first result we see to the point where the number of connections in the TIME_WAIT state starts dropping below 16363.
TIME_WAIT
A few points should be noted about the TIME_WAIT state:
- Only the endpoint that sends the first FIN will end up in the TIME_WAIT state. I'll tell you later why this is important (the sketch just after this list shows a quick way to see it)
- A connection remains in the TIME_WAIT state for twice the connection's MSL (see RFC793)
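To see that first point for yourself, a throwaway node script along these lines (purely illustrative, loopback only, port 8080 assumed) makes the server close every connection first; netstat should then show the resulting TIME_WAIT entries with the server's port as the local end:

// The server closes each connection as soon as it arrives, so the server
// sends the first FIN and is the side left holding the TIME_WAIT.
// Check afterwards with: netstat -an | grep 8080
var net = require('net');

var server = net.createServer(function (sock) {
    sock.end();   // close immediately: the first FIN comes from the server
});

server.listen(8080, function () {
    // open a handful of connections to ourselves to generate some closes
    for (var i = 0; i < 5; i++) {
        net.connect(8080, '127.0.0.1');
    }
});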
So we now have a whole load of TCP connections in the TIME_WAIT state and the client (the load tester) has run out of ephemeral ports to keep running the test. This is independent of the load tester used, and how it handles its sockets (poll, select, multiple threads, multiple processes, whatever). This is a limit imposed by the Operating System in order to comply with the TCP spec, and in general make sure the internet doesn't break.
Now there are various ways to get around this limit. The simplest is to just use multiple boxes to run your performance test all simultaneously hitting your server. The more boxes you use, the more likely it is that your server will break down before your client hits the port limit. But what if you don't have access to so many boxes (in a world without EC2 for example), or it's too expensive, or you're not utilising the other resources (CPU/RAM/Network Interface) on your client boxes and would like to get a little more out of them?
You could program your load tester to use SO_REUSEADDR, which would allow it to reuse local ports that are still tied up in the TIME_WAIT state, but that could backfire if there's still data on the network.
MSL
Now remember that TIME_WAIT is twice the MSL. RFC793 recommends setting the MSL to 2 minutes, which would leave TIME_WAIT at 4 minutes. Luckily no current implementation uses this value. Linux has its TIME_WAIT length hardcoded to 60 seconds (implying a 30 second MSL), while BSD sets the MSL to 15 seconds (sysctl net.inet.tcp.msl, specified in milliseconds) and makes it tunable at run time.
To verify, I reran the test after setting the MSL to 5 seconds:
sudo sysctl -w net.inet.tcp.msl=5000
And these were the results:
1315344661 TIME_WAIT: 1451 ESTABLISHED: 100
1315344662 FIN_WAIT_2: 5 TIME_WAIT: 3272 ESTABLISHED: 95
1315344664 FIN_WAIT_1: 1 TIME_WAIT: 5010 ESTABLISHED: 99
1315344665 TIME_WAIT: 6574 ESTABLISHED: 100
1315344666 FIN_WAIT_1: 11 FIN_WAIT_2: 2 TIME_WAIT: 7908 ESTABLISHED: 87
1315344667 TIME_WAIT: 9689 ESTABLISHED: 100
1315344668 TIME_WAIT: 11155 ESTABLISHED: 100
1315344669 FIN_WAIT_1: 3 TIME_WAIT: 12522 ESTABLISHED: 97
1315344671 FIN_WAIT_2: 2 TIME_WAIT: 13655 ESTABLISHED: 98
1315344672 FIN_WAIT_1: 4 FIN_WAIT_2: 12 TIME_WAIT: 13847 ESTABLISHED: 84
1315344673 FIN_WAIT_1: 2 TIME_WAIT: 13218 ESTABLISHED: 98
1315344674 FIN_WAIT_1: 2 TIME_WAIT: 13723 ESTABLISHED: 97
1315344675 FIN_WAIT_1: 2 TIME_WAIT: 14441 ESTABLISHED: 98
1315344677 FIN_WAIT_1: 4 FIN_WAIT_2: 6 TIME_WAIT: 13946 ESTABLISHED: 90
1315344678 FIN_WAIT_1: 3 FIN_WAIT_2: 15 TIME_WAIT: 14670 ESTABLISHED: 82
1315344679 FIN_WAIT_1: 2 TIME_WAIT: 15164 ESTABLISHED: 98
1315344680 FIN_WAIT_1: 2 TIME_WAIT: 15062 ESTABLISHED: 98
1315344681 TIME_WAIT: 15822 ESTABLISHED: 100
1315344683 FIN_WAIT_1: 4 TIME_WAIT: 15855 ESTABLISHED: 82
1315344684 FIN_WAIT_1: 15 TIME_WAIT: 15506 ESTABLISHED: 84
1315344685 FIN_WAIT_2: 9 TIME_WAIT: 15928 ESTABLISHED: 91
1315344687 FIN_WAIT_1: 4 TIME_WAIT: 15356 ESTABLISHED: 96
1315344688 FIN_WAIT_1: 2 FIN_WAIT_2: 8 TIME_WAIT: 15490 ESTABLISHED: 90
1315344689 FIN_WAIT_1: 1 TIME_WAIT: 15449 ESTABLISHED: 99
1315344690 FIN_WAIT_1: 3 TIME_WAIT: 15801 ESTABLISHED: 97
1315344692 FIN_WAIT_1: 1 TIME_WAIT: 15882 ESTABLISHED: 70
1315344693 FIN_WAIT_1: 3 TIME_WAIT: 16106 ESTABLISHED: 56
1315344694 FIN_WAIT_1: 3 TIME_WAIT: 15637 ESTABLISHED: 2
1315344695 TIME_WAIT: 14166
1315344697 TIME_WAIT: 12588
1315344698 TIME_WAIT: 10454
1315344699 TIME_WAIT: 8917
1315344700 TIME_WAIT: 7441
1315344701 TIME_WAIT: 5735
1315344702 TIME_WAIT: 4119
1315344703 TIME_WAIT: 1815
1315344704 TIME_WAIT: 88
The test runs longer here because it actually gets to 50000 connections. We're doing about 1500 requests per second, so it would take just under 11 seconds to exhaust all 16384 ports, but since ports are reusable in 10 seconds, we would have reclaimed the first 3000 ports in this time and then just continue to use them.
Here's the catch though: you can only do this on BSD. Linux doesn't let you change the MSL. Also, DO NOT DO THIS IN PRODUCTION!
TIME_WAIT, FIN and HTTP keep-alive
Now I mentioned earlier that only the endpoint that sends the first FIN packet will enter the TIME_WAIT state. This is important because in production, you really do not want your server to end up with a lot of connections in the TIME_WAIT state. A malicious user could, for example, make a large number of HEAD requests to your server, and either wait for them to terminate, or specify Connection: close but leave the connection open, so that it's your server that closes each connection by sending the first FIN packet.
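To make that concrete, here is a rough sketch of such a client in node (purely illustrative; the target host is made up). It sends HEAD requests with Connection: close and then never closes its end of the socket, so the server has to send the first FIN and collect the TIME_WAIT:

// Abusive client sketch: send HEAD requests with Connection: close,
// but never close our end. The server responds, closes, sends the
// first FIN, and its side of each connection ends up in TIME_WAIT.
var net = require('net');

function openAbusiveConnection(host, port) {
    var sock = net.connect(port, host, function () {
        sock.write('HEAD / HTTP/1.1\r\n' +
                   'Host: ' + host + '\r\n' +
                   'Connection: close\r\n\r\n');
        // deliberately no sock.end(): let the server close first
    });
    sock.on('data', function () { /* ignore the response */ });
    sock.on('error', function () { /* ignore errors in this sketch */ });
}

// hypothetical target; don't point this at anything you don't own
for (var i = 0; i < 1000; i++) {
    openAbusiveConnection('target.example.com', 8080);
}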
How do you deal with this?
6 comments:
If I may ask, what was the file descriptor limit on this system?
Thx
I'd bumped it to 5000 before I started testing, but that wouldn't have been an issue here, since fds are only allocated for open connections (SYN_SENT/SYN_RCVD/ESTABLISHED/FIN_WAIT_{1,2}/CLOSING) and at any point in time there are only 100 connections in any of those states.
If you are setting up 100 parallel (v)users, why can't they reuse the same ports so you would need only 100 ports? I use tools like roadrunner and the like, and have never looked at it in this depth, so I might be missing something.
A connection is identified by a tuple of five elements: protocol, server IP, server port, client IP, client port. This tuple has to be unique for both ends to be able to tell different peers apart (note that at the TCP level it's a peer-to-peer protocol; there are no clients or servers). Now since the (HTTP) server is passive and receives connections, it needs to listen on a fixed, well known port, and it can accept many clients on that one port. The (HTTP) client, on the other hand, can only run one connection to a given server from each local port, so in order to make 100 simultaneous connections it needs to use 100 separate ports.
The immediate question that comes up then is: how does the server handle 100 parallel connections on the same port? New connections first sit in a listen queue until the server accepts them; by default that queue is 128 deep, and anything beyond that gets dropped by the TCP layer. When the server accepts a connection it gets a new socket, but that socket keeps the same well known local port, and the operating system tells the connections apart by the client's IP and port. So the server can be working on many accepted connections at once while its well known port stays open to accept new ones; it never needs to move a connection onto an ephemeral port.
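You can see this with a trivial node server: every accepted socket reports the same local port (the listening port), and only the remote port differs per client.

// Every accepted connection keeps the listening port as its local port;
// clients are told apart by their remote IP and port.
var net = require('net');

net.createServer(function (sock) {
    console.log('local port ' + sock.localPort +
                ' <- ' + sock.remoteAddress + ':' + sock.remotePort);
    sock.end();
}).listen(8080);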
Thanks Phil. Coincidentally I have two servers with very high numbers of connections in the TIME_WAIT state (10K each).
How can you find the process associated with an ephemeral port using lsof or netstat when they do not stay in the TIME_WAIT state?