Last updated on Saturday 12th of November 2022 08:16:19 PM
©VMWare ©ESXi SSH/SCP Throughput Limitations
Why is SSH/SCP so slow when transferring big files over a WAN? (*)
(*) Thanks to Phill Thomas from Gecko IT for capturing packets and pointing to some clues and workarounds.
There's been much talk over the years regarding the limited speed at which you can transfer data from within the ©ESXi shell.
In our own experience working with the ©ESXi OS, we've always said that we never observed anything out of the ordinary. For instance, socket to socket throughput is what you would expect, and so is the measured speed when tunnelling data through an OpenSSH tunnel within a LAN.
Throughput is a function of the TCP window size and the network latency
It's nonetheless quite obvious that transferring data over SCP or SSH across a WAN yields figures way below what you would expect. This has produced a significant amount of discussion and speculation in multiple forums, including the ©VMWare Community forums themselves and other well-known ones.
• Post at VMWare Community forums on SCP speed
• Post at Serverfault.com on eventual SCP throttle
The thing is that ©VMWare shares the code of their OpenSSH implementation. Thus, if there were any speed throttle mechanism in place, somebody would have spotted it by now (this issue is as old as ©ESXi is).
So, what the H is going on? Because the limited speed is very real.
This issue is not the product of one single decision taken by some mad computer engineer ganging up with ©VMWare's CEO. It's the result of several different factors playing together.
We'll break the problem up into its constituent parts, from the most obvious facts to the not so obvious ones.
SSH/SCP do encrypt data.
This point is somewhat obvious, but we still need to mention it. The ciphers used to encrypt the data produce some overhead that plays a role in limiting the speed at which things can go. Still, this does not affect data transfers that much when working inside a LAN, thus the relative weight of this fact can't explain the issue on its own.
Data is also encapsulated by SCP/SSH, which adds some additional data to transfer and reduces the efficiency of the payload transfer rate. This is obvious too, but it can't explain the observed effect on its own either.
Even the sum of these first two facts can't explain what's going on. They may account for some 10% of the observable effect when working with sufficiently powerful hardware (see The Effect of Low End Hardware).
When we take our magnifier and observe things in more detail, the issue becomes a conundrum. Fortunately the ©ESXi Hypervisor comes with a number of tools that allow us to analyze the traffic going through our NICs.
We can use tcpdump-uw to capture packet header information.
tcpdump-uw -i vmk0
Will produce the following output:
When analyzing things with this level of detail while performing different transfers over a LAN and also over a WAN, you begin to observe some predictable patterns:
When you transfer data over the UDP protocol, data is discarded when the receiving buffer overflows. This will happen if data can't be written as fast as it is received, as UDP has no acknowledgement mechanism to slow the sender down.
By contrast, when you send data over TCP, as SSH/SCP do (we want complete and acknowledged data), the sender can only have as much unacknowledged data in flight as fits in the receiving buffer. Every time it has sent that much, it must stop and wait for an acknowledgement packet, which leaves the TCP link idle for a time on the order of the network latency. If the receiving buffer is small, the link will be idle most of the time.
When we send data over a LAN, the network latency is negligible when compared to the time spent transferring the data. In this case the buffer size (typically 64-128 KB) is more than enough to keep up with it.
Nonetheless, when we transfer data over a WAN, the network latency can be many times higher, usually in the order of hundreds to thousands of times higher than within a LAN. In this case the receiving buffer size starts to play a critical role: every time the sender has filled the receiving buffer, it has to wait for the acknowledgement packet to arrive before it can send new data, and that reduces the effective saturation of the TCP link.
Now the problem is that SSH uses a fixed receiving buffer size which is not very big. It is indeed more than enough to work decently in a LAN, but it's not when it comes to transferring data over a WAN. The bigger the latency, the slower the SCP transfer. In fact, even a relatively small latency of 20-40 ms can greatly affect the throughput of an SSH/SCP data transfer.
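To put rough numbers on this, the following minimal Python sketch computes the upper bound that a fixed window imposes on a single TCP stream: at most one window of data per round trip. The 64 KiB window and the 0.5 ms and 30 ms round trip times are illustrative assumptions, not values measured on an ©ESXi host.

```python
def max_throughput_bytes_per_s(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP stream: one full window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes / rtt_seconds


if __name__ == "__main__":
    window = 64 * 1024  # assumed fixed receive window of 64 KiB
    # Assumed round trip times: a typical LAN versus a modest WAN link.
    for label, rtt_ms in (("LAN", 0.5), ("WAN", 30.0)):
        rate = max_throughput_bytes_per_s(window, rtt_ms / 1000)
        print(f"{label}: RTT {rtt_ms} ms -> at most {rate / 1e6:.1f} MB/s")
```

Under these assumptions the same window that allows roughly 131 MB/s in the LAN caps the WAN transfer at about 2.2 MB/s, no matter how much bandwidth the link has.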
We can't know (from the captured data) what the exact size of this buffer is, but we can infer it from the TCP window field in the TCP headers, which shows very low values. Even though nowadays all network equipment supports TCP window scaling, a mechanism that allows the advertised window to grow beyond the 65535 byte limit of the Window Size field, the TCP window remains low, which can only indicate that the receiving buffer is indeed small.
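As a reminder of how window scaling works, the sketch below turns the raw 16-bit window field into the window it actually advertises. The shift count is negotiated once, in the SYN packets of the handshake, so you need to capture the start of the connection to interpret the later headers. The function name and the sample values are ours, chosen for illustration only.

```python
MAX_WINDOW_FIELD = 0xFFFF  # the window field in the TCP header is 16 bits wide
MAX_SCALE_SHIFT = 14       # largest shift count allowed by RFC 7323


def effective_window(window_field: int, scale_shift: int) -> int:
    """Return the advertised window in bytes for a raw field value and shift."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window_field must fit in 16 bits")
    if not 0 <= scale_shift <= MAX_SCALE_SHIFT:
        raise ValueError("scale_shift must be between 0 and 14")
    return window_field << scale_shift


if __name__ == "__main__":
    # Illustrative values, not taken from a real capture.
    for field, shift in ((65535, 0), (512, 7), (65535, 14)):
        print(f"field={field} shift={shift} -> {effective_window(field, shift)} bytes")
```

A small effective window on a connection that did negotiate scaling points to the receiver's buffer, not to the protocol, as the limiting factor.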
So, what can we do to improve things?
The most obvious solution is to try to increase the TCP window size in ©ESXi. You can increase that value from 32 KB to 64 KB, which should help a lot. Still, it may not be sufficient in WAN scenarios with higher latency.
• ©VMWare: Change the Size of the LRO Buffer for VMkernel Adapters
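To judge whether doubling the window is enough for a given link, you can compare it against the bandwidth-delay product, that is, the amount of data that must be in flight to keep the link full. The Python sketch below is only an estimate under assumed figures (a 100 Mbit/s WAN link and a 30 ms round trip time); the function names are ours.

```python
def required_window_bytes(bandwidth_bits_per_s: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return bandwidth_bits_per_s / 8 * rtt_seconds


def link_utilization(window_bytes: int, bandwidth_bits_per_s: float,
                     rtt_seconds: float) -> float:
    """Fraction of the link (0.0 to 1.0) that one stream can fill with this window."""
    needed = required_window_bytes(bandwidth_bits_per_s, rtt_seconds)
    return min(1.0, window_bytes / needed)


if __name__ == "__main__":
    bandwidth = 100e6  # assumed 100 Mbit/s WAN link
    rtt = 0.030        # assumed 30 ms round trip time
    print(f"window needed: {required_window_bytes(bandwidth, rtt):.0f} bytes")
    for window in (32 * 1024, 64 * 1024):
        used = link_utilization(window, bandwidth, rtt)
        print(f"{window // 1024} KiB window -> {used:.1%} of the link")
```

With these assumed figures the link needs about 375000 bytes in flight, so going from 32 KB to 64 KB doubles the throughput but still leaves most of the link unused.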
Probably the best solution is to parallelize the data transfer with Rsync, although with regular tools this can only be achieved on a per file basis. Thus, if you only have one big disk to transfer, this method won't be helpful. If you have two or more -flat.vmdk files to transfer/migrate, running parallel processes will help you to improve the global data transfer rate.
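The reason parallel processes help is that each connection gets its own window, so the per stream ceilings add up until the link itself becomes the limit. The sketch below estimates that aggregate rate. It assumes identical, independent streams and ignores encryption overhead and congestion; the window, latency and link figures are illustrative.

```python
def aggregate_throughput(streams: int, window_bytes: int, rtt_seconds: float,
                         link_bytes_per_s: float) -> float:
    """Estimated total rate of N window-limited streams sharing one link."""
    if streams < 1:
        raise ValueError("streams must be at least 1")
    per_stream = window_bytes / rtt_seconds  # one window per round trip
    return min(streams * per_stream, link_bytes_per_s)


if __name__ == "__main__":
    window = 64 * 1024  # assumed 64 KiB window per connection
    rtt = 0.030         # assumed 30 ms round trip time
    link = 100e6 / 8    # assumed 100 Mbit/s link, in bytes per second
    for n in (1, 2, 4, 8):
        rate = aggregate_throughput(n, window, rtt, link)
        print(f"{n} stream(s): about {rate / 1e6:.1f} MB/s")
```

Under these assumptions the gain is roughly linear with the number of parallel transfers until the link saturates, after which adding more processes brings nothing.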
• Another option is to use some device in the middle: Firewall Boost Workaround. This solution will not always work and you don't have much control over what's going on, as it will be the routing OS itself that decides how data is buffered and whether this intermediate buffering improves the result. You can indeed monitor the TCP window and tweak this value, but you can't modify the SCP/SSH/Rsync receiving buffer size, as it's hardcoded.
Depending on your topology you may need to place the intermediate appliance next to the sending ©ESXi host or next to the receiving one. For instance, if your target server is Linux, it may be able to increase the receiving window to 1 or 2 MB. In this case you will want your intermediate device right at your source ©ESXi server, while if the target of your data transfer is some ©ESXi Hypervisor, you will want the intermediate appliance somewhere in the receiving network.
We offer free appliances that you can use to quickly deploy your workaround by setting up some TCP port forwarding to the destination IP:port.
• OpenWRT Firewall Appliance
Nonetheless, any firewall appliance of your choice (PFSense, RouterOS, OpenWRT, etc.) will do just fine.
There are other, harder to accomplish options, like modifying the source code of OpenSSH to allow bigger receive buffers, or using some third party tool to patch SCP or SFTP and compiling them on your own.
• Faster data transfer tools
The issues described here are common to any TCP/IP transmission suffering from a relatively high latency plus a small receiving buffer. To experience the issue, TCP window scaling has to be turned off or, as in the case of using SSH to tunnel data, the inner OpenSSH receiving buffer might be the ultimate culprit.
There are also other methods to work around the problem when looking at the matter from a more general point of view. There are applications that use UDP to send their data and then solve the data integrity issue at the application level, resending packets over UDP when some application level acknowledgement does not match.
Nonetheless, in this case, the environment is fixed and we cannot redesign everything from scratch to tackle the problem.
The Effect of Low End Hardware
There is cheap low end network hardware that we all know and use. This type of equipment is really nice to work with in small offices or at home for tasks that do not require powerful devices, like browsing the Internet, reading our e-mail or accessing a file server with office files.
When you use this type of hardware for tasks that it just can't keep up with, like transferring huge virtual disk files, you are going to clog it, the network latency is going to spike and you are going to end up with ridiculous transfer speeds that will not copy your VMs in time. Apart from that, the chances that you lose some data are obviously much higher.
Many times it is this type of scenario that puzzles us, until we realize that we are not going to make it with a 20.00 USD 8 port network switch. It's not a matter of the number of ports they have: there are even 48 port units which will work just fine at home or at the office for some lightweight tasks. You don't go to war with a toy rifle.
• Pittsburgh Supercomputing Center: High performance SSH/SCP
• Serverfault: Windows TCP window scaling hitting plateau
• Accedian.com: TCP Receive Window
• IBM, TCP Window Size and latency
• Formula to calculate throughput
• TCP Protocol at Wikipedia