Bandwidth-Delay Product

System administrators who wish to get maximum network performance across long, fast network connections may want to tune the TCP networking parameters on their systems. To know how to tune those parameters, we first need to compute the Bandwidth-Delay Product, or BDP.

Much to the chagrin of network engineers, we simple sysadmins often think about simple (non-LACP) server network connections as water pipes (or sewer pipes if we are feisty). Larger-diameter pipes can move more liquid at once, similar to a network connection that has higher bandwidth. Liquid takes more time to reach the far end of a longer pipe, which correlates to latency. And while you might not think about it much, pipes have volume, too, which in this case correlates to the Bandwidth-Delay Product. If we want to keep the pipe full while we’re moving data, we need to account for how much data can be in that pipe.

A Little More Complicated than a Pipe

There are multiple network protocols running on most modern systems, most commonly Transmission Control Protocol (TCP) and Internet Protocol (IP) among others like UDP, ICMP, and so on. IP is a connectionless protocol, meaning it doesn’t establish a persistent connection between the sender and receiver. It simply packages data into packets, addresses them, and sends them out onto the network. It’s the job of routers to direct these packets to their destination based on the IP address.

Most applications actually want the data to get to the other side, though, and that’s where TCP comes in. TCP is a connection-oriented protocol, so before any data is sent, TCP will establish a connection between the sender and receiver. This connection is used to ensure reliable delivery of data. TCP breaks the data into segments, sends them, and expects an acknowledgement (ACK) for each segment, saying that the data was properly received at the other end. If an ACK isn’t received within a certain time (the timeout), TCP assumes the segment was lost and retransmits it. The amount of data that can be “in flight” across a network connection before an ACK is needed is known as the TCP window size.

The Bandwidth-Delay Product (BDP) is a calculation that represents the maximum amount of data that can be “in flight” on a network path at any given time. If the BDP is larger than the TCP window size, the system will not be able to use all of the bandwidth available to it. The sender will have to wait for ACKs before it can send more data, and that leads to pauses in the data flow. Uncool, but the window size is what you will tune using the BDP.
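To put a number on that stall: a sender that can have only one window of data in flight per round trip tops out at roughly the window size divided by the RTT. Here is a minimal Python sketch of that ceiling. The function name and the sample values are mine, chosen for illustration. 65,535 bytes is the largest window TCP can advertise without the window scaling option, and the 60 ms RTT is an assumed WAN figure.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Rough throughput ceiling when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds


# Assumed values for illustration only.
window = 65_535   # bytes: largest TCP window without window scaling
rtt = 0.060       # seconds: a 60 ms round trip

limit = max_throughput_bps(window, rtt)
print(f"{limit / 1_000_000:.2f} Mbps")   # prints 8.74 Mbps
```

No matter how fast the link is, that window on that path cannot do much better than about 8.7 Mbps, which is why the window has to grow toward the BDP.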

For example, my datacenter’s external network connection is 10 Gbps, and the site I want to talk to is 60 milliseconds (ms) away, round-trip. I have 10 Gbps networking all the way to the other site.

  1. Convert the bandwidth into bits per second (bps). For 10 Gbps that’s 10,000,000,000 bps (10 Gbps x 1000 Mbps/Gbps x 1000 Kbps/Mbps x 1000 bps/Kbps).
  2. Convert the RTT to seconds. For 60 ms that’s 0.06 seconds (60 ms x 1 second/1000 ms).
  3. Multiply the bandwidth in bps by the RTT in seconds. For me, I get 600,000,000 bits (10,000,000,000 bits/sec x 0.06 seconds).
  4. Convert it to bytes by dividing by 8. For me, that’s 75,000,000 bytes (600,000,000 bits x 1 byte/8 bits).

In this example, my Bandwidth-Delay Product is 75 MB. So to use the whole network connection we need to have 75 MB “in flight” at any given time.
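If you would rather not do this by hand, the four steps fit in a few lines of Python. This is only a sketch of the arithmetic above. The function name bdp_bytes is my own, and it assumes decimal units (1 Gbps = 1,000,000,000 bps), matching the conversion in step 1.

```python
def bdp_bytes(bandwidth_bps: float, rtt_ms: float) -> float:
    """Bandwidth-Delay Product in bytes for a bandwidth (bps) and RTT (ms)."""
    bdp_bits = bandwidth_bps * rtt_ms / 1000   # steps 2 and 3: ms to s, then multiply
    return bdp_bits / 8                        # step 4: bits to bytes


bandwidth = 10 * 1000**3   # step 1: 10 Gbps expressed in bits per second
print(f"{bdp_bytes(bandwidth, 60):,.0f} bytes")   # prints 75,000,000 bytes
```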

Perhaps you’ve got a site that’s using Starlink, and is getting 185 Mbps down into the site and 15 Mbps up out of the site, with a 50 millisecond maximum RTT.

  1. 185 Mbps is 185,000,000 bps, 15 Mbps is 15,000,000 bps.
  2. Maximum RTT is 50 ms, or 0.05 seconds.
  3. BDP into the site is 9,250,000 bits, or ~1.16 MB.
  4. BDP out of the site is 750,000 bits, or 93.75 KB.

It is likely these values will fall comfortably inside a default TCP window size on your operating system.

Bandwidth-Delay Product Inside a Fast Datacenter Network

Given that 100 Gbps switches are inexpensive, let’s compute another example, that of systems talking to each other inside a high-speed datacenter network. Choose your source and destination well. To develop a worst-case RTT, you may wish to have your ping traverse a router and a firewall so that you can tune for the latency they add to a connection.

  1. 100 Gbps is 100,000,000,000 bps.
  2. Maximum RTT is 1.96 ms, or 0.00196 seconds.
  3. BDP is 196,000,000 bits, or 24.5 MB.

The network can move a lot of data, but the path is short, so there’s not as much “in flight” as with the longer 10 Gbps WAN link above (24.5 MB versus 75 MB).
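To see the examples in this post side by side, here is a small Python sketch that runs the same calculation over each link. The dictionary labels and the helper name are mine. The bandwidth and RTT figures are the ones used in the examples above.

```python
def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-Delay Product in bytes."""
    return bandwidth_bps * rtt_ms / 1000 / 8


# (bandwidth in bps, RTT in ms) for each example link in this post.
links = {
    "WAN, 10 Gbps, 60 ms": (10_000_000_000, 60),
    "Starlink down, 185 Mbps, 50 ms": (185_000_000, 50),
    "Starlink up, 15 Mbps, 50 ms": (15_000_000, 50),
    "Datacenter, 100 Gbps, 1.96 ms": (100_000_000_000, 1.96),
}

results = {name: bdp_bytes(bw, rtt) for name, (bw, rtt) in links.items()}

for name, value in results.items():
    print(f"{name:32s} {value / 1_000_000:8.2f} MB")
```

The datacenter link has ten times the bandwidth of the WAN link but about a third of its BDP, because the round trip is so much shorter.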

Using Ping to Determine Round-Trip Time (RTT)

The “ping” tool is very helpful in determining round-trip time (RTT) for the bandwidth-delay product. You can simply ping the destination, and collect the data at the end:

7:51pm neuromancer/plankers [~] 1012$ ping -c 10
PING ( 56(84) bytes of data.
64 bytes from ( icmp_seq=1 ttl=64 time=0.223 ms
64 bytes from ( icmp_seq=2 ttl=64 time=0.350 ms
64 bytes from ( icmp_seq=3 ttl=64 time=0.312 ms
64 bytes from ( icmp_seq=4 ttl=64 time=0.187 ms
64 bytes from ( icmp_seq=5 ttl=64 time=0.243 ms
64 bytes from ( icmp_seq=6 ttl=64 time=0.385 ms
64 bytes from ( icmp_seq=7 ttl=64 time=0.278 ms
64 bytes from ( icmp_seq=8 ttl=64 time=0.263 ms
64 bytes from ( icmp_seq=9 ttl=64 time=0.260 ms
64 bytes from ( icmp_seq=10 ttl=64 time=0.359 ms

--- ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9172ms
rtt min/avg/max/mdev = 0.187/0.286/0.385/0.060 ms

I used the “-c” flag to tell it that I only wanted 10 packets sent. At the end you can see the RTT statistics. You have to decide if you want to use the average (avg) or the maximum (max) it recorded. The mean deviation (mdev) can help you figure out how variable the connection latency is. If mdev is high, then the connection’s latency is unpredictable.

Mean deviation (mdev) is also sometimes called “jitter.”
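If you collect these numbers from scripts, the summary line is easy to pull apart. The following Python sketch assumes the summary format shown in the output above ("rtt min/avg/max/mdev = ... ms"). Other ping implementations label this line differently, so treat the regular expression as an assumption to adjust. The function name parse_ping_rtt is mine.

```python
import re


def parse_ping_rtt(summary_line: str) -> dict:
    """Extract min/avg/max/mdev (milliseconds) from a ping summary line."""
    match = re.search(
        r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms",
        summary_line,
    )
    if match is None:
        raise ValueError("no rtt summary found in line")
    keys = ("min", "avg", "max", "mdev")
    return dict(zip(keys, map(float, match.groups())))


line = "rtt min/avg/max/mdev = 0.187/0.286/0.385/0.060 ms"
rtt = parse_ping_rtt(line)
print(rtt["avg"], rtt["max"])   # prints 0.286 0.385
```

Feeding rtt["max"] into the BDP calculation gives a worst-case figure, while rtt["avg"] gives a typical one.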
