What is TCP?

TCP is the de facto transport protocol on the Internet today and one of the core protocols of the Internet Protocol (IP) suite. It guarantees in-order, error-checked delivery of all content sent from one network device to another. TCP employs retransmissions to ensure that no portion of the content is lost. To that end, TCP breaks content into packets. Each packet has a sequence number that identifies its relative ordering. The sender transmits packets to the receiver and expects acknowledgements for in-order, correctly received packets. If any packet is detected as lost, it is retransmitted. TCP is responsible for re-arranging the received packets and delivering the content to the application without errors or data gaps.
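As a sketch of the reassembly step described above, here is a toy receiver (illustrative Python, not real TCP: real TCP numbers bytes rather than packets) that buffers out-of-order packets by sequence number, delivers contiguous data in order, and returns a cumulative acknowledgement:

```python
# A toy sketch (not real TCP) of how a receiver uses sequence
# numbers to reassemble in-order data and acknowledge it.

class ToyReceiver:
    def __init__(self):
        self.next_seq = 0      # next sequence number we expect
        self.buffer = {}       # out-of-order packets, keyed by sequence number
        self.delivered = []    # data handed to the application, in order

    def receive(self, seq, data):
        """Accept a packet; return the cumulative ACK (next expected seq)."""
        if seq >= self.next_seq:
            self.buffer[seq] = data
        # Deliver any contiguous run starting at next_seq.
        while self.next_seq in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return self.next_seq   # cumulative ACK: everything below is received

recv = ToyReceiver()
# Packets 0 and 2 arrive; packet 1 is delayed (or lost and retransmitted).
recv.receive(0, "he")
recv.receive(2, "o!")
ack = recv.receive(1, "ll")     # gap filled; packets 0..2 now deliverable
print("".join(recv.delivered))  # hello!
```

Note how the acknowledgement after the gap is filled jumps to 3: a cumulative ACK tells the sender that everything before that number arrived, which is how the sender detects which packets still need retransmission.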

TCP’s other primary functions include controlling congestion on the Internet and making sure that the receiver does not get overwhelmed with too much data. The former is referred to as congestion control, while the latter is referred to as flow control. Both adapt the transmission rate based on feedback: signals from the network (such as packet loss) in the case of congestion control, and explicit updates from the receiver in the case of flow control.

In general, network devices (e.g., routers, switches) or the links connecting them may become overloaded, which causes packet loss. TCP probes the network for available bandwidth to ensure that the transmission rate does not exceed the available capacity; by probing we mean sending data at a progressively higher rate. TCP takes this notion a step further by attempting to attain its fair share of the network’s resources. It does so by probing for additional bandwidth while no loss is detected, and significantly reducing the transmission rate when packets are lost. Reducing the transmission rate following a loss allows the network to recover and lets other competing senders obtain additional bandwidth. The result is that senders utilizing TCP converge toward a fair share of the available bandwidth on the network.
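The probe-and-back-off behavior described above is what drives competing senders toward a fair share. A minimal simulation (the capacity and starting windows are illustrative, and this is not a real network model) of two such senders sharing one bottleneck shows the convergence:

```python
# A simplified simulation (not real TCP) of two senders sharing one
# bottleneck: each adds one packet per round while there is no loss,
# and halves its window when the link is over capacity.

CAPACITY = 100  # bottleneck capacity in packets per round (assumed)

def aimd_round(w1, w2):
    if w1 + w2 > CAPACITY:          # loss: both back off multiplicatively
        return w1 / 2, w2 / 2
    return w1 + 1, w2 + 1           # no loss: both probe additively

w1, w2 = 80.0, 5.0                  # start far from a fair split
for _ in range(500):
    w1, w2 = aimd_round(w1, w2)

print(round(w1), round(w2))         # the two windows end up nearly equal
```

Additive increase keeps the gap between the two flows constant, while each multiplicative back-off halves it, so repeated cycles shrink the gap toward zero regardless of the starting point.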

In terms of flow control, the mechanism is fairly simple. The receiver constantly communicates how much more data it can receive. If the receiver is busy processing previously received data, that is communicated to the sender, which in turn reduces its transmission rate accordingly. In short, TCP’s congestion and flow control mechanisms prevent both the network and the receiver from being overloaded with too much data.
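A sketch of the advertised-window idea (the packet granularity and buffer size are illustrative; real TCP advertises a byte-granularity receive window in each acknowledgement):

```python
# A toy sketch (not real TCP) of flow control: the receiver advertises
# how much buffer space it has left, and the sender never keeps more
# unacknowledged data in flight than that window allows.

RECV_BUFFER = 10  # receiver buffer size in packets (assumed)

def sendable(in_flight, advertised_window):
    """How many more packets the sender may transmit right now."""
    return max(0, advertised_window - in_flight)

# The receiver is slow: 7 packets sit unprocessed in its buffer,
# so it advertises only 3 packets of remaining space.
advertised = RECV_BUFFER - 7
print(sendable(in_flight=2, advertised_window=advertised))  # 1
# Once the application drains the buffer, the window opens again.
advertised = RECV_BUFFER
print(sendable(in_flight=2, advertised_window=advertised))  # 8
```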

Why is TCP the protocol of choice at Akamai?

TCP is the protocol of choice at Akamai because of its widespread adoption on the Internet, the fact that it is implemented as part of the standard Linux kernel and most other OSes, and because it is allowed by firewalls network-wide. It allows us to avoid overloading the network or receivers. TCP also has a wide variety of knobs that, when tuned or optimized, can further improve the performance of our delivery. All of that makes TCP a good choice for delivering content on the Akamai Intelligent Platform™.

Where does Akamai use TCP?

Akamai uses TCP on edge machines when communicating with clients/end-users; we refer to that connection leg as edge-to-client. We also use TCP between any two Akamai components: ghost-to-ghost, ghost-to-origin, ghost-to-netstorage, as well as all other types of midgress components that constitute the overall architecture of the Akamai Intelligent Platform, across all our products. Ghost, short for Global HOST, runs proprietary Akamai software to provide traffic-serving functionality.

Why do we optimize TCP?

Optimizing TCP, by tuning the many available knobs, improves the overall performance of the protocol. We generally measure TCP performance using two key metrics: delivery speed (how many bits per second can be delivered) and overhead (retransmissions, which account for duplicate data sent on our network).

How do we optimize TCP?

TCP is a complicated protocol. At a high level, it operates in two modes: slow-start and congestion-avoidance. Those are different phases in the protocol that probe the network for available bandwidth using different approaches. TCP maintains what’s referred to as a congestion window, which determines how many packets can be in flight on the network at any point in time. The larger the congestion window, the greater TCP believes its fair share of the available bandwidth to be. In slow-start, the congestion window grows by one packet for every packet that is correctly received (i.e., acknowledged), which doubles the window roughly every round trip; despite the “slow-start” name, that is an aggressive, exponential rate of increase. In congestion-avoidance, TCP believes it is much closer to its fair share and probes the network much less aggressively: instead of doubling every round trip, the congestion window is expanded by only a single packet after an entire congestion window’s worth of packets is acknowledged by the receiver. In both cases, once loss is detected, the congestion window is shrunk and the probing starts again.
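The two phases can be sketched as a per-round-trip window update. This is a simplification: real TCP implementations react differently to timeouts (restart from a window of one) versus duplicate acknowledgements (halve the window), and the threshold and loss round below are illustrative:

```python
# A simplified, per-round-trip sketch (not real TCP) of the two phases:
# below the threshold the window doubles each round trip (slow-start);
# above it, the sender is in congestion-avoidance and grows by one
# packet per round trip; on loss it halves the window and remembers
# that level as the new threshold.

def next_cwnd(cwnd, ssthresh, loss):
    """One round trip of a simplified congestion window update."""
    if loss:
        ssthresh = max(cwnd // 2, 2)   # remember half the window at loss
        return ssthresh, ssthresh      # (real variants differ here)
    if cwnd < ssthresh:
        return cwnd * 2, ssthresh      # slow-start: exponential growth
    return cwnd + 1, ssthresh          # congestion-avoidance: linear growth

cwnd, ssthresh = 1, 16
trace = []
for rtt in range(12):
    loss = (rtt == 8)                  # pretend a loss happens at round 8
    cwnd, ssthresh = next_cwnd(cwnd, ssthresh, loss)
    trace.append(cwnd)
print(trace)
```

The trace shows the characteristic sawtooth: exponential growth to the threshold, linear growth beyond it, then a halving when the loss hits.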

Akamai optimizes TCP by tuning knobs that control where we start probing from (i.e., the initial congestion window), how quickly we expand the congestion window in both the slow-start (factor of 2 or 3 or higher) and congestion-avoidance (increase by 1 or 2 or higher) phases, as well as how much we back off when a loss is detected (shrink window by 50%, 30% or even less). That allows us to control how aggressive the protocol is in acquiring bandwidth. A TCP instance that probes aggressively and does not back off as much will acquire a larger share of the available bandwidth, under most network conditions.
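Those knobs can be made explicit as parameters of a window-update rule. The parameter values and loss pattern below are illustrative, not Akamai's actual settings; the point is only that a more aggressive profile ends up with a larger window under the same conditions:

```python
# A sketch (illustrative parameters, not Akamai's tuning) of how the
# knobs in the text map onto a simplified window update: initial
# window, slow-start growth factor, congestion-avoidance increment,
# and the fraction of the window kept after a loss.

def run(rounds, losses, init=10, ss_factor=2, ca_increment=1, keep=0.5):
    """Simulate cwnd over round trips; `losses` is the set of loss rounds."""
    cwnd, ssthresh = init, 64
    for r in range(rounds):
        if r in losses:
            ssthresh = max(int(cwnd * keep), 2)  # back off on loss
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * ss_factor, ssthresh)  # slow-start phase
        else:
            cwnd += ca_increment                    # congestion-avoidance
    return cwnd

standard   = run(30, losses={10, 20})
aggressive = run(30, losses={10, 20}, init=32, ca_increment=2, keep=0.7)
print(standard, aggressive)  # the aggressive profile keeps a larger window
```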

Are there other variants of TCP?

There has been a lot of research on TCP over the last 10–15 years, much of which has focused on improving some aspect of TCP’s behavior. The key finding is that TCP does not perform well under all types of network conditions: loss/latency patterns, cross-traffic, how quickly the available bandwidth changes over time, and so on. In 2012 Akamai acquired FastSoft, a company that developed a novel transport solution that does not rely on detecting loss to adapt the congestion window. In general, TCP induces loss by constantly probing for more available bandwidth in order to estimate the correct transmission rate, and then reacts to the occurrence of loss; it’s a reactive protocol. FastTCP, the Akamaized version of FastSoft’s solution, attempts to estimate the correct transmission rate by utilizing latency estimates, among other things, without actually inducing loss. It’s a proactive protocol.
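The flavor of a delay-based approach can be sketched with the window rule published for FAST TCP, the academic ancestor of FastSoft's product (FastTCP's actual implementation is proprietary, and the constants and delay model below are illustrative): the sender compares the current round-trip time against the lowest it has seen, so growing queuing delay signals congestion before any packet is lost.

```python
# A sketch of a delay-based window update in the spirit of FAST TCP
# (the published academic rule; the real FastTCP implementation is
# proprietary and differs). Queuing delay, not loss, drives the update,
# so the window stabilizes without forcing the network to drop packets.

ALPHA = 20    # target number of this flow's packets queued in the network
GAMMA = 0.5   # smoothing factor for the window update

def delay_based_update(cwnd, base_rtt, current_rtt):
    """One update step: move toward the equilibrium implied by queuing delay."""
    target = (base_rtt / current_rtt) * cwnd + ALPHA
    return min(2 * cwnd, (1 - GAMMA) * cwnd + GAMMA * target)

cwnd, base_rtt = 50.0, 0.050            # 50 ms propagation delay (assumed)
for _ in range(40):
    # Toy delay model: each packet in flight adds 1 ms of queuing delay.
    current_rtt = base_rtt + cwnd * 0.001
    cwnd = delay_based_update(cwnd, base_rtt, current_rtt)
print(round(cwnd))  # settles where queuing delay balances ALPHA
```

At the fixed point the window stops moving once about ALPHA of the flow's packets sit in the bottleneck queue, which is the proactive behavior the text describes: the rate is estimated from latency rather than from induced loss.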

As network characteristics, traffic patterns, types of applications, and our product offerings change over time, Akamai will continue to push the envelope to design and implement novel transport solutions which improve delivery both to users and between the myriad components of the Akamai Intelligent Platform.