When a user requests a media asset, especially a large one, they may only be asking for a portion of a stream. The HTTP chunked media streaming formats, such as Akamai HD Flash, Adobe HDS, Apple HLS, and Microsoft Smooth Streaming, have explicit support for requesting just the portion of a stream needed at the moment. Even traditional HTTP downloads can be separated into chunks. Since it is likely that at some point the user will request the next set of bytes in the stream or file, prefetching attempts to load that data into the edge server’s cache before the user asks for it. When successful, prefetching reduces latency because the next set of data has already been transferred from the origin to the edge server by the time the user requests it.

HTTP chunked prefetching

Prefetching for the HTTP chunked streaming formats works by inspecting the data being served to determine the URLs of the next several chunks in the stream. The edge server then attempts to download a configurable number of chunks ahead of the chunk the user requested.
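As a rough illustration of this idea (not Akamai’s actual implementation), the sketch below parses a simple HLS-style media playlist, finds the chunk just served, and returns the next few chunk URLs to queue for prefetch. The constant `PREFETCH_AHEAD` and the function names are invented for this example.

```python
PREFETCH_AHEAD = 3  # hypothetical: how many chunks to stay ahead of the user

def chunk_urls(playlist_text):
    """Return chunk URLs from a simple HLS-style media playlist
    (non-comment, non-blank lines)."""
    return [line.strip() for line in playlist_text.splitlines()
            if line.strip() and not line.startswith("#")]

def chunks_to_prefetch(playlist_text, served_url, ahead=PREFETCH_AHEAD):
    """URLs of the next `ahead` chunks after the one just served."""
    urls = chunk_urls(playlist_text)
    try:
        i = urls.index(served_url)
    except ValueError:
        return []  # served chunk not in this playlist
    return urls[i + 1 : i + 1 + ahead]

playlist = """#EXTM3U
#EXTINF:10,
seg001.ts
#EXTINF:10,
seg002.ts
#EXTINF:10,
seg003.ts
#EXTINF:10,
seg004.ts
#EXTINF:10,
seg005.ts
"""

print(chunks_to_prefetch(playlist, "seg002.ts"))
# ['seg003.ts', 'seg004.ts', 'seg005.ts']
```

Note how the same logic naturally handles the end of a VOD stream: once fewer than `ahead` chunks remain in the playlist, the prefetch list simply shrinks.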

For Video On Demand (VOD) streams, there will be plenty of chunks available to prefetch (except in the very last few seconds of the stream). Live streams, on the other hand, typically have only one or two chunks at the origin that have not yet been requested by a user somewhere on the network.

Traditional media prefetching

When prefetching objects for traditional media products, Akamai’s servers break up large objects (more than 10 MB) into smaller, roughly 2 MB partial-objects. These partial-objects are cached just like complete objects, so the servers download only the bytes that have been requested by the user and deliver them as they are needed.
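The mapping from a byte range to partial-objects can be sketched as follows. This is an illustrative model only; the 2 MB size comes from the text above, but the function names and the exact alignment are assumptions.

```python
PARTIAL_SIZE = 2 * 1024 * 1024  # ~2 MB per partial-object, per the text above

def partials_for_range(start, end):
    """Indices of the partial-objects covering bytes [start, end)."""
    return list(range(start // PARTIAL_SIZE, (end - 1) // PARTIAL_SIZE + 1))

def partial_byte_range(index):
    """Byte range [start, end) covered by a given partial-object."""
    return index * PARTIAL_SIZE, (index + 1) * PARTIAL_SIZE

# A request for bytes 3 MB..7 MB touches partials 1, 2, and 3;
# nothing else needs to be fetched from the origin.
print(partials_for_range(3 * 2**20, 7 * 2**20))  # [1, 2, 3]
```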

Take for example users who begin a download at the office before packing up and heading home for the night. When they arrive at home, their laptop syncs up with their home WiFi and resumes the download of a large object. If the Akamai edge server that they connect to is different at home from the server they used at the office, the home edge may not have the full object in its cache. This is a cache miss, and typically, the home edge server would need to fetch the entire download of the large object from the origin. However, with chunking, their home edge server only fetches the missing portion of the file to complete the download.

Similarly, the edge server that the user connected to at the office only had to download a fraction of the file. The portion that it has downloaded will remain in its cache, but it won’t need to download the rest of the file since there have been no requests for those bytes.
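The office/home scenario can be sketched in the same model. Assuming the ~2 MB partial-object size from above (everything else here is hypothetical), the home edge server computes which partials cover the unfinished tail of the download and fetches only the ones it does not already hold:

```python
def partials_to_fetch(resume_offset, object_size, cached,
                      partial_size=2 * 1024 * 1024):
    """Partial-object indices covering bytes [resume_offset, object_size)
    that are not already in this edge server's cache."""
    first = resume_offset // partial_size
    last = (object_size - 1) // partial_size
    return [i for i in range(first, last + 1) if i not in cached]

# Resuming a 20 MB download at byte 12 MB on a cold home edge:
# only partials 6..9 are fetched, never the full object.
print(partials_to_fetch(12 * 2**20, 20 * 2**20, cached=set()))  # [6, 7, 8, 9]
```

The office edge, symmetrically, keeps only partials 0..5 in cache and never fetches the rest unless some user asks for those bytes.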

All of this partial-object caching is transparent to the user and their browser or application. No special requests (such as an HTTP byte range request) are needed to enable partial-object caching in the Akamai Intelligent Platform™.

Partial-objects contain references linking them to the relevant master object. When the master object changes, the partial-objects cached throughout Akamai’s network are removed. New partial-objects are re-fetched for the new master driven by user requests.
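One way to picture the master/partial link is to tag each cached partial with the version of the master it came from, for instance an ETag, and treat any partial whose version no longer matches as evicted. This is a toy sketch under that assumption, not Akamai’s actual cache design:

```python
class PartialObjectCache:
    """Toy cache: each partial records the master version it belongs to."""

    def __init__(self):
        self._store = {}  # (url, partial_index) -> (master_version, data)

    def put(self, url, index, master_version, data):
        self._store[(url, index)] = (master_version, data)

    def get(self, url, index, current_master_version):
        entry = self._store.get((url, index))
        if entry is None:
            return None  # cache miss: fetch from origin
        version, data = entry
        if version != current_master_version:
            # Master object changed: this partial is stale, so remove it.
            # A fresh partial will be re-fetched on the next user request.
            del self._store[(url, index)]
            return None
        return data

cache = PartialObjectCache()
cache.put("/big.bin", 0, master_version="v1", data=b"...")
print(cache.get("/big.bin", 0, current_master_version="v1"))  # b'...'
print(cache.get("/big.bin", 0, current_master_version="v2"))  # None
```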

Prefetching configuration

With VOD Streaming, Live Streaming, and Traditional Media Delivery, it is important not to prefetch too far ahead of the user.

For all types of delivery, it is important not to prefetch too much content, because each added request adds load to both the Akamai network and the origin. If the prefetched bytes are not consumed in a timely manner, those resources have been wasted.

With Live Streaming, there are typically only one or two chunks even available for prefetch at the origin. Flooding the origin with requests for chunks that don’t yet exist would waste cycles on both Akamai’s servers and the origin servers.

In order to balance the benefits of prefetching against its costs, Akamai’s edge servers are configured to prefetch only a limited number of objects ahead of any user request. In addition, the amount of prefetching is sensitive to how far ahead of the user prefetching has already gotten. For example, different prefetching levels may be configured for:

  • when the chunk the user needs right now is in transit from the origin
  • when the chunk is already in cache on the edge server

This approach favors conservative prefetching during times when prefetching could negatively impact the user’s experience — perhaps downloading only one chunk ahead of the user — while allowing more aggressive prefetching once we’re serving the user from cache. When serving content from the cache, Akamai edge servers may prefetch anywhere from a couple to perhaps a couple dozen chunks without impacting user experience.
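A minimal sketch of this two-level policy, with depths invented for illustration (the text gives only “one chunk ahead” versus “a couple to perhaps a couple dozen”):

```python
def prefetch_depth(current_chunk_in_cache):
    """How many chunks ahead of the user to prefetch.

    Conservative while the user's current chunk is still in transit
    from the origin; more aggressive once we serve from cache.
    Depth values are illustrative, not Akamai's configuration.
    """
    CONSERVATIVE = 1   # current chunk still coming from the origin
    AGGRESSIVE = 12    # user is already being served from cache
    return AGGRESSIVE if current_chunk_in_cache else CONSERVATIVE

print(prefetch_depth(False))  # 1
print(prefetch_depth(True))   # 12
```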

To warm the cache, prefetching is done in a kind of burst mode. Once the edge server has prefetched the prescribed number of chunks ahead of the user, the prefetching algorithm reaches steady-state and will then only need to download a single chunk from time to time at whatever rate the user is consuming them — just enough to maintain that prescribed lead over the user.
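The burst-then-steady-state behavior can be seen in a toy simulation (parameters invented for illustration): the edge bursts until it holds the prescribed lead over the user, then downloads exactly one chunk per chunk consumed.

```python
def simulate(total_chunks, target_lead):
    """Chunks fetched by the edge at each consumption step."""
    fetched = 0           # chunks downloaded by the edge so far
    consumed = 0          # chunks delivered to the user so far
    fetches_per_step = []
    while consumed < total_chunks:
        before = fetched
        # Fetch until we hold the target lead (a burst on the first step).
        while fetched - consumed < target_lead and fetched < total_chunks:
            fetched += 1
        fetches_per_step.append(fetched - before)
        consumed += 1     # the user consumes one chunk per step
    return fetches_per_step

# First step bursts to the full lead; after that, one fetch per step
# until the end of the stream is reached.
print(simulate(total_chunks=8, target_lead=3))  # [3, 1, 1, 1, 1, 1, 0, 0]
```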