Today we want to talk a bit about large content libraries and how they interact with the Akamai Intelligent Platform™, including things you can do to help and what Akamai can do for you.
What is a large library?
A large library is one with a lot of raw assets, be they video files, product images, or personal photos. This kind of content challenges any CDN because it can be difficult to cache. You may see more requests reaching your data center than you would like. If you have a global audience but data centers in only one or a few locations, you may also see decreased performance while Akamai waits for content to come back from a distant data center.
What counts as a large library changes constantly: what was large in 2005 is not considered large today. Furthermore, as you’ll see below, some factors can make a library seem larger than it is. In 2013 terms, Akamai recommends that you begin treating your content library as large when it approaches the 750 GB to 1 TB range, and a 20 TB library should definitely be given large-library consideration. As with all rules, there are exceptions, and the sections below hint at some of them.
Expectations for caching and performance
There’s something really important to keep in mind: what matters is not how large your library is, but how well Akamai can cache your content. After all, that’s what affects your bill and the user experience. Better caching brings numerous benefits, chief among them reduced cost and better performance. Although library size is obviously a large factor, other things can drastically affect Akamai’s ability to cache your content well.
Even if your library is large, your content may be requested in a way that makes your bits easy to cache. For example, imagine a modern news company. The videos on the front page and in the related articles are requested many times, whereas archival footage from 20 years ago is almost never requested. Caching the 20-year-old archival footage will be exceedingly difficult, while caching the front-page objects is very easy. When you run the numbers, you may find that the “long-tail requests” for the 20-year-old footage represent only 3% of requests, while 97% are for front-page objects, which yields a very good hit rate on the Akamai Intelligent Platform.
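To make “running the numbers” concrete, here is a minimal sketch of the arithmetic: the overall hit rate is the request-weighted average of the hit rates of each part of the library. The 97%/3% request shares come from the example above; the per-segment hit rates (99% for front-page objects, 0% for the archive) and the names SEGMENTS and blended_hit_rate are illustrative assumptions, not measured Akamai values.

```python
# Request shares come from the news-site example; the per-segment hit
# rates are illustrative assumptions, not measured values.
SEGMENTS = {
    # name: (share of requests, assumed cache hit rate for that segment)
    "front_page": (0.97, 0.99),
    "archive_20y": (0.03, 0.00),
}


def blended_hit_rate(segments):
    """Request-weighted average of per-segment cache hit rates."""
    total_share = sum(share for share, _ in segments.values())
    weighted = sum(share * hit for share, hit in segments.values())
    return weighted / total_share


if __name__ == "__main__":
    print(f"overall hit rate: {blended_hit_rate(SEGMENTS):.2%}")
```

With these assumed numbers the blended hit rate comes out at about 96%, even though the archive segment is essentially never served from cache.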
This is what’s meant by a popularity curve: it’s important to consider how your library is accessed. Having a large library with a very long tail doesn’t mean the tail makes up the majority of your requests.
When working with a customer to improve their caching rates, Akamai generates a popularity curve from logs. This often gives key insights into the options available to the customer and to Akamai, and into the tradeoffs that can be made.
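If you want a rough view of your own popularity curve before talking to Akamai, a minimal sketch is to count requests per object, sort from most to least popular, and track the cumulative share of requests. This assumes you have already extracted the requested path from each log line (log parsing is not shown), and the function names are hypothetical; it is not Akamai’s tooling.

```python
from collections import Counter


def popularity_curve(paths):
    """Return (rank, cumulative share of requests) pairs, most popular object first."""
    counts = Counter(paths)
    total = sum(counts.values())
    curve = []
    running = 0
    for rank, (_path, hits) in enumerate(counts.most_common(), start=1):
        running += hits
        curve.append((rank, running / total))
    return curve


def share_of_requests_from_top(curve, fraction):
    """Share of all requests served by the most popular `fraction` of unique objects."""
    k = max(1, int(len(curve) * fraction))
    return curve[k - 1][1]


if __name__ == "__main__":
    # Hypothetical log: two hot front-page videos plus ten archive files requested once each.
    log = ["/front/a.mp4"] * 50 + ["/front/b.mp4"] * 40
    log += [f"/archive/{i}.mp4" for i in range(10)]
    curve = popularity_curve(log)
    print(len(curve), share_of_requests_from_top(curve, 0.2))
```

In this toy log there are 12 unique objects, and the top two of them (about 20% of the library) account for 90% of requests, which is the kind of shape that caches well despite a long tail.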
Bit pressure vs. footprint
Another important concept is how many bits you serve vs. the size of the library you’re trying to serve regularly. For example, if you have only six users (say you’re still in testing mode), your offload rates will look very poor regardless of the size of your library. It’s therefore important to consider how many requests you have relative to how many unique objects you’re trying to deliver. Conversely, even a large library will get better caching rates if it simply has a lot of bit pressure behind it, because frequent requests keep objects fresh in cache longer.
That said, for many popularity curves no amount of bit pressure is enough to keep objects in cache: there simply aren’t enough requests per object.
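One simple way to see the relationship between bit pressure and footprint is a best-case bound: even a single cache that is infinitely large and never evicts must miss each unique object at least once, so its hit rate can never exceed 1 minus (unique objects / total requests). The sketch below assumes a list of requested paths, and the function names and sample traffic are hypothetical. Real CDN caches span many machines, evict, and expire content, so actual hit rates will be lower than this bound.

```python
from collections import Counter


def requests_per_object(paths):
    """Average number of requests per unique object (bit pressure vs. footprint)."""
    counts = Counter(paths)
    return len(paths) / len(counts) if counts else 0.0


def max_hit_rate(paths):
    """Best-case hit rate for one cache that is infinitely large and never evicts.

    Every unique object must miss at least once, so the hit rate can never
    exceed 1 - unique_objects / total_requests.
    """
    if not paths:
        return 0.0
    return 1 - len(set(paths)) / len(paths)


if __name__ == "__main__":
    scenarios = {
        # Six testers, five different videos each: 30 requests, 30 unique objects.
        "testing": [f"/video/{i}.mp4" for i in range(30)],
        # Heavy traffic over a modest library: 10,000 requests, 500 objects.
        "busy": [f"/video/{i % 500}.mp4" for i in range(10_000)],
        # Heavy traffic, but every request is for a different archive file.
        "long_tail": [f"/archive/{i}.mp4" for i in range(10_000)],
    }
    for name, paths in scenarios.items():
        print(name, requests_per_object(paths), round(max_hit_rate(paths), 3))
```

The “long_tail” scenario has as many requests as “busy” but only one request per object, so even the best-case bound stays at zero. This is the situation where adding more traffic does not help.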
What you can do to help
There are some things that can be done to help the situation.
Domain sharding can help utilize more machines on the Akamai network in a consistent way: requests for a given object concentrate on particular machines, which increases each machine’s ability to serve that content from cache. There are many good guides online that explain the concepts of domain sharding.
The critical point here is that the sharding must be consistent. I’ll go through what this means.
Essentially, say you have a pool of domain names like this:
cdn01.foo.com
cdn02.foo.com
And a few objects like this:
top_banner.jpg
my_video.mpg
new.png
You can have an object-to-hostname mapping like this:
top_banner.jpg -> cdn01.foo.com
my_video.mpg -> cdn02.foo.com
new.png -> cdn01.foo.com
This is a good way to do domain sharding on the Akamai network. Once you adopt a mapping like this, however, you should never send a request like the following:
top_banner.jpg -> cdn02.foo.com # Don't do this! You must always use cdn01 for this object!
Put another way, an object must consistently be sent to one (and only one) hostname. If you use different hostnames for a given object, you will likely incur more cache misses as you’ll be sending requests for that object to multiple machines.
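One common way to keep the mapping consistent without maintaining a lookup table is to derive the hostname from a stable hash of the object path. The sketch below assumes you build asset URLs in your own templates or application code; SHARDS, shard_for, and shard_url are hypothetical names, and the pool reuses the example hostnames above. It uses hashlib rather than Python’s built-in hash(), because hash() on strings is salted per process and would hand out different hostnames from different servers.

```python
import hashlib

# Hypothetical shard pool, using the example hostnames from the text.
SHARDS = ["cdn01.foo.com", "cdn02.foo.com"]


def shard_for(path, shards=SHARDS):
    """Deterministically map an object path to exactly one hostname.

    A stable digest means every page render, on every server, picks the
    same hostname for the same object.
    """
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def shard_url(path, scheme="https"):
    """Build the full URL for an object on its assigned shard."""
    return f"{scheme}://{shard_for(path)}/{path.lstrip('/')}"


if __name__ == "__main__":
    for obj in ["top_banner.jpg", "my_video.mpg", "new.png"]:
        print(obj, "->", shard_for(obj))
```

The hash-based assignment will not necessarily match the hand-picked mapping above; what matters is that a given object always lands on the same hostname. Note that with a simple modulo, changing the number of hostnames reassigns most objects (and causes a wave of cache misses), so it is worth choosing the pool size up front or using a consistent-hashing scheme instead.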
Sometimes domain sharding doesn’t help much, but it almost never hurts. The only case where it might make your delivery worse is when DNS lookups for the extra hostnames make up a large fraction of delivery time. For many libraries this isn’t a big deal, and DNS caching in modern browsers makes it a near non-issue. In short, if you can, shard your content as described above for best results.
Beyond sharding, Akamai can deploy different caching architectures depending on your unique needs. While they don’t always help, it may be worth reaching out to your Akamai representative to match your needs to the best caching architecture.