
A Technical Deep Dive Into Purging by Cache Tag

March 28, 2019 · by Tim Vereecke

Hopefully, you’ve already read the excellent "Introducing the Ability to Purge by Cache Tag" blog post by my Akamai colleague Sid Phadkar. If not, I highly recommend that you check it out. Building on Sid’s introductory post, I’d like to now take you on a deeper dive into the new purge by cache tag functionality in Akamai’s Fast Purge.

But first, let’s take a general look at how this functionality works.

Purging by cache tag: the basics

To get started with purge by cache tag, you simply associate any object with one or more tags using the Edge-Cache-Tag response header.

As an example, let’s say you’re an online retailer and you’re selling some handsome socks on your website. You might associate the socks with various tags, such as:

  • Product: "red tennis socks" (product ID: 2127)
  • Brand: "Lorem Sportswear, Ltd." (brand ID: 102)
  • Category: "socks" (category ID: 76)  

Then, you would create one header with three cache tags:

Edge-Cache-Tag: product-2127, brand-102, category-76

Once you’ve associated the socks with those tags, you can then use these cache tags to specify a group of objects for purging, whether you want to purge all socks, all red tennis socks, or all Lorem Sportswear brand socks.
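To make the grouping idea concrete, here is a minimal in-memory sketch in Python. It is only a mental model, not a description of how the Akamai platform stores tags internally. The URLs, the two extra products, and the helper names `build_index` and `objects_for_tag` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical cached objects and the value of their Edge-Cache-Tag header.
cached = {
    "/p/2127": "product-2127, brand-102, category-76",
    "/p/5130": "product-5130, brand-102, category-76",
    "/p/31921": "product-31921, brand-306, category-76",
}

def build_index(objects):
    """Map each tag to the set of URLs that carry it."""
    index = defaultdict(set)
    for url, header in objects.items():
        for tag in header.split(","):
            index[tag.strip()].add(url)
    return index

def objects_for_tag(index, tag):
    """Return the URLs that a purge of `tag` would select, sorted for display."""
    return sorted(index.get(tag, set()))

index = build_index(cached)
print(objects_for_tag(index, "brand-102"))    # ['/p/2127', '/p/5130']
print(objects_for_tag(index, "category-76"))  # ['/p/2127', '/p/31921', '/p/5130']
```

The same object shows up under several tags, so it can be selected by product, by brand, or by category.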

There are three methods you can use to purge by cache tag on the Akamai platform: the Luna Control Center (Figure A), the Akamai CLI (Figure B), and the Fast Purge API.

Figure A: Purging by cache tag in the Luna Control Center

 

Figure B: Purging by cache tag via the command line in the Akamai CLI

Deep Dive

Now that you’ve got the basics down, let’s start the deep dive. Before we go into the details of purge by cache tag, let’s understand why we created this functionality. The reason was simple: in certain situations, the existing methods of purging (via URL or via CP code, both of which are used successfully many millions of times every day) had shown some limitations. These situations were common enough that we decided to give our customers an additional way to purge. Here’s an overview of the limitations that led us to create purge by cache tag:

The limitations of purging by URL

Purging via URL is a popular and powerful method, but generally speaking, it can have a couple of high-level restrictions:

  • You need to know/calculate all the URLs, which can be difficult or impractical.

    • To purge by URL, you must painstakingly catalog every URL you want to purge; that can include all image variants, all PDP variants, all indices of a paginated page, and many more parameters. Plus, if you miss a URL (e.g., an extra query parameter) then the content will not be purged, which can affect user experience, legal liability, and/or brand reputation.

  • In certain situations, it can be too slow.

    • Although purging by URL is generally super-fast, when a very high number of URLs must be purged, rate limiting can increase actual purge time.

The limitations of purging by CP code

Purging by CP code is great for technically driven changes, such as:

  • Purging all static assets (e.g., CSS/JS/fonts) after each deployment without having to list the individual resources
  • Purging all product pages because an updated template was deployed

However, you’re limited in flexibility when you want to extend purge by CP code to business changes. Those limitations include:

  • You can’t purge objects via different paths.

    • A key design principle for CP codes is that every object in the cache is assigned to one—and only one—CP code (as noted above, purge by cache tag allows you to associate multiple tags with any object). This means you can't use CP code purging to purge objects via different paths. For example, if you have one object with five different cache tags, you’ll have different paths/ways to identify that object; with CP codes, you have one and only one. So purging by cache tag gives you greater flexibility in this instance.

  • It can be difficult to get granular control.

    • The granularity of CP codes is limited due to the fact you need to program the rules for CP codes in Akamai’s Property Manager, and Property Manager is not an ideal tool for this type of granularity. CP code segmenting is a rather static segmentation; as such, it’s not suitable for the kind of highly granular dynamic segments often required by businesses.

Conquering those limitations: purging by cache tag

Now let’s look at how purging by cache tag can help conquer the limitations outlined in the situations above. We’ll use an example scenario to illustrate how purge by cache tag differs from other caching strategies:

Suppose you're an online retailer with 50,000 products from 400 different brands in 100 different categories (shoes, socks, etc.). Each product has a landing page that includes pricing and the vendor logo. In addition to the landing page, let’s say that each product also has five subpages (e.g., reviews, product details, more info, etc.) and that all of these pages are available in five different languages. This means each product actually has 30 URLs: six pages (the landing page plus five subpages) in each of five languages (6 x 5 = 30).
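The count of 30 URLs per product can be checked with a short Python sketch. The URL scheme, the language codes, and the subpage names are assumptions made up for illustration; the scenario only specifies one landing page, five subpages, and five languages.

```python
from itertools import product

# Assumed values: the scenario only says "five languages" and "five subpages".
LANGUAGES = ["en", "fr", "de", "es", "nl"]
PAGES = ["", "reviews", "details", "more-info", "specs", "shipping"]  # "" = landing page

def product_urls(product_id):
    """Return every URL for one product under a hypothetical URL scheme."""
    urls = []
    for lang, page in product(LANGUAGES, PAGES):
        suffix = f"/{page}" if page else ""
        urls.append(f"/{lang}/products/{product_id}{suffix}")
    return urls

urls = product_urls(2127)
print(len(urls))  # 30
```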

In this scenario, there are multiple moving parts affecting caching, such as:

  • Product info changes (e.g., pricing or product name)
  • Brand-level info changes (e.g., updated logo)
  • Category-level info changes (e.g., labels for SEO)

Let’s now examine five caching strategies you can choose from, with the fifth one being purge by cache tag, to show the specific advantages this new feature can give you compared to other alternatives:

  1. No caching
  2. Short TTL (no auto-purging)
  3. Long TTL (no auto-purging)
  4. Long TTL, purging by URL via Fast Purge API
  5. Long TTL, purging by cache tag via Fast Purge API

1. No caching

With this strategy, you keep things simple, operationally efficient, and always up to date by not caching any objects at the Edge. This approach, of course, sacrifices offload (i.e., performance) because the full workload remains on your origin infrastructure rather than being distributed to edge servers.

[Chart: no caching]
Key: green = good, red = not good, orange = in between

2. Short TTL (no auto-purging)

Here we cache at the Edge with a short TTL and don't bother with purging, because the short TTL automatically limits the lifespan of the data. Compared to the “No caching” method above, this method provides better offload/performance for the most popular content (because it’s cached at the Edge), but sacrifices on being up to date because the short TTL automatically gets rid of data even if it’s the correct, up-to-date data. Less-popular content will often have cache misses because the short TTL may have discarded the data before it was requested. Operational efficiency is not negatively impacted because by the time an issue occurs, the cache is already clear due to the short TTL and no action will be required.

[Chart: short TTL]

3. Long TTL (no auto-purging)

With this strategy, the TTL is increased, which delivers higher offload/performance than the short TTL method because the data resides on edge servers for a longer period. However, the risk of being out of date increases for the same reason: the data sits on the server for a longer period before the TTL discards the data, so it’s more likely to become out of date. This, in turn, increases the chance that your team will have to intervene to manually update the data, which reduces operational efficiency.
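The trade-off between strategies 2 and 3 can be illustrated with a toy TTL cache in Python. This is a simplified model under assumed TTL values (60 seconds and one day), with time passed in explicitly; it does not describe how edge servers are implemented.

```python
class TtlCache:
    """Tiny edge-cache model: entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.store = {}  # url -> (value, stored_at)

    def get(self, url, origin, now):
        entry = self.store.get(url)
        if entry and now - entry[1] < self.ttl:
            return entry[0], "hit"
        value = origin[url]            # cache miss: fetch from origin
        self.store[url] = (value, now)
        return value, "miss"

origin = {"/p/2127": "price: 10"}
short, long_ = TtlCache(ttl=60), TtlCache(ttl=86_400)

for cache in (short, long_):
    cache.get("/p/2127", origin, now=0)       # warm both caches

origin["/p/2127"] = "price: 12"               # the product changes at origin

print(short.get("/p/2127", origin, now=300))  # ('price: 12', 'miss')
print(long_.get("/p/2127", origin, now=300))  # ('price: 10', 'hit')
```

Five minutes after the change, the short-TTL cache has gone back to origin (fresh content, less offload), while the long-TTL cache still serves the old price (more offload, stale content) until the entry expires or is purged.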

[Chart: long TTL]

4. Long TTL, purging by URL via Fast Purge API

When you’re using a purge by URL strategy via the Fast Purge API (with a long TTL), there are different pros and cons. For example, if one of your products has a change (e.g., a new name for the product), you’ll need to create and call a new "purge product" function. This function will calculate all the URLs for this product (30 URLs per product in the example scenario described earlier) and then send one purge request containing all of those URLs using the API.
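A "purge product" function along these lines might look like the following Python sketch. It only builds the request and does not send it; authentication and the HTTP call are left out. The hostname, URL scheme, language codes, subpage names, and the function name are hypothetical. The endpoint path and the `objects` body field reflect my understanding of the Fast Purge API v3, so please verify them against the API documentation before relying on them.

```python
import json

HOST = "https://www.example.com"                 # hypothetical hostname
LANGUAGES = ["en", "fr", "de", "es", "nl"]       # assumed language codes
PAGES = ["", "/reviews", "/details", "/more-info",
         "/specs", "/shipping"]                  # assumed; "" = landing page

def build_purge_product_request(product_id, network="production"):
    """Return (path, JSON body) for one purge-by-URL call covering a product."""
    urls = [f"{HOST}/{lang}/products/{product_id}{page}"
            for lang in LANGUAGES for page in PAGES]
    path = f"/ccu/v3/invalidate/url/{network}"   # assumed Fast Purge v3 path
    body = json.dumps({"objects": urls})
    return path, body

path, body = build_purge_product_request(2127)
print(path)
print(len(json.loads(body)["objects"]), "URLs in one request")
```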

For an individual product, this approach works nicely and has no downsides. However, when designing a real-world solution—with, say, 50,000 products as in the example noted above—we need to take into account the rate limiting restrictions of the Fast Purge API:

  • 50 requests/second
  • 10,000 URLs/minute

Now, 10,000 URLs/minute may sound like a lot. But looking at our example with 30 pages per product, this approach’s actual capacity for purging is about 333 products a minute (10,000 URLs/min divided by 30 URLs per product = 333 products per minute, rounded down).

Here’s where that could become a challenge: suppose we update the logo (with a new filename and/or new dimensions) of a brand and need to purge the pages of all 700 products associated with that brand.

That’s 700 products x 30 pages per product = 21,000 URLs, which is well above the limit of 10,000 URLs/min. This means you will need multiple one-minute blocks (here, three) before all your content is purged.
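The batching arithmetic can be written down as a small Python sketch. It assumes a simplified model in which the URL budget resets once per minute; the constants come from the example scenario and the rate limit quoted above.

```python
import math

URLS_PER_PRODUCT = 30          # from the example scenario
URL_LIMIT_PER_MINUTE = 10_000  # rate limit quoted above

def purge_windows(products):
    """Return (total URLs, number of one-minute windows needed)."""
    total_urls = products * URLS_PER_PRODUCT
    windows = math.ceil(total_urls / URL_LIMIT_PER_MINUTE)
    return total_urls, windows

print(purge_windows(700))                        # (21000, 3)
print(URL_LIMIT_PER_MINUTE // URLS_PER_PRODUCT)  # 333
```

Under this model, the last batch for the 700-product brand can only go out once two full minutes have passed.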

This causes (at least) two issues:

  • Reduced operational efficiency: In this scenario, content purging requires a great deal of forethought and preparation, and must include capacity planning, all of which takes time and puts an extra burden on your team members.
  • (Actual) purge time increase: The purge time itself might still be five seconds, but if you need to wait two minutes before you can send the purge request, the actual purge time is two minutes and five seconds.

[Chart: long TTL, purging by URL]

     

5. Long TTL, purging by cache tag via Fast Purge API

With this fifth and final caching strategy, we’ll now see some of the advantages of purging by cache tag versus other strategies.

Let’s begin. First, we modify our origin to include multiple cache tags as part of the Edge-Cache-Tag response header.

Using our online-retailer example, we then assign each product page three cache tags: one tag for the product, one for the brand, and one for the category, like this:

  • Edge-Cache-Tag: product-{{productId}}, brand-{{brandId}}, category-{{categoryId}}

Note: only one header should be sent, using comma-separated values to assign multiple tags.
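As a minimal sketch, the header value can be assembled at the origin with a helper like the one below. The function name is hypothetical; the tag naming convention is the one used in this post.

```python
def edge_cache_tag_header(product_id, brand_id, category_id):
    """Return the (name, value) pair for a single Edge-Cache-Tag header."""
    tags = [
        f"product-{product_id}",
        f"brand-{brand_id}",
        f"category-{category_id}",
    ]
    # One header with comma-separated values, rather than three separate headers.
    return "Edge-Cache-Tag", ", ".join(tags)

print(edge_cache_tag_header(2127, 102, 76))
# ('Edge-Cache-Tag', 'product-2127, brand-102, category-76')
```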

How it works

The response to a request for any of the 30 product pages for product "red tennis socks" (product ID: 2127) from brand "Lorem Sportswear" (brand ID: 102) and linked to category "socks" (category ID: 76) would contain one header with three cache tags:

  • Edge-Cache-Tag: product-2127, brand-102, category-76

Pages for product "blue socks" (product ID: 5130) from “Lorem Sportswear” would contain one header with three cache tags:

  • Edge-Cache-Tag: product-5130, brand-102, category-76

Pages for product "yellow socks" (product ID: 31921) from “Ipsum” (brand ID: 306) would contain one header with three cache tags:

  • Edge-Cache-Tag: product-31921, brand-306, category-76

Now, for example, what happens when the brand with ID=102 has a change (e.g., a new logo is added or the brand name is changed)? We simply send a single API call to purge all content marked with cache tag "brand-102". There is no need for batching, no need to calculate URLs, and no risk that URLs are forgotten; in addition, we also maximize offload to the edge servers.
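For comparison with purging by URL, here is a hedged Python sketch of what that single call might carry. It only builds the request; authentication and sending are left out. The function name is hypothetical, and the endpoint path and `objects` body field reflect my understanding of the Fast Purge API v3, so check them against the API documentation.

```python
import json

def build_tag_purge_request(tags, network="production", action="invalidate"):
    """Return (path, JSON body) for one purge-by-cache-tag call."""
    if action not in ("invalidate", "delete"):
        raise ValueError("action must be 'invalidate' or 'delete'")
    path = f"/ccu/v3/{action}/tag/{network}"  # assumed Fast Purge v3 path
    body = json.dumps({"objects": list(tags)})
    return path, body

path, body = build_tag_purge_request(["brand-102"])
print(path)  # /ccu/v3/invalidate/tag/production
print(body)  # {"objects": ["brand-102"]}
```

The request body stays the same size whether the brand has 7 products or 700, which is why no batching is needed.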

[Chart: long TTL, purging by cache tag]

Summary chart for all five caching strategies

The five different strategies discussed above all have advantages and disadvantages. Depending on your use case, your preference will vary. Here’s a look at all of them together:

[Summary chart: all five caching strategies]

Q: Can I use more than three cache tags?

Yes. While the above example was limited to three cache tags, there’s nothing holding you back from including more cache tags for extra flexibility to do whatever fits your caching strategy. Here are two additional examples of cache tags you could potentially add:

  • Language: If a translation error is fixed on the French pages, you can purge all French content, while keeping the other languages in the cache.
  • Template: If a specific web page template is changed, you can purge all content linked to that template instead of doing a complete refresh.

Q: Where are cache tag headers set?

Cache tag headers are set by the origin. That means your PHP/JSP/ASP page sends the right headers based on the content which is being viewed.
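The same idea applies to any origin technology. As an illustration only, here is a minimal WSGI application in Python that sets the header based on the product being viewed. The catalog data, URL layout, and page body are hypothetical, and the app is exercised in-process rather than through a real server.

```python
# Hypothetical catalog lookup; a real origin would query its database.
CATALOG = {"2127": {"brand": 102, "category": 76}}

def app(environ, start_response):
    """Serve /products/<id> and tag the response for the edge."""
    product_id = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]
    item = CATALOG.get(product_id)
    if item is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    tags = (f"product-{product_id}, brand-{item['brand']}, "
            f"category-{item['category']}")
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Edge-Cache-Tag", tags)])
    return [b"<h1>red tennis socks</h1>"]

# Call the app directly and capture what it would send.
captured = {}
def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

app({"PATH_INFO": "/products/2127"}, fake_start_response)
print(captured["headers"]["Edge-Cache-Tag"])
# product-2127, brand-102, category-76
```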

Q: Can cache tags be used for static content?

Yes, purge by cache tag can be used for purging static content/objects (e.g., all images uploaded by a particular user, or all images linked to a particular product) at the Edge.

Q: How can I set cache tags using Akamai’s Property Manager?

It’s simple to set cache tags with Property Manager. You will follow these three steps:

  1. Create a new variable to store the calculated cache tag
  2. Calculate the cache tag based on path/filename
  3. Use the Modify Incoming Response Header behavior to add a new Edge-Cache-Tag header whose value uses the variable you created in step 1

How it works

Let’s return to our example scenario where you are an online retailer with 50,000 products, and we’ll assume that each product picture has multiple variants. Here is the URL/filename pattern for images associated with product ID "122978":

  • https://www.demo.com/products/img/7/2/9/122978-10911-pristine.jpg
  • https://www.demo.com/products/img/7/2/9/122978-10911-540.jpg
  • https://www.demo.com/products/img/7/2/9/122978-10911-360.jpg
  • https://www.demo.com/products/img/7/2/9/122978-10911-t280.jpg
  • https://www.demo.com/products/img/7/2/9/122978-10911-t210.jpg
  • https://www.demo.com/products/img/7/2/9/122978-10911-t140.jpg

Each time this image changes, all the variants should be purged. To do that, we’ll assign a cache tag using this naming convention: "products-img-{{ID}}". Applying the convention to the above product, we end up with this cache tag: products-img-122978.

Once we have this cache tag, the screenshots below show a two-step rule in Property Manager to get the job done within the Akamai Luna Control Center interface:

  1. Set Variable: We first take the filename using the {{builtin.AK_FILENAME}} variable, use a regular expression (regex) to extract the leading digits, and store the result in our custom variable (named PMUSER_CACHE_TAG_OBJECT in the JSON snippet further down).
  2. Modify Incoming Response Header: Next, we add a new Edge-Cache-Tag incoming response header in the “Custom Header Name” field.
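To see what the regex step produces, here is a small Python sketch that applies the same pattern used in the rule's JSON to the example filenames. It mimics the substitution for illustration only and runs outside Property Manager; the function name is hypothetical.

```python
import re
from urllib.parse import urlparse

# Leading digits before the first hyphen, followed by the rest of the filename.
PATTERN = re.compile(r"^(\d*)-.*")

def image_cache_tag(url):
    """Derive the products-img-<ID> tag from an image URL's filename."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    # Replace the whole filename with the captured leading digits.
    product_id = PATTERN.sub(r"\1", filename, count=1)
    return f"products-img-{product_id}"

urls = [
    "https://www.demo.com/products/img/7/2/9/122978-10911-pristine.jpg",
    "https://www.demo.com/products/img/7/2/9/122978-10911-540.jpg",
    "https://www.demo.com/products/img/7/2/9/122978-10911-t140.jpg",
]
print({image_cache_tag(u) for u in urls})  # {'products-img-122978'}
```

All variants of the image map to the same tag, so one purge covers them all.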

[Screenshots: two-step rule in Property Manager]

 

If you manage your properties as code (via Property Manager API, Property Manager CLI or Akamai Pipeline) instead of using the Luna interface, here is the corresponding JSON snippet that you’ll need to get the job done:

 

{
    "name": "Set cache tag",
    "behaviors": [
        {
            "name": "setVariable",
            "options": {
                "variableName": "PMUSER_CACHE_TAG_OBJECT",
                "valueSource": "EXPRESSION",
                "transform": "SUBSTITUTE",
                "variableValue": "{{builtin.AK_FILENAME}}",
                "regex": "^(\\d*)-.*",
                "replacement": "$1",
                "caseSensitive": true,
                "globalSubstitution": false
            }
        },
        {
            "name": "modifyIncomingResponseHeader",
            "options": {
                "action": "MODIFY",
                "standardModifyHeaderName": "OTHER",
                "newHeaderValue": "products-img-{{user.PMUSER_CACHE_TAG_OBJECT}}",
                "avoidDuplicateHeaders": true,
                "customHeaderName": "Edge-Cache-Tag"
            }
        }
    ],
    "criteria": [
        {
            "name": "path",
            "options": {
                "matchOperator": "MATCHES_ONE_OF",
                "values": ["/products/img/*"],
                "matchCaseSensitive": false
            }
        }
    ],
    "criteriaMustSatisfy": "all"
}

Summary

The Akamai team is excited about this powerful enhancement to the Akamai platform, and we hope you find purge by cache tag to be a useful addition to your overall caching strategy.


Tim Vereecke is a web performance architect at Akamai Technologies.