Boost Autocomplete Performance

May 15, 2020 · by Tim Vereecke ·

Suggesting relevant search terms while a user types in a search box is a common practice on many websites and search engines. It provides a relevant and fast response that is essential for an excellent user experience. 

But tuning autocomplete is not easy — search requests frequently change and typically include long-tail keywords.

This article describes some of the strategies used to accelerate autocomplete XMLHttpRequest (XHR) requests on Scalemates, the largest scale modeling website in the world (which I also run). To return results ten times faster than before, I implemented three techniques, and I'll walk you through each of them below.

Example of autocomplete suggesting search terms

Relevant monthly statistics for Scalemates:

  • 2+ million search suggestions
  • Roughly 500K distinct search terms
  • Around 100K database edits impacting suggestions
  • Global audience

No store (default)

The first and default optimization technique starts with a controversial setting. Though serving a response from the cache is fastest, we mark our autocomplete requests as “no store” to achieve the best performance.

no store screenshot

The high cardinality of our search terms (500K distinct terms), in combination with relatively low global traffic (2+ million requests), and frequent updates (100K database edits) results in very low offload numbers for the majority of requests. 

If you already know that a term is most likely not in the cache, it’s faster to go directly to the origin. Then, you can accelerate requests using SureRoute, TCP optimizations, and the keep-alive timeout of 300 seconds. 

Cache + prefresh (for short search terms)

The second optimization only enables caching for short search terms to speed up delivery. The idea behind this approach is that many different terms share a common set of characters in the beginning.

Take these five distinct search terms:

  1. Red cars

  2. Redwood 

  3. Red trucks

  4. Redmond, Washington

  5. Red Sox

As users type, all five trigger the same initial lookups: “R”, “Re”, and “Red”. Short terms not only receive a larger share of the traffic, their variations are also limited. This gives short search terms a much better chance of hitting the cache, warranting a different caching strategy.

The definition of “short” depends on your traffic. For example, with the limited traffic on Scalemates, this approach only worked well for terms of up to two characters.

The cache hit ratio drops significantly with every extra character. Here is an estimate of the exponential growth in cardinality for each extra character in the term.

1 character » 46 (26 letters, 10 numbers, and 10 common characters “-.():*’&%”)

2 characters » 46 * 46 = 2,116

3 characters » 46 * 46 * 46 = 97,336

4 characters » 46 * 46 * 46 * 46 = 4,477,456
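These figures are easy to verify; a quick sketch using the 46-character alphabet from the list above:

```javascript
// Cardinality of search-term prefixes over a 46-character alphabet:
// 26 letters + 10 digits + 10 common punctuation characters.
const ALPHABET_SIZE = 46;

function prefixCardinality(length) {
  return ALPHABET_SIZE ** length;
}

console.log(prefixCardinality(1)); // 46
console.log(prefixCardinality(2)); // 2116
console.log(prefixCardinality(3)); // 97336
console.log(prefixCardinality(4)); // 4477456
```

In practice the real cardinality is lower (users do not type random characters), but the exponential trend is what kills the cache hit ratio for longer terms.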

Conditional caching based on the length of the search term is easy using Akamai Property Manager by following the three steps below.

  1. Create a variable called “PMUSER_AUTOCOMPLETEQ” with a default value of “0”:



  2. Extract the length of the query string parameter and store it in the variable:



  3. When the length of the search term is 0, 1, or 2, override the default “no store” settings. In this example, we store results for up to 24 hours and (potentially) prefresh them after 20% of the time to live (TTL):
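The length rule above boils down to a tiny decision function. Here is a minimal sketch in plain JavaScript, assuming a hypothetical `cachePolicyFor` helper with an illustrative return shape (the real logic lives in Property Manager behaviors, not in code):

```javascript
// Sketch of the Property Manager rule: terms of 0-2 characters are cached
// for 24 hours and prefreshed at 20% of the TTL; longer terms keep the
// default "no store" behavior.
function cachePolicyFor(term) {
  if (term.length <= 2) {
    return { store: true, ttlSeconds: 24 * 3600, prefreshPercent: 20 };
  }
  return { store: false };
}

console.log(cachePolicyFor('re'));       // { store: true, ttlSeconds: 86400, prefreshPercent: 20 }
console.log(cachePolicyFor('red cars')); // { store: false }
```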

EdgeWorkers (for popular terms)

The third optimization uses the power of EdgeWorkers to maximize the performance of the most popular search terms. Looking at analytics, we noticed that 500 out of the 500K different terms consume 20% of the 2 million monthly autocomplete requests. In order to accelerate these popular requests, we implemented a serverless function at the Edge, written in JavaScript. 

The EdgeWorkers bundle contains a JSON key-value store with the responses for the top 750 most popular terms. Using the EdgeWorkers CLI, the results for the most popular search terms are updated regularly via a scheduled task.

Setting up EdgeWorkers is straightforward; see the EdgeWorkers documentation for full details.

  1. EdgeWorkers is activated for the path matching the autocomplete service:


  2. The EdgeWorkers code takes the GET parameter term= and looks up the term in a local key-value store (searchterms.js).

  3. If a match is found, a 200 status with the serialized JSON response is returned; when there is no match, the request is forwarded to the origin.


import URLSearchParams from 'url-search-params';
import { default as searchterms } from './searchterms.js';

export function onClientRequest(request) {
    const params = new URLSearchParams(request.query);
    const jsonContentType = {'Content-Type': ['application/json;charset=utf-8']};
    const searchResult = searchterms[params.get('term').toLowerCase()];
    // Respond from the Edge when the term is in the local store;
    // returning without respondWith() forwards the request to origin.
    if (searchResult) {
        request.respondWith(200, jsonContentType, JSON.stringify(searchResult));
    }
}

The local list of terms available at the Edge in my example code uses the JSON format expected by jQuery UI and Awesomplete. Here is a sample line from that file (note there is currently a 1 MB limit on the uncompressed code bundle):

"red":[{"label":"Red socks (103 results)","value":"cat876"},{"label":"Red shoes (203 results)","value":"cat124"},{"label":"Red shirts (34 results)","value":"cat89"}]

To access the code referenced on this blog post, see the sample code in GitHub:

Github » Akamai » edgeworkers-examples » fast-autocomplete

Monitoring best practices

Before you start tuning, it’s important to have granular visibility on the portion of the traffic you will optimize. Setting up a dedicated content provider (CP) code allows you to track trends in traffic, offload, performance, and error rates specifically for autocomplete requests.

offload graph

The graph above shows that tuning autocomplete is not a single step. The dedicated CP code helps validate and support our strategy.

Using the Akamai mPulse real user monitoring (RUM) tool, we enabled tracking of the autocomplete XHR requests. Below you see how we set up three variants for easy comparison. The top row matches short search terms, the middle row shows where EdgeWorkers is activated, and the last row is the default.


Good monitoring is key to this multi-step tuning process. It helps validate assumptions and adapt when needed to cut milliseconds from the response times.


mPulse RUM data: 23ms for hot terms (EdgeWorkers) vs 251ms for long tail terms (no store)


Autocomplete performance is critical for a good user experience. Using Akamai, you can combine a variety of simple techniques to maximize the performance. For Scalemates, it was worth it! We saw a 10x improvement for our most popular search terms.

About the author



Tim Vereecke is a Web Performance Architect at Akamai who loves speeding up websites to positively impact conversion, ad revenue, and SEO. In his free time, Tim runs Scalemates, the largest online scale modeling platform in the world, which provides him plenty of hands-on experience with both the technical and business aspects of online performance, including:

» Modern web frameworks and cloud deployments, APIs, RUM, DevOps, Security

» SEO, analytics, advertising, affiliate marketing, privacy, e-commerce, and conversion

» Tuning backend, frontend and CDNs for ultra-dynamic, long-tail, and personalized content