Black Friday is traditionally when many businesses have their accounting sheets go from red to black, hence the black of Black Friday. But for online businesses, it’s all about Cyber Monday. Last year, Cyber Monday became the biggest U.S. online shopping day ever, with close to 2.67 billion U.S. dollars in online spending. That number is projected to grow 10% this year, and there are already signs that this year will be bigger than ever. Alibaba’s “Singles’ Day” event reached $25 billion just a few days ago.
What these numbers mean is that you might get more traffic than you planned for. It’s a problem that business owners love to have—and one that operations teams hate. The increase in traffic can cause major performance problems, slowing your site down for everyone.
For online businesses, performance and reliability correlate directly with sales, and companies can expect to lose profit in proportion to any drop in site performance. The fact is that online commerce is an unforgiving arena where web performance troubles can be costly.
The answer is to design an infrastructure that can scale to meet demand (auto scaling, over-provisioning, etc.), and also have a solid Disaster Recovery (DR) plan if, well, there’s a disaster. But what if you could have an intermediate plan where you keep revenue flowing without the hassle of activating your DR plan?
The new way
I got this idea from a large retail store that has a billion-dollar online business. During times of beyond-peak traffic volume that would have normally crippled their site, they found a way to instantly reduce their site to a minimum viable product. With this approach, they kept the primary functionality that drives conversion and generates revenue, while disabling all other functionality.
How did they do it? In short, they cached just about everything. What they didn’t cache were the API calls that control price, inventory, and adding items to the cart. As a result, the site could be down and customers wouldn’t know until they tried to check out. To be clear, this is not a perfect solution to a site-crippling traffic spike, but it is far better than the site being completely down.
Let’s take a look at three pieces of this approach in a bit more detail.
1. Cache as much as possible
One of the first performance lessons I learned was to cache as much as possible. It’s a basic concept that Steve Souders made famous with the quote “The fastest HTTP request is the one not made.” What many developers fail to see is that the real benefit from caching is availability and scale.
For example: What if you strip down your site to only check for inventory and perform database queries when a user is adding items to the cart? That means you’re only bothering your infrastructure with ~10% of the traffic; the rest is cached.
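To make the arithmetic concrete, here is a minimal sketch of how offload translates into origin load. The function name and the traffic figures are made up for illustration; only the ~90% cached / ~10% origin split comes from the example above.

```python
def origin_requests(total_requests: int, offload_ratio: float) -> int:
    """Requests that still reach the origin when offload_ratio is served from cache."""
    if not 0.0 <= offload_ratio <= 1.0:
        raise ValueError("offload_ratio must be between 0 and 1")
    return round(total_requests * (1.0 - offload_ratio))


# Hypothetical traffic numbers, for illustration only.
normal_day = origin_requests(1_000_000, 0.90)  # requests hitting the origin on a normal day
spike_day = origin_requests(3_000_000, 0.90)   # same offload, three times the traffic
```

Under these assumed numbers, a traffic spike of three times the normal volume still leaves the origin handling fewer requests than it would on a normal day with nothing cached.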
The architectural approach is to make your sizable content (HTML, images, CSS, JS, etc.) static and move dynamic components to separate Ajax or ESI calls. This design enables your infrastructure to focus on dynamic, mostly revenue-oriented tasks rather than wasting resources on unchanging requests that could be served perfectly well from a cache server.
It’s a concept that single-page apps (SPAs) have taken to the extreme. In an SPA, there is only one HTML page that works as a wrapper, and every interaction with the server is an API call. It’s a powerful performance tool as long as you remember that API calls can be cached as well. You should cache your “products API” call in the same way that you cached your product pages, which means you must retain separation of concerns on API calls to keep static and dynamic content on different HTTP operations.
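One way to picture that separation is a small function that picks a Cache-Control header per URL. This is only a sketch: the path prefixes and TTL values are assumptions I made up for illustration, not a recommendation for any particular site or CDN.

```python
from urllib.parse import urlsplit

# Hypothetical routes: calls that must always reach the origin.
DYNAMIC_PREFIXES = ("/api/price", "/api/inventory", "/api/cart")
# Hypothetical route: catalog data that changes rarely enough to cache briefly.
CACHEABLE_API_PREFIX = "/api/products"


def cache_control_for(url: str) -> str:
    """Return a Cache-Control value based on the request path."""
    path = urlsplit(url).path
    if path.startswith(DYNAMIC_PREFIXES):
        return "no-store"             # price, inventory, cart: never cached
    if path.startswith(CACHEABLE_API_PREFIX):
        return "public, max-age=300"  # product API: short TTL (assumed value)
    return "public, max-age=86400"    # HTML, images, CSS, JS: long TTL (assumed value)
```

Keeping price and inventory on their own endpoints is what makes the long TTL on everything else safe.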
Where am I going with this? Simple: cache everything you can and continually review what you’re not caching. When an attack or a traffic spike hits you, having 90% of the content cached on a CDN will allow you to scale in a more reliable, linear, and budget-conscious way to meet demand.
2. Optimize for conversion
Reducing functionality does not mean eliminating functionality. As developers, we are so accustomed to continuous improvement and implementation of new features that we sometimes lose track of the real business drivers behind our products. In other words, there are certain parts of our site that influence the conversion of customers more than others, and it makes sense to focus on keeping those up during critical situations.
To identify which are the most conversion-critical parts of the site, use the Conversion Impact Score. The Conversion Impact Score is part of the mPulse Performance Analysis document created by Akamai Professional Services for mPulse customers. The chart below is an example. The green dots answer the question, “How much impact does the performance of this page have on conversions?” while the blue columns show traffic on the page group.
Your search functionality is a great example of a conversion-critical action: how many hours have you spent designing and redesigning the functionality to improve the accuracy of the results? Maybe you even have a dedicated team behind it. The goal of the search is that your customers find what they’re looking for fast. You can, and should, cache search results for at least a short TTL; however, searches follow an 80/20 rule where ~80% of the searches will be for ~20% of the inventory (popular items) while the rest is spread across the rest of your catalog. This latter 20% of searches (i.e., for the less-popular items) will have poor offload, and often hit a cold cache that requires processing and database queries.
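A short-TTL cache in front of search could look roughly like the sketch below. The class name, the 60-second default, and the query normalization are all assumptions for illustration; a real deployment would more likely do this at the CDN than in application code.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLSearchCache:
    """Caches search results for a short time, keyed by a normalized query."""

    def __init__(self, search_fn: Callable[[str], List[str]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._search_fn = search_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}
        self.hits = 0
        self.misses = 0

    def search(self, query: str) -> List[str]:
        # Normalize case and whitespace so near-identical queries share an entry.
        key = " ".join(query.lower().split())
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Cold or expired entry: this is the expensive origin hit.
        self.misses += 1
        results = self._search_fn(key)
        self._entries[key] = (now + self._ttl, results)
        return results
```

Popular queries are repeated within the TTL and get served from memory; the long tail keeps missing, which is the poor offload described above.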
So if these searches are happening during extreme traffic on your site, you face a dilemma: How do you eliminate the origin hits generated by searches (because you need resources concentrating on conversion) without eliminating your customers’ ability to find what they’re looking for? Here are two options:
- Replace the search with a redirect engine that takes users to the product family or brand page. It won’t be an exact match on your customer’s search, but it will hopefully have what the customer is looking for a few scrolls down. You can’t spend time defining searches and creating regex matches while the whole site is down. You can, however, incorporate this strategy into your continuous availability/disaster recovery plans.
- Use a search engine such as Google or Bing to perform the search. The main reasons you don’t normally send customers to https://www.google.com/search?q=site%3A<your_site>+<search_term(s)> are the loss of brand presence and the risk of customers seeing competitors’ ads. Would you rather take the search/site down than expose your business to these risks? That’s a question your organization must consider. If you do move forward with this approach, make sure your product pages are discoverable by bots, so that if your search is down you can rely on a search engine (Google, Bing, etc.) for people to find what they are looking for.
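The two options can be sketched in a few lines. Combining them in one function is my own choice for illustration, and the brand map and the www.example.com host are hypothetical; only the site-search URL format comes from the text above.

```python
from urllib.parse import urlencode

# Hypothetical keyword-to-page map, prepared ahead of time.
BRAND_PAGES = {
    "adidas": "/brands/adidas",
    "nike": "/brands/nike",
}


def degraded_search_target(query: str, site: str = "www.example.com") -> str:
    """Return where to redirect a search while full search is disabled."""
    # Option 1: send the user to a brand or product family page if a keyword matches.
    for word in query.lower().split():
        if word in BRAND_PAGES:
            return BRAND_PAGES[word]
    # Option 2: fall back to an external site-restricted search.
    return "https://www.google.com/search?" + urlencode({"q": f"site:{site} {query}"})
```

Neither branch touches the origin's search back end, which is the point of the exercise.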
3. Simplify search
Given the number of redirects behind a search, Akamai Edge Redirector offers a great solution that can offload all the logic to the cloud and perform the redirects as close to the user as possible. Remember to use a temporary (302) redirect rather than a permanent (301) one, because you will switch back to your optimized search later. The search will be its own Cloudlet policy, and you’ll create it ahead of time.
TIP: Use your web server log to identify the most popular searches and focus on creating a policy for those. Less-popular searches tend to be overly specific searches such as “Adidas Superstar white with black stripes women size 7 OMG I cannot stop writing in the search box”. In these unusual instances, you can generally get away with redirecting to one of the keywords in the search (e.g., Adidas) or the home page.
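As a rough sketch of that log analysis, the function below counts search terms in access log lines. It assumes a common/combined log format where the request line is the first quoted field, and a hypothetical /search?q=... URL; adjust both to match your own site.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import parse_qs, urlsplit


def top_searches(log_lines: Iterable[str], n: int = 10,
                 path: str = "/search", param: str = "q") -> List[Tuple[str, int]]:
    """Return the n most frequent search terms found in access log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue  # no quoted request line
        request = parts[1].split()  # e.g. ['GET', '/search?q=adidas', 'HTTP/1.1']
        if len(request) < 2:
            continue
        url = urlsplit(request[1])
        if url.path != path:
            continue
        for term in parse_qs(url.query).get(param, []):
            # Normalize case and whitespace before counting.
            counts[" ".join(term.lower().split())] += 1
    return counts.most_common(n)
```

The top of that list is where hand-written redirect rules pay off; everything below it can fall through to a keyword match or the home page.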
The key to switching your site search to this functionality when necessary is to have it ready and tested in production, and to use a feature toggle to enable it. In other words, Edge Redirector will be in your configuration under a match that never evaluates to true. With the new fast metadata activation, you could, for example, do a metadata push that only changes the match to always evaluate to true and be live in less than 15 minutes.
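The same toggle idea can be shown at the application level. To be clear, this is a generic analogue and not Akamai metadata or any Edge Redirector API; the class and handler names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# (HTTP status, body or redirect location)
Response = Tuple[int, str]


@dataclass
class SearchToggle:
    """Routes search requests to the full or the degraded handler."""
    full_search: Callable[[str], Response]
    degraded_search: Callable[[str], Response]
    degraded: bool = False  # stays False (the "never true" match) until flipped

    def handle(self, query: str) -> Response:
        handler = self.degraded_search if self.degraded else self.full_search
        return handler(query)
```

Both handlers ship to production together and can be tested there; going into degraded mode is then a one-field change rather than a deployment.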
Search is just one example; you could follow a similar simplification pattern with other parts of your site. For example, your “about” page might not influence conversion, but it offers 90%+ offload and can be hosted from static cloud storage such as Akamai NetStorage.
Conclusion: three right-now optimizations for Cyber Monday
Feel free to consider the concepts above for the long run, but for now, here are three things you can do today to prepare for Cyber Monday:
- Identify the page groups that have the most influence on conversion and don’t have a high cache hit ratio. You’ll want to increase their offload to improve availability and scale ahead of the holidays.
- Review your offload reports on Akamai to identify the pages that generate the most hits and volume to your origin, and consider caching them or increasing their TTLs.
- Evaluate the parts of your site that contribute the least to conversion and implement a fast and easy toggle to substitute them with a stripped-down version of their functionality. It might not take a lot of developer effort, and it will certainly make your business partners happy when you tell them about the record sales the site had with minimal impact on shoppers.
I hope this post gives you some ideas for a successful Cyber Monday, and I wish you a happy holiday!