Best Practices for Performance Testing in Production

July 7, 2015 · by Dan Boutin

Outages. Bad user experience. Negative branding. These are three of the many reasons why performance testing in production is not only a best practice — it’s a necessity.

Recently, there were several high-profile outages that lit up social media. United Airlines was the first domino. Then the New York Stock Exchange. And then The Wall Street Journal. And if that wasn’t enough, Amazon. Amazon! On Prime Day, no less! (Though one blog I read suggests that maybe Amazon had this event “planned” as a performance test in production.)

So, now seems like a good time to talk about how to avoid these issues.

The four most common performance testing categories

Before we get into best practices, let’s look at the most common bottlenecks we have found over the years, as well as the types of testing that uncover them.

Over the past seven years, we’ve been involved in more than 10 million performance tests. Customers come to us to help solve problems with performance for various reasons (e.g. lack of expertise, testing at scale in production, time pressure, or simply the inability to find a bottleneck with their existing — and most likely dated — toolset).

These millions of tests generally fall into four main categories:

[Pie chart: performance tests by category (new site launch, ongoing testing, marketing programs, events)]

New site launch and ongoing testing are pretty self-explanatory. Marketing programs typically include a promotion or a new product release or launch, while an event could be anything from holiday readiness to the Olympics to a breaking news story.

What are the most common problems in web performance testing today?

A few years ago, application code, database issues, configuration settings (my favorite: thread and connection pool settings), and load balancers were the top culprits.

So, without further ado, here are the most common findings from the web and mobile tests we have run with CloudTest since the beginning of 2014:

[Pie chart: most common findings in web and mobile tests since the beginning of 2014]

Not all tests are intended to find a specific stress point. Some are designed to validate that fixes for previously uncovered bottlenecks have had the desired outcome. So it is not surprising to see that the largest slice is “test goals reached before bottleneck found”. It is those validation tests that help performance engineers sleep at night, and it’s the other 70% of the tests where we find the issues that then lead to that peaceful night of sleep.

Application and web servers are clearly the top two contributors to poor performance, which is no surprise. Often, they simply do not have enough infrastructure to support the intended load, or they suffer from misconfigured settings or a poorly designed architecture.

That’s followed by the database, which may include issues around locking and contention, missing indexes, inefficient queries, memory management, connection management, or unmanaged growth of the data. Or, as with the application and web servers, it may simply be insufficient resources.

“Other” issues include a range of less commonly found bottlenecks, including issues with third-party services, content delivery networks (CDNs), shared environments, and firewalls. Some CDNs have tools to help monitor and optimize scripts, which can have a huge impact on performance.

Variables that QA lab testing doesn’t address

The following variables are unpredictable and therefore difficult to address during testing:

  • Batch jobs (log rotations, backups, etc.)
  • The impact of other online systems
  • Load balancer performance (often caused by misconfigured algorithm settings)
  • Bandwidth constraints
  • Latency between systems inside and outside of application bubbles
  • Network configuration problems
  • Data pipe configurations
  • Database sizes
  • Misconfigured application servers and web servers
  • CDN purge capabilities

Now, let’s talk about best practices for testing in production and highlight things that you just CANNOT DO in a testing or QA lab environment.

Five best practices for performance tests, including CDN assets

When a retail company tests in production, it can also fully test the caching and loading capabilities of its content delivery network (CDN) provider. This is vital to understanding the true performance of a production environment.

In case anyone reading this needs a refresher, the primary purpose of a CDN is to reduce the number of times content has to be requested from the origin servers by delivering certain content from strategically placed servers within the content delivery network.

SOASTA has worked with Akamai to develop a set of best practices for tests including CDN assets. Here are some of the highlights, with some being more obvious than others:

  1. If you do not have a good handle on your real user traffic, distribute test load generation evenly across all available load server regions (depending on the nature of the test). This helps approximate the real traffic distribution and lets you monitor performance from a variety of locations, making those measurements more statistically significant. However, if you are using a real user monitoring (RUM) solution, like mPulse, and you have a firm handle on where your users originate, you can tailor your test load generation around various real user scenarios. For example, say you are a top 100 retailer based in the Southeastern USA, you have no brick-and-mortar stores located west of the Mississippi River, and your real user data shows that 90% of your eCommerce traffic comes from the Southeastern USA. In this case, weighting your load distribution toward the Southeastern USA would give you more accurate results.
  2. Load testing should occur between 11 PM and 5 AM ET. (Assuming this is a North American customer.)
  3. The user agent must include a string that identifies the load test vendor (e.g., “SOASTA”). This lets Akamai track the load traffic more efficiently and enable additional logging, which comes in handy when troubleshooting issues discovered during the load test.
  4. Notify Akamai or your preferred CDN vendor that you are scheduling a load test. This process may take some time if this is the first time that you are testing in production. The CDN provider typically needs some time to set up for the test. For example, Akamai has the ability to segment and log SOASTA traffic, not only for technical reasons, but also so that the customer does not get billed by Akamai for all the additional content delivered and for the potential for bursting. The set-up and notification process is just as important as the test itself. SOASTA has a deep set of best practices just for the “process” piece of a load test with Akamai/CDN.
  5. Load test ramp-up should not go from zero to full load in less than 15 minutes, assuming you’re using a linear growth model.
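Several of the practices above can be captured in a small test-plan sketch. The Python below is illustrative only and is not CloudTest or Akamai configuration: the region names, the traffic weights, the peak user count, and the `LoadTest/1.0 (SOASTA)` User-Agent string are hypothetical values chosen for the example. It splits virtual users across regions in proportion to RUM traffic share, computes a linear ramp that reaches full load at 15 minutes, and checks a start time (assumed to be already expressed in Eastern Time) against the 11 PM to 5 AM window.

```python
from datetime import time

RAMP_MINUTES = 15  # practice 5: zero to full load in no less than 15 minutes
# Practice 3: identify the load test vendor. The exact string is hypothetical.
HEADERS = {"User-Agent": "LoadTest/1.0 (SOASTA)"}


def split_users(peak_users, traffic_share):
    """Allocate virtual users per region in proportion to integer weights.

    traffic_share maps region name -> integer weight (for example, percent
    of real user traffic taken from RUM data).
    """
    total = sum(traffic_share.values())
    plan = {region: peak_users * share // total
            for region, share in traffic_share.items()}
    # Integer division can leave a few users unassigned; give them to the
    # region with the largest weight so the plan sums to peak_users.
    largest = max(traffic_share, key=traffic_share.get)
    plan[largest] += peak_users - sum(plan.values())
    return plan


def users_at(minute, peak_users, ramp_minutes=RAMP_MINUTES):
    """Virtual users active at a given minute under a linear ramp."""
    if minute <= 0:
        return 0
    if minute >= ramp_minutes:
        return peak_users
    return int(peak_users * minute / ramp_minutes)


def in_test_window(start_et):
    """True if an Eastern Time start falls between 11 PM and 5 AM (practice 2)."""
    return start_et >= time(23, 0) or start_et < time(5, 0)


# Hypothetical retailer whose RUM data shows 90% Southeastern USA traffic.
rum_share = {"us-southeast": 90, "us-west": 5, "europe": 5}
plan = split_users(10_000, rum_share)
```

With these made-up weights, `plan` assigns 9,000 users to `us-southeast` and 500 each to the other two regions. If you have no RUM data, passing equal weights gives the even distribution described in practice 1.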

The vast majority of performance test labs do not have a CDN as part of their infrastructure, so you can only test the CDN’s performance impact by testing in production. Having CDN caching enabled greatly influences the test, both in the number of requests that reach origin and in response times, which depend on whether the page is served from the CDN or from origin.
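To see why a lab without a CDN paints a different picture, consider how the cache hit ratio changes what actually reaches origin. The following is a back-of-the-envelope sketch; the request count and the 85% hit ratio are made-up numbers for illustration, not figures from our tests.

```python
def origin_requests(total_requests, cache_hit_ratio):
    """Requests that miss the CDN cache and must be served by origin."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return round(total_requests * (1.0 - cache_hit_ratio))


total = 1_000_000                              # hypothetical requests in a test
with_cdn = origin_requests(total, 0.85)        # hypothetical 85% cache hit ratio
without_cdn = origin_requests(total, 0.0)      # lab setup with no CDN in front
```

Under these assumed numbers, origin sees 150,000 requests with the CDN in place and the full 1,000,000 without it, so the same test script stresses the origin tier very differently in the two environments.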

Don’t forget your third-party service providers

Many e-commerce sites also use third-party providers to enhance their overall site content. Just as with your CDN provider, it is vital to involve those third-party providers that might have an impact on performance when the strategy is being formulated.

On the other hand, you would not normally include domains such as Google Analytics or Omniture metrics as part of the test. These providers do not want to be surprised by a test or have fake transactions bring down their service or site.
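One simple way to keep such domains out of a test is to filter them from the list of URLs your script requests. The sketch below is a generic illustration, not a CloudTest feature; the function names are hypothetical, and the domain list is an assumption that you should replace with the hosts your own pages actually call.

```python
from urllib.parse import urlsplit

# Illustrative exclusion list (an assumption, not an authoritative inventory).
EXCLUDED_DOMAINS = {"google-analytics.com", "2o7.net", "omtrdc.net"}


def is_excluded(url, excluded=EXCLUDED_DOMAINS):
    """True if the URL's host is an excluded domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain)
               for domain in excluded)


def filter_test_urls(urls, excluded=EXCLUDED_DOMAINS):
    """Drop requests to excluded third-party domains from a test script."""
    return [url for url in urls if not is_excluded(url, excluded)]
```

Matching on the host name or a dot-prefixed suffix, rather than on a plain substring, avoids accidentally excluding an unrelated host whose name merely contains one of the listed domains.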

Involving third-party providers early, just like the CDN example above, helps ensure their support for your test. After all, CloudTest gives our customers the ability to analyze the performance of individual third-party provider content and provide that information to the third-party provider. Talk about WIN-WIN!

Takeaway: You need a solid testing process

A performance testing solution must have certain key features that will ensure your success, such as real-time analytics and a good “kill switch”, but just as important is a good process for testing in production. That process must involve your CDN provider and your third-party providers so that you and your team can execute the most realistic performance tests possible, from both a technical and an environmental perspective, informed by your real user measurement metrics.
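As one illustration of what a “kill switch” might look like, the sketch below signals that a test should stop when the error rate over a recent window of responses crosses a threshold. It is not how CloudTest implements the feature; the class name, the window size, the minimum sample count, and the 5% threshold are all assumptions made for the example.

```python
from collections import deque


class KillSwitch:
    """Signal a stop when recent server errors exceed a threshold."""

    def __init__(self, max_error_rate=0.05, window=200, min_samples=50):
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples
        # Keep only the most recent `window` results; True marks an error.
        self.results = deque(maxlen=window)

    def record(self, status_code):
        """Record one response; HTTP 5xx counts as an error."""
        self.results.append(status_code >= 500)

    def should_stop(self):
        """True once enough samples exist and the error rate is too high."""
        if len(self.results) < self.min_samples:
            return False
        return sum(self.results) / len(self.results) > self.max_error_rate
```

A load driver would call `record()` for every response and check `should_stop()` between batches. Requiring a minimum number of samples keeps a handful of early errors from aborting a test that has barely started.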

For more on this and so many other fascinating topics, visit the Akamai Developer blog home page.