
Overview: GraphQL Query Parsing and Caching at the Edge

October 29, 2018 · by Monika Rathor ·

GraphQL, the Facebook-incubated data query language for APIs, is a fast-growing alternative to REST. The goal of GraphQL is to reduce payload size and give an API client the ability to query for only the data it needs. GraphQL uses a flexible syntax, based on type definitions, that describes the exact fields needed. This avoids problems common to RESTful APIs such as over-fetching and under-fetching of data, versioning, and an ever-increasing number of API endpoints that a client needs to interact with. GraphQL tends to be resource-intensive on the server, so caching GraphQL responses to offload the origin and improve GraphQL client performance is important.

In this blog post, I review how Akamai edge servers can help improve performance and security for GraphQL server implementations.

Query Caching Challenges

In a typical endpoint-based RESTful API, edge servers can use HTTP caching to avoid refetching resources, as it’s easy to identify when two resources are identical. The URL in a RESTful API acts as a unique identifier that can be used to build a cache key. However, in GraphQL, there is no URL-like primitive that provides a globally unique identifier for a given object.

To be able to cache graph-style responses at the infrastructure level, the GraphQL query itself must be part of the cache key. Let's take a look at a couple of examples from a sample blogging API to make this more concrete:

  • REST API cache key example

Imagine a REST API that returns blog post information based on id, such as http://api.example.com/api/blog/{id}. Below are two sample URLs from this API and the unique cache keys that would be created for them in Akamai’s cache:


URL1      : http://api.example.com/api/blog/100
CacheKey1 : /L/1/1/365d/api.example.com/api/blog?id=100


URL2      : http://api.example.com/api/blog/200
CacheKey2 : /L/1/1/365d/api.example.com/api/blog?id=200

  • GraphQL cache key example

GraphQL supports both GET and POST verbs for requests, but the most common approach is to send the query string in a POST body. In this example, data for a blog post with an id of 100 is requested via the URL http://api.example.com/graphql

The GraphQL query syntax looks like this:


query {
 Post(id: 100) {
   id     
   title
   author {
     id    
     name
   }
 }
}

Now imagine the same request, but where the id value is 200 instead of 100. The query would look like this: Post(id: 200), but the URL for the request would remain the same.

Even though these queries are different, HTTP caching would see them as identical because the URL itself has not changed. The cache key for both requests would be the same, which does not deliver the expected cache outcome.

This second example illustrates that you need something other than the URL to construct the cache key, and the POST body is a natural choice. More specifically, a hash of the POST body can be used to form the cache key.
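As a minimal sketch of that idea, the snippet below hashes the raw POST body to build a cache-id. The key layout here is hypothetical (loosely modeled on the examples in this post), and it also demonstrates the weakness addressed in the following sections: two bodies that express the same query but differ only in whitespace produce different keys.

```python
import hashlib


def cache_id_from_body(post_body: str) -> str:
    # The SHA-256 digest of the raw POST body serves as the cache-id.
    return hashlib.sha256(post_body.encode("utf-8")).hexdigest()


def cache_key(url_part: str, post_body: str) -> str:
    # Hypothetical layout: URL-derived portion plus the body-derived id.
    return f"{url_part} cid=_gql={cache_id_from_body(post_body)}"


# Equivalent queries that differ only in whitespace get different keys.
key_a = cache_key("/api.example.com/graphql", "query { Post(id: 100) { id } }")
key_b = cache_key("/api.example.com/graphql", "query{Post(id:100){id}}")
```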

It's important to understand that a GraphQL query is often referred to as a document, and documents are composed of two kinds of tokens: lexical and ignored.

  • A lexical token is a string with an assigned, defined meaning. Lexical tokens include such items as field names, operation names, and argument names. Any number of ignored tokens may appear before and after every lexical token.

  • An ignored token is one whose presence or absence does not change the query response. Examples of ignored tokens include white space, line terminators, comments, commas, and the Unicode byte order mark (BOM).

Using the raw POST body to form a cache key is a workable approach, but it can cause other cascading problems. I’ll show you some examples where similar queries would result in different, suboptimal cache keys.

Duplicate Cache Keys

A duplicate cache key situation arises when there is more than one cache key storing the same result. This occurs when there are subtle differences in query strings for two different queries that result in different cache keys, but the same GraphQL response.

Duplicate cache key from ignored tokens

A request might use tabs or spaces for indentation. If a caching server uses the query string as-is to form a cache key, the result is duplicate cache keys. The same applies to differences in comments, commas, and line terminators.

For example, the queries below differ only in the number of white spaces, line terminators, commas, and comments, but are otherwise identical.  


query  {
  Post(id :100)
  {
    id
    title      # This is example comment

  }
}


query  {
  Post(id :100)
  {
              id,
             title,   } }

Akamai filters out the unimportant differences from those two queries, so the query string used to calculate the cache key looks like this:


query{Post(id:100){id title}}

The edge parses each query string, cleans it up, and hashes the result, producing an identical cache key for both requests:


CacheKey : /D/1/1/000/198.18.86.158/ vcd=56 cid=_gql=553657d844dbe4b24e9b9cfe80da7bdc40505d1bdba718433e69b323a200a0c6
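A simplified sketch of this ignored-token filtering might look like the following Python snippet. It assumes no `#` characters or punctuators appear inside string literals, which a real implementation would have to handle.

```python
import re


def strip_ignored_tokens(query: str) -> str:
    """Simplified sketch: remove GraphQL ignored tokens from a query.
    Assumes no '#' or punctuators inside string literals."""
    query = re.sub(r"#[^\n\r]*", "", query)                 # comments
    query = query.replace(",", " ").replace("\ufeff", " ")  # commas, BOM
    query = re.sub(r"\s+", " ", query).strip()              # collapse whitespace
    # Spaces next to punctuators carry no meaning; drop them.
    query = re.sub(r"\s*([{}():])\s*", r"\1", query)
    return query


q1 = """query  {
  Post(id :100)
  {
    id
    title      # This is example comment

  }
}"""
q2 = """query  {
  Post(id :100)
  {
              id,
             title,   } }"""

# Both normalize to: query{Post(id:100){id title}}
```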

Duplicate cache key from fields and arguments ordering

If a standard HTTP caching server is used, the two queries below result in different cache keys, because the raw POST body is used for cache key calculation.


query  {
 Post(id :100, title : "GraphQL Blog")
 {
   title,
   id,
   published,
   author (id : 100){
     id,
     firstName,
   },
   createdAt,
 }
}


query  {
 Post(title : "GraphQL Blog", id :100)
 {
   title,
   published,
   id,
   createdAt,
   author (id : 100){
     firstName,
     id,
   },
 }
}

Alphabetically reordering fields and arguments, combined with filtering out unimportant differences from query strings, is referred to as canonicalization or normalization. After canonicalization at the edge, the normalized query string is the same for both queries. Here is the canonical query string:


query{Post(id:100 title:"GraphQL Blog"){author(id:100){firstName id}createdAt id published title}}

And here is the corresponding cache key:


CacheKey: /D/1/1/000/198.18.86.158/ vcd=56 cid=_gql=76583b0087979af718b4eeb448a8fb812b1a676152500838e42637dec4551f94
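To make the ordering step concrete, here is a sketch of a canonicalizer for a tiny GraphQL subset (named fields, scalar arguments, nested selection sets; no fragments, variables, aliases, or directives). It is an illustration of the technique, not Akamai's implementation.

```python
import re

# Tokens in a tiny GraphQL subset: strings, names, integers, punctuators.
# findall() silently skips ignored tokens such as whitespace and commas.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z_][A-Za-z0-9_]*|\d+|[{}():]')


def canonicalize(query: str) -> str:
    """Sketch: drop ignored tokens and alphabetically order fields and
    arguments for a tiny GraphQL subset."""
    tokens = TOKEN.findall(query)
    pos = 1  # tokens[0] is the leading 'query' keyword

    def selection_set() -> str:
        nonlocal pos
        pos += 1  # consume '{'
        fields = []
        while tokens[pos] != "}":
            fields.append(field())
        pos += 1  # consume '}'
        # Sort sibling fields; a space is only needed between two names.
        return "{" + " ".join(sorted(fields)).replace("} ", "}") + "}"

    def field() -> str:
        nonlocal pos
        out = tokens[pos]  # field name
        pos += 1
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1  # consume '('
            args = []
            while tokens[pos] != ")":
                name, _colon, value = tokens[pos], tokens[pos + 1], tokens[pos + 2]
                args.append(f"{name}:{value}")
                pos += 3
            pos += 1  # consume ')'
            out += "(" + " ".join(sorted(args)) + ")"
        if pos < len(tokens) and tokens[pos] == "{":
            out += selection_set()
        return out

    return "query" + selection_set()


q1 = 'query { Post(id:100, title:"GraphQL Blog") { title, id, published, author(id:100) { id, firstName }, createdAt } }'
q2 = 'query { Post(title:"GraphQL Blog", id:100) { title, published, id, createdAt, author(id:100) { firstName, id } } }'
```

Both sample queries canonicalize to the same string, so hashing the canonical form yields a single cache key for both.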

Duplicate cache key from GET vs POST

If the same GraphQL query is requested via both GET and POST, a typical HTTP server might create two separate cache keys based on the request type. For example, suppose you make a GET request, GET /graphql?query={id}, and then make the equivalent POST request:

POST /graphql


{
 id
}

At Akamai, both requests result in the same cache key because the content of the GraphQL query is the same, regardless of the HTTP method used.

The cache key for both requests should be:


CacheKey : /D/1/1/000/198.18.86.158/ vcd=56 cid=_gql=3eabb58d6d7f370bdd51d2f6ebef554e20a6f0507f063f75c5ffe495c4806bf0
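The method-agnostic behavior can be sketched as follows. Here a simple whitespace-collapse stands in for the full canonicalization described earlier; the point is that the cache-id is derived from the extracted query, not from the HTTP method.

```python
import hashlib
import re
from urllib.parse import parse_qs, urlparse


def extract_query(method: str, url: str, body: str = "") -> str:
    # GET carries the query in the URL's query string; POST in the body.
    if method.upper() == "GET":
        return parse_qs(urlparse(url).query)["query"][0]
    return body


def method_agnostic_cache_id(method: str, url: str, body: str = "") -> str:
    # Collapse whitespace so trivially different forms hash identically
    # (a stand-in for full canonicalization), then hash with SHA-256.
    q = re.sub(r"\s+", "", extract_query(method, url, body))
    return hashlib.sha256(q.encode("utf-8")).hexdigest()


get_id = method_agnostic_cache_id("GET", "http://api.example.com/graphql?query={id}")
post_id = method_agnostic_cache_id("POST", "http://api.example.com/graphql", "{\n id\n}")
```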

GraphQL response parsing

A GraphQL response is typically served as an HTTP 200 OK message even when it contains errors. Akamai parses the first 1 KB of a response body; if that first 1 KB contains an “errors” list, the response is not cached.
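A naive sketch of that cacheability check is below. It uses a substring test on the first 1 KB; a real implementation would parse the JSON rather than match text.

```python
def is_cacheable(response_body: bytes) -> bool:
    # Inspect only the first 1 KB of the body, mirroring the limit above.
    head = response_body[:1024].decode("utf-8", errors="ignore")
    # A GraphQL error response returns HTTP 200 but carries a top-level
    # "errors" list; such responses should not be cached.
    return '"errors"' not in head
```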

Cache Keys and Purging

Akamai uses the Flexible Cache-Id feature to calculate the cache key for GraphQL requests. When Flexible Cache-Id is used, the cache key is composed of two parts:

  • The existing cache-key minus the query string
  • The cache-id computed from a SHA-256 hash of the canonical query string

A GraphQL request cache key looks like this:


X-True-Cache-Key: /L/127.0.0.10/graphql vcd=60000 cid=___gql=fe06943db93e98a5c92d6440e85bb4c3c6480081691eaeecec8202699c26cb59

gql=<SHA256 of the canonicalized query string>

The substring before cid= is the existing cache key; everything that follows cid= is called the cache-id.

Note that the canonicalized query string is obtained by parsing the query, removing ignored tokens, and ordering the fields and arguments. Variable substitution, alias removal, and directive processing are future enhancements that could also contribute to forming a canonical query.

After canonicalization, a SHA-256 hash of the processed query string is added to the Flexible Cache-Id as part of the cache key. The use of Flexible Cache-Id also simplifies purging: you only need to provide the URL to purge the cache and wipe out all entries under that URL.

Conclusion

I hope this has given you some insight into GraphQL and the work Akamai is doing with GraphQL today. This is just the start of more capabilities we are building around GraphQL at the edge. Stay tuned!

For additional information on GraphQL, see API Pain and GraphQL Relief and Cache GraphQL Responses to Increase Offload and Reduce Costs.

For additional information on cache keys, see Cache Keys: Why We Should Know Them.

Monika Rathor is a senior software engineer at Akamai Technologies.