May 21, 2024

Partial Query Caching: Splitting, explained

Now that partial query caching (PQC) is generally available, we’re excited to dig into the technical details of the implementation and share some of our reasoning about how and why we built it the way we did.

In our initial announcement, we discussed how PQC is all about optimizing cache hit rate by leveraging the structured and typed nature of GraphQL to decide how to cache your data more effectively. At the core of PQC is what we generally refer to as “splitting a query”. In this process, we take apart the incoming query and identify and group parts of a query that belong to the same cache entry based on the configuration for types and fields of your service – these are called the query splits.

Of course, details matter, and there’s a lot going on in PQC to make everything work as conveniently and safely as we want it to be. However, the query splits are the crucial pieces of information PQC revolves around, so in this blog post we want to take a closer look at how we split a query.

PQC Overview

Let’s start with a general overview of the PQC flow. First, we split the query (more on that in a second), then take the resulting splits and try to retrieve their data from the cache. The cache key for a split is composed of multiple parts: for example, the split query document and the variables of the query are joined and stringified, and the result is then hashed together with other relevant information to form the final key.
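To make the key composition a little more concrete, here is a minimal Python sketch. The function name split_cache_key, the JSON layout, the extra parameter and the choice of SHA-256 are all assumptions made for illustration; the post does not specify how the real key is encoded or hashed.

```python
import hashlib
import json

def split_cache_key(split_document: str, variables: dict, extra: tuple = ()) -> str:
    """Hash a split document together with the query variables (hypothetical layout)."""
    # Sorting keys makes logically equal variable objects stringify identically.
    payload = json.dumps(
        {"doc": split_document, "vars": variables, "extra": list(extra)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Same split and same variables (in a different order) give the same key.
key_a = split_cache_key("query { rootField { lowMaxAge } }", {"id": 1, "lang": "en"})
key_b = split_cache_key("query { rootField { lowMaxAge } }", {"lang": "en", "id": 1})
# A different split of the same query gets its own key.
key_c = split_cache_key("query { rootField { highMaxAge } }", {"id": 1, "lang": "en"})
```

The point of the sketch is only that every split gets its own key, so two splits of one query can live and expire independently.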

All splits that we either can’t find in the cache or know we can’t cache are merged into a single document, which we then send to the origin to retrieve everything we need to complete the query. In general, we strive to send at most one query to your origin, but if you’ve read our previous post on safe list handling, you might remember that there are cases where we need to send a second query for safety.

In the last step, we extract the parts each split cares about from the response data and then store those in the cache. Finally, having all the data we need, we combine the cached data with the fresh data and send it back to you.
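The flow above can be sketched in a few lines of Python. This is a simplified model, not the actual implementation: splits are represented as lists of leaf paths, the cache is a plain dict, and resolve, extract, deep_merge and fake_origin are hypothetical names. It ignores TTLs, uncacheable splits and the second-query case mentioned above.

```python
import copy
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]  # path to a leaf field in the response

def deep_merge(target: dict, source: dict) -> dict:
    """Merge source into target in place, copying values so cached data stays untouched."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)
    return target

def extract(data: dict, paths: List[Path]) -> dict:
    """Copy only the given leaf paths out of an origin response."""
    out: dict = {}
    for path in paths:
        src, dst = data, out
        for segment in path[:-1]:
            src = src[segment]
            dst = dst.setdefault(segment, {})
        dst[path[-1]] = src[path[-1]]
    return out

def resolve(splits: Dict[str, List[Path]], cache: Dict[str, dict],
            fetch_origin: Callable[[List[Path]], dict]) -> dict:
    result: dict = {}
    missing: Dict[str, List[Path]] = {}
    for key, paths in splits.items():
        if key in cache:
            deep_merge(result, cache[key])      # cache hit
        else:
            missing[key] = paths                # cache miss
    if missing:
        # Merge all missing splits into one request to the origin.
        merged = [path for paths in missing.values() for path in paths]
        fresh = fetch_origin(merged)
        for key, paths in missing.items():
            part = extract(fresh, paths)        # the part this split cares about
            cache[key] = part
            deep_merge(result, part)
    return result

origin_calls = []

def fake_origin(paths):
    origin_calls.append(paths)
    return {"rootField": {"lowMaxAge": "fresh-low", "highMaxAge": "fresh-high"}}

cache = {"high": {"rootField": {"highMaxAge": "cached-high"}}}
splits = {
    "low": [("rootField", "lowMaxAge")],
    "high": [("rootField", "highMaxAge")],
}
response = resolve(splits, cache, fake_origin)
```

In this run only the "low" split is requested from the origin, and the response combines it with the cached "high" split.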

Splits – The What And Why

First, let’s be clear on what a split actually is and why we want to do it. Note that the examples use field names to indicate their lifetime (no-, zero-, low-, high-), so we can see differences at a glance and don’t need to keep track of exact numbers.

Consider the following example query:

query { rootField { lowMaxAge highMaxAge } }

The two fields have different max-ages. In PQC this means we put them in separate splits, so that each ends up in its own cache entry for an optimal cache hit rate:

# Split 1
query { rootField { lowMaxAge } }
# Split 2
query { rootField { highMaxAge } }

More precisely, these splits represent the parts of the response data that should end up in different cache entries.

The impact of doing this is intuitive: if we don’t split, we can only cache the entire document with the lowest common denominator we find in the query (in this case lowMaxAge). The query is then not optimally cached, because part of the time highMaxAge could reside in the cache is wasted. An extreme case is an uncacheable field in the document, which makes the entire document uncacheable. That is precisely how our document cache behaves. In a way, document caching is PQC where we always produce exactly one split: the entire document.
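As a small worked example of the lowest common denominator, the Python snippet below uses made-up numbers (60 and 3600 seconds) purely for illustration; document_max_age is a hypothetical helper, not part of any API.

```python
def document_max_age(field_max_ages: dict) -> int:
    """Without splitting, the whole document is limited by its shortest-lived field."""
    return min(field_max_ages.values())

# Illustrative numbers only: "low" = 60s, "high" = 3600s.
ages = {"lowMaxAge": 60, "highMaxAge": 3600}
whole_document = document_max_age(ages)
# Cache time highMaxAge gives up when it is cached with the whole document.
wasted_for_high = ages["highMaxAge"] - whole_document

# One uncacheable field (max-age 0) makes the whole document uncacheable.
with_uncacheable = document_max_age({"lowMaxAge": 60, "zeroMaxAge": 0})
```

With splitting, each field keeps its own lifetime instead of the minimum.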

Splitting in detail

So far we’ve only talked about splitting based on max-age, but the reality is a little more complicated, as we’re actually looking at three dimensions in total:

  • Max-age: How long a cache entry is considered fresh.

  • Stale-while-revalidate (swr): How long an expired cache entry (i.e. an entry with an age past the one defined in max-age) can still be served while the cache is refreshing (revalidating) the data.

  • Scopes: In short, scopes are a mechanism we offer for putting cache entries into buckets based on rules. Common examples are “only authenticated users can access this entry” or “only user x can see their personalized list of recommendations”. If you’re already versed in the workings of HTTP caches, it’s conceptually how Vary works, applied to GraphQL query caching.
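One way to picture these three dimensions is as a composite key that selections are grouped by. The Python sketch below is only an illustration under that assumption: SplitKey, bucket_selections, the scope name AUTHENTICATED, the field paths and the numbers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, NamedTuple

class SplitKey(NamedTuple):
    max_age: int             # seconds an entry is considered fresh
    swr: int                 # seconds an expired entry may still be served
    scopes: FrozenSet[str]   # scope names that apply to the selection

def bucket_selections(selections: Dict[str, SplitKey]) -> Dict[SplitKey, List[str]]:
    """Group selections that agree on all three dimensions into one bucket."""
    buckets = defaultdict(list)
    for path, key in selections.items():
        buckets[key].append(path)
    return dict(buckets)

public_short = SplitKey(60, 30, frozenset())
private_short = SplitKey(60, 30, frozenset({"AUTHENTICATED"}))

buckets = bucket_selections({
    "rootField.title": public_short,
    "rootField.summary": SplitKey(60, 30, frozenset()),
    "rootField.recommendations": private_short,
})
```

Selections land in the same bucket only when max-age, swr and scopes all match, so a scoped field never shares an entry with an unscoped one even if their lifetimes are equal.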

If you’ve used Stellate before, you might recognize the above information from the service configuration, in which you can define caching rules for types and fields. In a perfectly covered schema, meaning all three split key dimensions are explicitly defined by the configuration for all types and fields, splitting would be (mostly) straightforward, since we could bucket all selections based on the complete information given. However, it’s rare to have a configuration that fully specifies rules for each and every type and field. In general, we learned that most customers are fine with the parts of the query that are not covered by explicit rules being cached where possible. They’re generally much more interested in:

  • Creating rules for the data they care most about (computationally- or bandwidth-heavy parts of the query).

  • Marking the data they absolutely do not want cached as uncacheable.

… and then letting us figure out the best way to go from there.

The above means that query splitting must work with incomplete rule information about the query. But that’s not all, since there’s also schema information to consider. Document caching works without schema information, because it extracts what it needs from the response by looking at the __typename fields. PQC, in contrast, makes decisions based on the request, so we require a schema to make any sense of the incoming query. Stellate sits in front of your GraphQL endpoint, so by the very nature of distributed systems there will inevitably be times when the schema we know and the schema your origin server has differ. Given a sufficiently high volume of requests, even a few seconds of schema drift can and will affect a sizeable number of requests. In turn, this means that splitting must be able to deal efficiently with schema differences on top of having incomplete rule information.

With all of these challenges in mind, let’s take a look at how we came up with a set of rules to power the splitting algorithm.

Scopes

Scopes are straightforward: they are always inherited top-down in the document tree, which means that a selection inherits all scopes found along the entire path leading to it. This makes sure that we do not leak data accidentally.

query {
  rootField { # New: ROOT_SCOPE
    scopeAField # Inherit: ROOT_SCOPE, new: SCOPE_A
    scopeBField { # Inherit: ROOT_SCOPE, new: SCOPE_B
      nestedField # Inherit: ROOT_SCOPE, SCOPE_B
    }
  }
}

You can see that the root scope is propagated everywhere and that sibling selections do not inherit each other’s scopes (e.g. SCOPE_B is only on scopeBField and its children, not on scopeAField).
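The inheritance rule can be sketched as a simple top-down walk. In this Python sketch the query is modelled as nested dicts and the scope rules are keyed by field name; both are simplifying assumptions for illustration (collect_scopes and SCOPE_RULES are hypothetical names, and real rules are configured per type and field).

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical stand-in for the scope configuration, keyed by field name.
SCOPE_RULES = {
    "rootField": "ROOT_SCOPE",
    "scopeAField": "SCOPE_A",
    "scopeBField": "SCOPE_B",
}

def collect_scopes(selection: dict, inherited: FrozenSet[str] = frozenset(),
                   path: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], FrozenSet[str]]:
    """Map every selection path to the scopes it carries."""
    out = {}
    for field, children in selection.items():
        # A selection gets everything from its ancestors plus its own scope, if any.
        own = inherited | ({SCOPE_RULES[field]} if field in SCOPE_RULES else set())
        here = path + (field,)
        out[here] = own
        # Siblings never see each other's scopes; only children inherit.
        out.update(collect_scopes(children, own, here))
    return out

query = {"rootField": {"scopeAField": {}, "scopeBField": {"nestedField": {}}}}
scopes = collect_scopes(query)
```

Because scopes are only ever passed downwards, nestedField ends up with ROOT_SCOPE and SCOPE_B, while scopeAField never sees SCOPE_B.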

Max-age & SWR

Determining cache lifetimes is where it gets interesting, since heuristics and tradeoffs need to be considered: while scopes are additive in nature (a selection carries the sum of all scopes on the path to it), TTLs are independent of each other and, more crucially, can overwrite each other based on precedence.

For simplicity, the following will only talk about max-age, but the rules showcased are identical for determining swr as well.

Let’s start out simple. If a leaf field has an explicit max-age set, it has precedence over anything else:

query { lowMaxAge highMaxAge }
# Split 1. "low"
query { lowMaxAge }
# Split 2. "high"
query { highMaxAge }

“Anything else” is for example the max-age defined on the enclosing type of a field (note the added lowMaxAge indirection):

query { lowMaxAge { lowMaxAge highMaxAge } }
# Split 1. "low"
query { lowMaxAge { lowMaxAge } }
# Split 2. "high"
query { lowMaxAge { highMaxAge } }

However, if the enclosing type has a max-age and a field has none, then it “inherits” the max-age:

query { lowMaxAge { noMaxAge highMaxAge } }
# Split 1. "low"
query { lowMaxAge { noMaxAge } }
# Split 2. "high"
query { lowMaxAge { highMaxAge } }

This is especially useful for a common and safe adoption path we optimize for, where the query root is uncacheable, which then trickles down to everything by default. If something should be cached, specific rules are added for the field or type, which then act as a reset, marking the subtree as cacheable:

query { # Query is uncacheable
  noMaxAgeOne { noMaxAge lowMaxAge }
  noMaxAgeTwo { lowMaxAge { noMaxAge highMaxAge } }
}
# Split 1. "low"
query {
  noMaxAgeOne { lowMaxAge }
  noMaxAgeTwo { lowMaxAge { noMaxAge } }
}
# Split 2. "high"
query { noMaxAgeTwo { lowMaxAge { highMaxAge } } }
# Split 3. "uncacheable"
query { noMaxAgeOne { noMaxAge } }

So far, so good. What if neither the enclosing type nor the field has a max-age?

query { noMaxAge { noMaxAge } }

Well, this specific case actually isn’t cacheable, since we have absolutely no information to go off of and can’t determine a max-age at all.
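The precedence rules from this section can be summarized in a short Python sketch. It follows the post’s naming convention, so the explicit max-age is looked up by field name, and the concrete numbers are made up. EXPLICIT_MAX_AGE, leaf_max_ages and split_by_max_age are hypothetical names, and the sketch leaves out swr, scopes and abstract types.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the configuration; the numbers are illustrative.
EXPLICIT_MAX_AGE = {"lowMaxAge": 60, "highMaxAge": 3600, "zeroMaxAge": 0}

def leaf_max_ages(selection: dict, inherited: Optional[int] = None,
                  path: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], Optional[int]]:
    """Resolve the effective max-age of every leaf selection."""
    out = {}
    for field, children in selection.items():
        # An explicit rule wins; otherwise the enclosing value is inherited.
        effective = EXPLICIT_MAX_AGE.get(field, inherited)
        here = path + (field,)
        if children:
            out.update(leaf_max_ages(children, effective, here))
        else:
            out[here] = effective
    return out

def split_by_max_age(selection: dict, root_max_age: Optional[int] = None) -> dict:
    """Group leaves by max-age; unknown (None) and zero both count as uncacheable."""
    buckets = defaultdict(list)
    for path, max_age in leaf_max_ages(selection, root_max_age).items():
        buckets[max_age if max_age else "uncacheable"].append(path)
    return dict(buckets)

# The adoption-path example: the query root is uncacheable (max-age 0).
query = {
    "noMaxAgeOne": {"noMaxAge": {}, "lowMaxAge": {}},
    "noMaxAgeTwo": {"lowMaxAge": {"noMaxAge": {}, "highMaxAge": {}}},
}
splits = split_by_max_age(query, root_max_age=0)

# No information at all: nothing can be cached.
unknown = split_by_max_age({"noMaxAge": {"noMaxAge": {}}})
```

Here splits holds three buckets that correspond to the "low", "high" and "uncacheable" splits of the adoption-path example, and unknown holds a single uncacheable bucket.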

What about Fragments?

So far, we’ve only looked at simple queries. Out in the wild, fragments are widely used, as they’re a common way for client frameworks to compose queries based on the needs of individual UI components, so we need to think about how to handle them. Specifically, fragment spreads require some thought, as the way they’re split depends on where they’re spread:

query {
  lowMaxAge { ...Fields highMaxAge } # SCOPE_A
  highMaxAge { ...Fields lowMaxAge } # SCOPE_B
}
fragment Fields on X {
  noMaxAge
  zeroMaxAge
}

In the above example, Fields.noMaxAge belongs in either the lowMaxAge or the highMaxAge bucket, depending on where it’s spread, while Fields.zeroMaxAge always belongs in the uncacheable bucket. Additionally, any scopes that the path to the spread defines must be honored to prevent cache leaks. Our solution is to inline all fragment spreads:

query {
  lowMaxAge { ...Fields highMaxAge } # SCOPE_A
  highMaxAge { ...Fields lowMaxAge } # SCOPE_B
}
fragment Fields on X {
  noMaxAge
  zeroMaxAge
}
# Becomes
query {
  lowMaxAge { # SCOPE_A
    ... on X { noMaxAge zeroMaxAge }
    highMaxAge
  }
  highMaxAge { # SCOPE_B
    ... on X { noMaxAge zeroMaxAge }
    lowMaxAge
  }
}

By doing this, we can just follow the rules laid out earlier to split selection sets, without any special considerations for fragment spreads: scopes naturally flow downwards and TTLs are inherited as explained above.
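The inlining step itself can be sketched on a tiny hand-rolled AST. The tuple-based node shapes and the inline_spreads function in this Python sketch are assumptions for illustration; a real implementation would operate on a proper GraphQL AST.

```python
# Node shapes (hypothetical): ("field", name, children),
# ("spread", fragment_name), ("inline", type_condition, children).

def inline_spreads(selections: list, fragments: dict, active: tuple = ()) -> list:
    """Replace every fragment spread with an inline fragment holding its body."""
    out = []
    for node in selections:
        if node[0] == "spread":
            name = node[1]
            if name in active:
                # Spreads must not form cycles; fail instead of recursing forever.
                raise ValueError(f"fragment cycle detected at {name}")
            type_condition, body = fragments[name]
            inlined = inline_spreads(body, fragments, active + (name,))
            out.append(("inline", type_condition, inlined))
        else:
            kind, label, children = node
            out.append((kind, label, inline_spreads(children, fragments, active)))
    return out

fragments = {
    "Fields": ("X", [("field", "noMaxAge", []), ("field", "zeroMaxAge", [])]),
}
query = [
    ("field", "lowMaxAge", [("spread", "Fields"), ("field", "highMaxAge", [])]),
    ("field", "highMaxAge", [("spread", "Fields"), ("field", "lowMaxAge", [])]),
]
inlined_query = inline_spreads(query, fragments)
```

After inlining, each copy of the fragment body sits at its own position in the tree, so it picks up the scopes and TTLs of the path it was spread in.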

What About Abstract Selections?

Indeed, abstract selections like unions or interfaces are a challenge for query splitting! Rest assured that we handle those gracefully, but the details of splitting abstract selections merit a follow-up blog post, so stay tuned!