Apr 4, 2024

The Best Way to Do GraphQL Error Tracking

Handling errors is a core task when operating an API. It’s a heavily trodden path with many patterns, tools, and products.

Except if you’re using GraphQL.

Losing Context without HTTP Status Codes

When a REST request fails, the HTTP status code provides context before you even read the error message. For instance, with a 403 Forbidden, you instantly know that someone tried to access a resource and was denied. The error message then provides further insight, for example that the requester doesn’t have the proper role.

GraphQL is transport-agnostic: it’s typically served over HTTP, but it doesn’t have to be. This means that GraphQL doesn’t rely on HTTP-specific features such as status codes. The GraphQL spec only requires each error to carry a message, so GraphQL servers almost always respond with a 200, even when an error occurs. Non-200 status codes are typically returned only when the error occurs at the HTTP layer. This affects how clients need to handle errors: they have to inspect the response body instead of the status code.
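To make this concrete, here is a minimal sketch of checking a response body for errors instead of relying on the status code. The `GraphQLResponse` type, the `extractErrors` helper, and the sample body are illustrative assumptions, not part of any particular client library.

```typescript
// Shape of a GraphQL response body: `data` and/or `errors`,
// where each error carries at least a `message`.
interface GraphQLError {
  message: string;
  path?: (string | number)[];
  extensions?: Record<string, unknown>;
}

interface GraphQLResponse<T> {
  data?: T | null;
  errors?: GraphQLError[];
}

// Hypothetical helper: returns the errors found in a response body,
// regardless of which HTTP status the response arrived with.
function extractErrors<T>(responseBody: GraphQLResponse<T>): GraphQLError[] {
  return responseBody.errors ?? [];
}

// Hypothetical body that a server might send alongside HTTP 200.
const sampleBody: GraphQLResponse<{ user: null }> = {
  data: { user: null },
  errors: [{ message: "Not authorized to view user 123", path: ["user"] }],
};

const assumedHttpStatus = 200; // illustrative value only
const hasErrors = extractErrors(sampleBody).length > 0;

console.log(assumedHttpStatus, hasErrors); // 200 true
```

A client that only checks for a 2xx status would treat this response as a success, which is why the body has to be inspected as well.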

Error messages alone are difficult to aggregate, though, because they often contain variable data such as IDs for users or products. In non-GraphQL API servers, structured logging tools move these variable fields into metadata so that the message itself stays consistent. To do the same in GraphQL, you need to leverage error codes.
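The difference shows up as soon as you try to group errors. The following sketch uses a hypothetical `LoggedError` shape, a hypothetical `countBy` helper, and made-up sample data to compare grouping by message with grouping by code.

```typescript
// Hypothetical record of an error as it might appear in a log.
interface LoggedError {
  message: string;
  code?: string;
}

// Counts how many errors share each key.
function countBy(
  errors: LoggedError[],
  key: (e: LoggedError) => string,
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of errors) {
    const k = key(e);
    counts.set(k, (counts.get(k) ?? 0) + 1);
  }
  return counts;
}

// Made-up sample: the same failure, with a different ID in each message.
const sampleErrors: LoggedError[] = [
  { message: "Product 41 not found", code: "PRODUCT_NOT_FOUND" },
  { message: "Product 87 not found", code: "PRODUCT_NOT_FOUND" },
  { message: "Product 93 not found", code: "PRODUCT_NOT_FOUND" },
];

const byMessage = countBy(sampleErrors, (e) => e.message);
const byCode = countBy(sampleErrors, (e) => e.code ?? "UNKNOWN");

console.log(byMessage.size); // 3 groups, one per distinct message
console.log(byCode.size); // 1 group containing all three errors
```

Grouping by message produces one bucket per ID, while grouping by code collapses the same failure into a single bucket with a meaningful count.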

A code is an optional field that you can include in an error’s extensions. Some GraphQL server implementations ship with default error codes, but the values can be whatever you want. It’s up to your organization to define them and use them consistently.
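As a rough illustration, an error that follows this pattern might be built like the sketch below. The `CodedError` type, the `notFoundError` helper, the `PRODUCT_NOT_FOUND` code, and the `productId` field are all hypothetical names chosen for this example; your own codes and metadata fields will differ.

```typescript
// Hypothetical error shape: a stable message plus an `extensions`
// object that holds the code and any variable details.
interface CodedError {
  message: string;
  extensions: { code: string; [key: string]: unknown };
}

// Hypothetical helper: keeps the message constant and moves the
// variable product ID into `extensions` next to the code.
function notFoundError(productId: string): CodedError {
  return {
    message: "Product not found",
    extensions: { code: "PRODUCT_NOT_FOUND", productId },
  };
}

const exampleError = notFoundError("41");
console.log(JSON.stringify(exampleError, null, 2));
```

Because the message no longer changes from one occurrence to the next, both the message and the code stay stable across events, and the ID is still available when you inspect an individual error.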

Stellate Error Aggregation

To give you context on errors more quickly, we’re introducing first-class support for error codes and error aggregation. If you’re already using error codes, there’s nothing you need to do. If you’re not including error codes in your error extensions yet, this is a great reason to start! Learn more about error extensions and codes in the GraphQL spec.

Let’s take a look at how aggregating errors by error code helps you obtain context more quickly.

Aggregated Error Page

The Errors page now groups errors by error code by default, giving you insight into error volume for a given time frame. Each group also shows the operation that triggered the error and the error path.

Client Versions and User Distribution

For each aggregated error, you can view how it is distributed across users and clients, which shows how widespread the error is. This is powered by our new client versions and user features, which you can read more about here.

Error Event Log

For each aggregated error, you can dive into the individual error events to assess the state and specific error message of each one.

Improving the GraphQL Error Experience

We’re excited about our improved error experience, but there’s a lot more in store for the coming months. If there’s a specific pain point you’re running into with errors, let us know! We’d love to hear how we can make it even better.