May 22

Integration testing Fastly services with Viceroy


In this post we will cover how we recently made our Fastly integration tests run ≈10x faster while making them easier to write. We are now able to run hundreds of integration tests in 5 seconds locally! This has dramatically improved the productivity of our Rust engineers. We hope other Fastly customers can benefit from our learnings when testing their Compute@Edge services.

Testing has always been very important to us at Stellate. Our customers entrust us with their mission-critical traffic and we take this responsibility very seriously. We use a mix of testing strategies throughout development to validate changes and prevent issues in production. For the purpose of this post the focus will be on integration and end to end tests.

This blog post goes into a fair amount of detail, so it is broken up into a few sections:

  1. Problem: Flaky, slow and complex end-to-end tests were slowing us down.

  2. Solution: How we built a fast and reliable integration test setup for Fastly.

  3. Benefits: What we gained from the new setup.

  4. Next steps: How you can get the same benefits.

Flaky, slow and complex end-to-end tests were slowing us down

Stellate’s core product is a GraphQL CDN. It uses Fastly’s WASM edge workers to cache GraphQL requests. There were many benefits to building Stellate on edge compute in 2021, but one of the drawbacks was the difficulty of testing our system. Since edge compute was relatively new, testing workflows were not yet fully mature. There was no off-the-shelf local development environment for our setup, so we had to deploy our code to a dev environment to test it.

Fastly did offer a local runtime called Viceroy, but it lacked support for caching, among other things critical to our use case. As a consequence we were very limited in the types of tests we could write: where we would have liked to write integration tests, we were often forced to write end-to-end tests. Further, our end-to-end tests had to run against a deployed version of our code. This had a number of drawbacks:

  • Test execution was error prone and flaky: Any fluctuations in internet weather, intermittent service outages or read-write races from service setups immediately caused failures that were sometimes hard to reason about or required constant fine-tuning of timeouts and retries. The value of tests is greatly diminished when you can’t rely on a failing test meaning that functionality is broken. Not only did we spend a lot of time investigating false positives, but we also risked missing regressions by ascribing them to a flaky test run.

  • Test execution was slow:

    Setting up and running all tests could take upwards of 15 minutes! One reason for this was that we had to include wait times in tests to ensure writes to Fastly had propagated before making assertions. Further, parallel execution of tests would often lead to data races and complicated bugs, so we opted to run tests one by one.

  • Test execution wasn’t feasible on pull requests:

    Fastly only allows us a limited number of services, so the only option would have been to distribute test runs over a fixed number of pre-configured test compute services. Even at a moderate commit volume, this would have led to considerable wait times for PRs. Additionally, we would have needed mechanisms to synchronize different PRs so that a commit made on one PR wouldn’t overwrite state while tests were running for another. Instead we opted to have CI run our end-to-end tests only on our main branch. Developers could still manually trigger test runs locally for their PRs against one of the two designated environments, announcing on Slack when they were doing so to prevent simultaneous usage. This allowed us to catch most errors before they hit the main branch, but the clunky process was easy to forget or skip, and it was not uncommon for issues to show up after merging a PR. Faulty code wouldn’t hit production, but as you can imagine, it was a huge inconvenience.

  • Authoring tests was slow and cumbersome: Because Stellate is a CDN that proxies other services, our test CDN services had to point at origin services, which meant spinning up fake origins just for testing. Instead of writing a test that simply states “using this config, and assuming this data is returned from the origin, run the CDN”, we often had to fiddle with existing test services to approximate the conditions we wanted, or go through a whole process of deploying a new test service to Vercel just to serve the data, which was pure noise in the process.

How we built a fast and reliable integration test setup for Fastly

None of the above problems are unsolvable, and many could have been solved with enough engineering effort. However, the complexity and maintenance burden were already substantial, and adding more complexity would only make things worse. With that in mind we reviewed our options for building integration tests that could be run locally.

As it turns out, Fastly’s local runtime, Viceroy, isn’t merely a local executable. It’s an open-source project that contains both a CLI and a library component, written in Rust. This sparked the idea of building a Fastly compute test harness using just Rust-native facilities (cargo test), which are well integrated into editors (e.g. VSCode’s ”run test”). Looking at the code, it became clear that not only could we easily intercept all outputs of the CDN service, but we could also control all inputs into the system, creating the perfect basis for writing precise CDN tests that:

  • Run completely in memory, requiring zero network calls to origins or external services.

  • Have perfect isolation, as there’s no shared state between them.

However, it required work on our side to expand Viceroy to offer everything we needed for the above to happen. We did just that!

Viceroy is essentially a wrapper around Wasmtime (a popular WASM runtime) that implements the bindings the Compute@Edge SDK uses under the hood to run edge workers. It’s a well-structured project with many useful code comments and generally readable code, so we were able to add all the functionality we needed in our own fork of the project. From the PR:

  • Cache API support. Allows keeping state between invocations. No transaction support, and some calls unused by Stellate are not yet implemented. The cache implementation is very simplistic and not really suited for long-running processes, but it works for us.

  • Adds the ability to handle requests in-memory instead of reaching out to an origin via HTTP. This is done by adding an in-memory handler to Backends, to which Requests are passed (see the sketch after this list).

  • Adds the ability to intercept dynamic backend registrations and traffic, which in combination with in-memory backend handlers now allows for in-memory handling of dynamic backend requests.

  • Adds the ability to read Endpoint (log sink) messages programmatically by having them write to an in-memory buffer that can be read from outside the execution context after an invocation. Technically these could have been intercepted from stdout before, but this is much more convenient.

  • Adds the ability to wait for guest code invocations to fully complete, which is necessary to surface execution errors after an initial response has been sent to the client (especially when streams are processed and a panic happens after the headers have already been sent back).
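
To make the in-memory backend handler idea more concrete, here is a minimal sketch of the concept, built on the http crate’s request and response types. Note that the InMemoryHandler type and the stripped-down Backend struct below are hypothetical stand-ins for illustration, not the actual types in our fork:

use http::{Request, Response};

// Hypothetical sketch, not the actual fork API: an in-memory handler is
// just a function from request to response, replacing the HTTP round
// trip to a real origin.
type InMemoryHandler = Box<dyn Fn(Request<Vec<u8>>) -> Response<Vec<u8>> + Send + Sync>;

struct Backend {
    // `None` keeps the old behavior (proxy to the real origin over HTTP);
    // `Some(handler)` answers the request entirely in memory.
    in_memory_handler: Option<InMemoryHandler>,
}

impl Backend {
    fn send(&self, req: Request<Vec<u8>>) -> Response<Vec<u8>> {
        match &self.in_memory_handler {
            Some(handler) => handler(req),
            None => unimplemented!("proxy to the real origin via HTTP"),
        }
    }
}

#[test]
fn in_memory_backend_sketch() {
    let backend = Backend {
        in_memory_handler: Some(Box::new(|_req| {
            Response::builder()
                .status(200)
                .body(br#"{"data":{"todos":[]}}"#.to_vec())
                .unwrap()
        })),
    };
    let req = Request::builder()
        .uri("https://origin.example/graphql")
        .body(Vec::new())
        .unwrap();
    assert_eq!(backend.send(req).status(), 200);
}

Because the handler is an ordinary Rust closure, each test can specify exactly what its “origin” returns without any network I/O or shared state.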

The end result was a simple test harness which allows us to control the inputs on a test-by-test basis and to assert on all possible system outputs. Here’s an example:

#[test]
fn fancy_test() {
    // Sets up storage with configuration for a known test service.
    // Adding a new service with a different schema is just writing a few lines.
    let mut test_runner = TestRunner::new(todo_service());
    let query = json!({"query": "query { todos { id title __typename }}"});
    // Specify exactly what data is returned from the origin.
    // Allows us to easily recreate specific conditions without the need to deploy an origin anywhere.
    // Opens up the possibility to assert system behavior on faulty responses.
    let origin_response = OriginResponse::new_json(&json!({ "data": { "todos": [
        { "id": "1", "title": "title1", "__typename": "Todo" },
        { "id": "2", "title": "title2", "__typename": "Todo" },
    ]} }));
    let response = test_runner.gql_request(&query, Some(&origin_response), vec![]);
    // We can inspect the cache state after each invocation (technically, we can also modify it).
    test_runner.pretty_print_cache_state();
    // We can implement any kind of assertions we want and since the worker is written in Rust,
    // we can now share and reuse code for assertions.
    response.assert_cache_state(CacheState::Miss);
    let response = test_runner.gql_request(&query, None, vec![]);
    response.assert_cache_state(CacheState::Hit);
    // Intercept traffic and act on it. Here, our automatic purging usually reaches out
    // to the Fastly API, but we can simulate purges easily with Viceroy.
    let mutation = json!({"query": "mutation { deleteTodo(id: 1) { id }}"});
    let mutation_origin_response = OriginResponse::new_json(&json!({ "data": { "deleteTodo": [
        { "id": "1", "title": "title1", "__typename": "Todo" },
    ]} }));
    let response = test_runner.gql_request(&mutation, Some(&mutation_origin_response), vec![]);
    response.assert_cache_state(CacheState::Pass);
    // Asserting on logs of any kind works: each response stores its individual invocation logs.
    // This opens up the possibility of verifying, e.g., the purge messages below.
    let purge_logs = response.get_logs_json(KAFKA_PURGE_LOG_ENDPOINT);
    assert!(!purge_logs.is_empty());
    let response = test_runner.gql_request(&query, Some(&origin_response), vec![]);
    response.assert_cache_state(CacheState::Miss);
}

Everything you see above runs in memory. The first version took only a few seconds to run, so we ran an experiment and duplicated the test 200 times. The result was a little disappointing. While we achieved our goal of creating a proof of concept that executes everything in memory using a simple, PR-capable test harness, we had hoped for better execution times than 4 minutes for the 200 tests. After all, everything written in Rust must be blazingly fast (/s)! Tracing the hot code paths and digging around in the Viceroy internals, we found that initializing the execution context (which is essentially the interface for invoking Viceroy) always goes through an expensive linking and preparation process for the WASM binary in use. Luckily, the rest of the required configuration is parametrized on the fly based on the initialized context, so all we had to do was initialize once:

lazy_static! {
    pub static ref BASE_CTX: ExecuteCtx = {
        ExecuteCtx::new(
            format!("path/to/test-binary.wasm"),
            ProfilingStrategy::None,
            HashSet::new(),
            None,
            Default::default(),
        )
        .unwrap()
    };
}

…and then clone and add config as needed on every run:

// In the `TestRunner`:
// ...
let ctx = BASE_CTX
    .clone()
    .with_object_stores(self.kv_stores)
    .with_endpoints(self.endpoints)
    .with_backends(default_static_backends(self.cache))
    .with_cache(self.cache)
    .with_dictionaries(self.config_stores)
    .with_log_stdout(true);
// ...

Suddenly, we were executing 200 tests, each doing 4 invocations, in under 5 seconds on an M1 Max MacBook. Even on CI, test execution takes less than 20 seconds, with the majority of time spent compiling the tests (~2 min on a regular GitHub runner).

Benefits

Most of the critical end-to-end tests that were better suited as integration tests have been ported to our new integration testing framework. The two original goals for the project have been met:

  • Integration tests can run on PRs: Our feedback cycles are now considerably faster. We can quickly iterate locally and verify our changes in under 5 seconds instead of the previous ≈15 minutes. And should we forget to run tests before committing, CI will catch any issues before our changes make it to the main branch.

  • Reliable releases: During deployments it is now considerably rarer to have to re-run end-to-end tests multiple times until you get a successful run 🎉 We haven’t completely eliminated end-to-end tests, as it’s still important to verify the system end to end in a production-like environment. But the reliability of these tests has been greatly improved by removing the ones that were better suited for integration testing.

But there have also been unexpected benefits:

  • More testing: We have naturally started writing more tests during development as they are easier to set up. Further, we are writing these tests earlier in the process, which is bound to increase our iteration speed.

  • Latent bugs uncovered: While porting tests to the new integration testing framework, we uncovered two minor latent bugs in edge-case behaviour of our CDN. These bugs would likely never have been found had it not been for the flexibility of the new system.

As Thomas from our team put it the other day:

Boy am I glad for the new integration tests! I just fixed a bug with JWT scope handling and I’m pretty sure I wouldn’t have spotted it if it wasn’t for the new resilient test setup. - Thomas Heyenbrock

What’s next?

We will continue to move tests over to the new system and are, of course, looking out for the next area of improvement.

We’re working with Fastly to bring these improvements to all Compute@Edge users so that you too can benefit from this work. Until then, you can find our experimental Viceroy changes here. If you have questions about how to implement this yourself, feel free to reach out.

We hope that this glimpse into our journey was insightful and maybe inspires you to improve your own integration tests.