Improving our Jest execution time by 300%

Over the course of 2023, our large engineering team started working in a new React codebase written in TypeScript. At first, everything was really fun and blazingly fast. But things slowed down rapidly once the number of unit tests passed a certain level. At its worst, our test suite took over 500 seconds to run just a few hundred tests! Performance was abysmal. I took on the job of researching why our tests were so slow, and how we could fix the problem without rewriting a bunch of them.

Our Test Setup

For testing our React components, we used the following configuration:

Jest testing framework
Mock service worker (msw) for intercepting API calls
React Testing Library
ts-jest transformer/preprocessor

Investigation

My investigation started with identifying the test suites that ran the longest. I quickly discovered that a handful of them were running for upwards of 25 seconds, even though some contained only 3 or 4 tests. This was immediately a red flag. Diving into these tests, I found heavy usage of the getByRole query. To validate this as the root cause, I wrapped the getByRole calls with a timer to measure how long they took. Sure enough, these calls were the culprit. Initially, it was hard to find others online with similar problems, because I was searching for slow Jest performance when using getByRole. I only found what I was looking for once I started searching for issues related to React Testing Library (RTL) instead. I found this issue opened against RTL way back in 2020, which is still open, addressing this performance concern. We had found our first bottleneck, and confirmed it as a known issue.
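The timer wrapper can be as small as the sketch below. The name timeCall is hypothetical (it is not part of Jest or React Testing Library), and the commented usage assumes RTL's screen object is imported in the test file.

```typescript
// Hypothetical helper: runs `fn`, logs how long it took, and returns its result.
function timeCall<T>(label: string, fn: () => T): T {
  const start = performance.now();
  try {
    return fn();
  } finally {
    // Log in `finally` so a throwing query is still timed.
    const elapsedMs = performance.now() - start;
    console.log(`${label} took ${elapsedMs.toFixed(1)} ms`);
  }
}

// Usage inside a test (assumes `screen` from React Testing Library):
// const saveButton = timeCall('getByRole(button)', () =>
//   screen.getByRole('button', { name: 'Save' }),
// );
```

Because the helper returns whatever the wrapped call returns, it can be dropped around an existing query without changing the rest of the test.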

Next, I investigated the performance and memory consumption of our Jest tests. I read this useful article discussing Jest memory leaks, and decided to diagnose our test suites to determine whether we had memory leaks too. The command from the article did the trick:

node --expose-gc ./node_modules/.bin/jest --runInBand --logHeapUsage --silent

I found that we had massive memory leaks! Our heap grew to 3GB in size before the runner ran out of memory. It was horrible. There’s problem number two!
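To turn that output into a number you can track, a small parser is enough. This is a sketch that assumes each result line looks roughly like "PASS src/foo.test.tsx (245 MB heap size)"; the exact format can differ between Jest versions, and the function names here are made up for illustration.

```typescript
// One parsed result line from a run with --logHeapUsage.
interface HeapSample {
  suite: string;
  heapMb: number;
}

// Assumed line shape: "PASS src/foo.test.tsx (12.3 s, 245 MB heap size)".
const HEAP_LINE = /^\s*(?:PASS|FAIL)\s+(\S+)\s+\(.*?(\d+) MB heap size\)/;

// Collects one sample per result line and ignores everything else.
function parseHeapSamples(output: string): HeapSample[] {
  const samples: HeapSample[] = [];
  for (const line of output.split('\n')) {
    const match = HEAP_LINE.exec(line);
    if (match) {
      samples.push({ suite: match[1], heapMb: Number(match[2]) });
    }
  }
  return samples;
}

// Heap growth from the first suite to the last one.
function heapGrowthMb(samples: HeapSample[]): number {
  if (samples.length < 2) return 0;
  return samples[samples.length - 1].heapMb - samples[0].heapMb;
}

const exampleOutput = [
  'PASS src/a.test.tsx (120 MB heap size)',
  'PASS src/b.test.tsx (480 MB heap size)',
].join('\n');
console.log(heapGrowthMb(parseHeapSamples(exampleOutput))); // 360
```

With --runInBand, suites run one after another in a single process, so a heap size that keeps climbing from suite to suite is the signal to look for.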

Finally, I did some research on the transformer we were using, ts-jest. One interesting thing about ts-jest is that, by default, it type checks the TypeScript code as part of the transformation, before the tests run. This added a lot of time to our tests. The ts-jest transformer has an isolatedModules option which, when set to true, transpiles each file on its own and disables type checking.
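For reference, a minimal jest.config.ts with this option might look like the sketch below. It assumes a ts-jest version that accepts per-transform options in the transform entry; older versions configure it under globals, and where the option lives has changed between releases, so check the ts-jest documentation for the version you use.

```typescript
// jest.config.ts (sketch). Only the transform entry matters here.
const config = {
  transform: {
    // ts-jest options go in the second element of the tuple.
    // isolatedModules: true transpiles each file on its own and skips type checking.
    '^.+\\.tsx?$': ['ts-jest', { isolatedModules: true }],
  },
};

export default config;
```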

Results

getByRole Optimization

As mentioned before, it would be a massive undertaking to remove all use of getByRole from our test suites. We opted to accept our current tests as they are, but put out an advisory to stop using getByRole queries in new tests and instead select DOM elements with alternatives such as getByLabelText, getByText, or getByTestId. Label and text queries are preferred, as retrieving by hidden attributes such as test IDs doesn’t reflect the actual customer experience very well. For this discovery, we’re preventing the fire from spreading more than anything.

Memory Leaks!

Many tests were unknowingly creating async calls that were not being intercepted and handled by msw or were not being properly awaited. These async operations were leaving behind resources that were not being properly cleaned up, leading to significant memory leaks over time.
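One way to make unintercepted requests visible is msw's onUnhandledRequest option. The setup file below is a sketch rather than our exact configuration: the file name and the shared server export are assumptions, and real request handlers would be passed to setupServer().

```typescript
// jest.setup.ts (sketch): register it via `setupFilesAfterEnv` in the Jest config.
import { setupServer } from 'msw/node';

// Hypothetical shared server; real handlers would be passed to setupServer().
export const server = setupServer();

// Raise an error for any request that no handler intercepts,
// instead of letting it slip through unnoticed.
beforeAll(() => server.listen({ onUnhandledRequest: 'error' }));

// Drop per-test handler overrides so state does not carry over between tests.
afterEach(() => server.resetHandlers());

// Stop intercepting once the tests in the file are done.
afterAll(() => server.close());
```

The other half of the problem is awaiting: a test that triggers a request should wait for its visible result (for example with one of RTL's findBy queries) so that nothing is still in flight when the test ends.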

The effect of this was that our tests were starting to time out when being run in parallel, and our runners in our deployment pipeline began to run out of memory before the tests could finish running, causing our deployment pipeline to get blocked.

As with the getByRole problem, finding and fixing every memory leak across all of our test suites would take far too much effort up front. The long-term solution is to resolve these memory leaks one by one. The short-term solution was to cap the worker idle memory limit in our Jest config at 512MB:

workerIdleMemoryLimit: '512MB'

This setting checks a worker’s memory usage after it finishes running a test, and kills and restarts the worker if the usage exceeds the specified limit. It prevents heap utilization from growing out of control, but it’s really only a band-aid solution. In addition to this, we reconfigured the runners in our deployment pipeline to run tests in band (not in parallel) to guarantee that each test has enough memory to execute. This slows down testing in the deployment pipeline; however, developers can still run tests in parallel on their local machines, taking advantage of the larger memory of our MacBook Pros.

Running Jest Without Type Checking

This change had, by far, the most significant impact on our test suite’s execution time. The justification was that we were already type checking everywhere around our tests, so there was no reason to also type check while running them:

The build command compiles our TypeScript, so it’s already doing type checking.
Dev IDEs/editors will detect type mismatches and highlight them.
Pre-commit hook builds the package, so it’s type checking as a guard before committing anything.
We have a code review automation which runs a release build script to check that the code being checked in will build properly. If it fails, it blocks the dev from merging the code.
If that isn’t enough, the same release build is part of our deployment pipeline. A build failure will block the pipeline and never make it to prod.

So we disabled type checking. The result? A roughly 3.3x speedup (the “300%” in the title). The test suite that once took over 500 seconds to run all tests was now running in 150 seconds!

Each run now saves 350 seconds (500 minus 150). Based on a likely scenario of 5 developers each running the test suite twice a day, we estimated that over the course of a year these changes will save approximately 253 developer hours, or about 31.6 work days (8 hours per day). This puts over 3 sprints of time back in the hands of the developers!

350 * 2 * 5 = 3500 seconds per day
3500 * 5 * 52 = 910,000 seconds per year (work week only)
910,000 / 60 / 60 = 252.8 hours per year
252.8 / 8 = 31.6 work days per year
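The same arithmetic as a small sketch, handy if you want to plug in your own team size. The function name and parameters are made up for illustration; the constants are the ones from this post.

```typescript
const SECONDS_BEFORE = 500;
const SECONDS_AFTER = 150;
const WORK_DAYS_PER_WEEK = 5;
const WEEKS_PER_YEAR = 52;
const HOURS_PER_WORK_DAY = 8;

// Hypothetical helper: yearly time saved for a given team size and run frequency.
function estimateYearlySavings(developers: number, runsPerDay: number) {
  const savedSecondsPerRun = SECONDS_BEFORE - SECONDS_AFTER; // 350
  const secondsPerDay = savedSecondsPerRun * runsPerDay * developers;
  const secondsPerYear = secondsPerDay * WORK_DAYS_PER_WEEK * WEEKS_PER_YEAR;
  const hours = secondsPerYear / 3600;
  return { secondsPerYear, hours, workDays: hours / HOURS_PER_WORK_DAY };
}

const savings = estimateYearlySavings(5, 2);
console.log(savings.secondsPerYear); // 910000
console.log(savings.hours.toFixed(1)); // "252.8"
console.log(savings.workDays.toFixed(1)); // "31.6"

// Speedup factor behind the headline number.
console.log((SECONDS_BEFORE / SECONDS_AFTER).toFixed(2)); // "3.33"
```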

Learnings

Some things I took away from this:

It’s worth investing in learning your libraries from the start. Writing tests slightly incorrectly will have a snowball effect later on, which can significantly slow down your ability to iterate.
Profile your test suite frequently enough to catch memory leaks early on! It’s a lot easier to fix a handful of tests, but nearly impossible to prioritize fixing hundreds!
Don’t type check your tests if everything around your tests is already doing type checking.
Small inefficiencies add up. A test suite running for 500 seconds instead of 150 seconds is significant over the course of a year. It wastes a TON of time. Yes, you could just pivot to another task. But context switching is extremely expensive and kills productivity. This was worth the optimization.