Most of our 10k+ unit tests are written in Java using JUnit and run with the Gradle build automation tool. As we added more tests over time, unit test executions became increasingly unstable: new tests affected the execution of existing ones, and our “failed test” metric started to rise for tests that had run fine for months. It was tempting to blame bad application code, but careful analysis revealed the real reason for these unstable results. Many of the failures were caused by new tests that adversely changed the test environment, and therefore also the execution of other tests.
In this blog post I will show you how we identified the root cause of these specific test failures, as well as the best practices we derived for designing unit tests with respect to their interaction with the test environment.
It All Began with Rare Unit Test Timeouts
Last week we identified a group of tests that failed for what seemed to be no specific reason. I volunteered to investigate.
A group of unit tests verifies important required behavior of a ThreadPool implementation. In particular, scheduled periodic tasks must continue to execute even after certain exceptions are thrown. To test this, a task is scheduled which throws the target exception. The test then waits for a second execution after the first one was aborted due to the exception.
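The shape of such a test can be sketched as follows. This is a minimal illustration, not our actual ThreadPool or test code: the class name, the `surviving` wrapper, and the 10 ms period are all assumptions made for the example. It builds on the standard `ScheduledExecutorService`, which by itself suppresses all later executions once a periodic task throws, so the sketch wraps the task to keep it scheduled. A `CountDownLatch` then waits for the second execution with a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicTaskSurvivalSketch {

    /** Hypothetical wrapper: keeps a thrown exception from cancelling later periodic runs. */
    static Runnable surviving(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // Swallowed on purpose in this sketch; a real pool would log it.
            }
        };
    }

    /** Returns true if the always-failing task ran at least twice within the timeout. */
    public static boolean runsAgainAfterException(long timeoutMillis) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch executions = new CountDownLatch(2);
        try {
            pool.scheduleAtFixedRate(surviving(() -> {
                executions.countDown();
                throw new IllegalStateException("target exception");
            }), 0, 10, TimeUnit.MILLISECONDS);
            // Wait for the second execution; false means the wait timed out.
            return executions.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("re-executed: " + runsAgainAfterException(2000));
    }
}
```

A timeout in the `await` call corresponds to the failure mode described next: the second execution never happens, and the test gives up waiting.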
On some machines and test runs, these tests would time out waiting for re-execution. No log output was written, although various exception handlers were in place to log any exception. Only the message “Exception: java.lang.NullPointerException thrown from the UncaughtExceptionHandler in thread "pool-unit-test-thread-1"” was printed. However, this was never reproducible when executing the tests in Eclipse, only when running the test suite in Gradle.
As a next step, I instructed Gradle to open a debug port and then connected Eclipse to it to determine the cause. This revealed that the NullPointerException was generated somewhere in Gradle code. I downloaded the source code and discovered that System.getProperty("line.separator") returned null, and that the result was then dereferenced.
With this information, I searched and quickly found another test, one that verifies string formatting on different platforms and had the side effect of changing the line.separator property. By calling System.clearProperty("line.separator") after the test, it inadvertently removed the property altogether instead of restoring its original value, so every later call to System.getProperty("line.separator") in the same JVM returned null.
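One way to avoid this kind of side effect is to remember the original value and restore exactly that state when the test is done. The following is a minimal sketch under stated assumptions: `SystemPropertyGuard`, `withSystemProperty`, and `joinLines` are hypothetical names invented for this example and are not taken from our code base. Only `System.getProperty`, `System.setProperty`, and `System.clearProperty` are real JDK calls.

```java
import java.util.function.Supplier;

public class SystemPropertyGuard {

    /**
     * Runs body with the system property temporarily set to value,
     * then restores the previous state, even if body throws.
     */
    public static <T> T withSystemProperty(String key, String value, Supplier<T> body) {
        String original = System.getProperty(key); // null if the property was not set
        System.setProperty(key, value);
        try {
            return body.get();
        } finally {
            if (original == null) {
                // The property did not exist before, so removing it is the correct restore.
                System.clearProperty(key);
            } else {
                System.setProperty(key, original);
            }
        }
    }

    /** Hypothetical code under test: joins two lines with the configured separator. */
    public static String joinLines(String first, String second) {
        return first + System.getProperty("line.separator") + second;
    }

    public static void main(String[] args) {
        String before = System.getProperty("line.separator");
        String joined = withSystemProperty("line.separator", "\r\n", () -> joinLines("a", "b"));
        System.out.println("used Windows separator: " + joined.equals("a\r\nb"));
        System.out.println("restored: " + before.equals(System.getProperty("line.separator")));
    }
}
```

The important detail is the `finally` block: `System.clearProperty` is only the right cleanup when the property was absent before the test. For a property such as line.separator, which the JVM always sets at startup, the original value has to be written back.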