Testing Concurrent Programs

Writing correct concurrent programs is harder than writing sequential ones. This is because the set of potential risks and failure modes is larger - anything that can go wrong in a sequential program can also go wrong in a concurrent one, and with concurrency comes additional hazards not present in sequential programs such as race conditions, data races, deadlocks, missed signals, and livelock.

Brian Goetz

Published: 01 Sep 2006

Writing correct concurrent programs is harder than writing sequential ones. This is because the set of potential risks and failure modes is larger - anything that can go wrong in a sequential program can also go wrong in a concurrent one, and with concurrency comes additional hazards not present in sequential programs such as race conditions, data races, deadlocks, missed signals, and livelock.

Testing concurrent programs is also harder than testing sequential ones. This is trivially true: tests for concurrent programs are themselves concurrent programs. But it is also true for another reason: the failure modes of concurrent programs are less predictable and repeatable than for sequential programs. Failures in sequential programs are deterministic; if a sequential program fails with a given set of inputs and initial state, it will fail every time. Failures in concurrent programs, on the other hand, tend to be rare probabilistic events.

Because of this, reproducing failures in concurrent programs can be maddeningly difficult. Not only might the failure be rare, and therefore not manifest itself frequently, but it might not occur at all in certain platform configurations, so that bug that happens daily at your customer's site might never happen at all in your test lab. Further, attempts to debug or monitor the program can introduce timing or synchronization artifacts that prevents the bug from appearing at all. As in Heisenberg's uncertainty principle, observing the state of the system may in fact change it.

So, given all this depressing news, how are we supposed to ensure that concurrent programs work properly? The same way we manage complexity in any other engineering endeavor - attempt to isolate the complexity.

Structuring programs to limit concurrent interactions

It is possible to write functioning programs entirely with public, static variables. Mind you, it's not a good idea, but it can be done - it's just harder, and more fragile. The value of encapsulation is that it makes it possible to analyze the behavior of a portion of a program without having to review the code for the entire program.

Similarly, by encapsulating concurrent interactions in a few places, such as workflow managers, resource pools, work queues, and other concurrent objects, it becomes simpler to analyze and test concurrent programs. Once the concurrent interactions are encapsulated, you can focus the majority of your testing efforts primarily on the concurrency mechanisms themselves.

Concurrency mechanisms, such as shared work queues, often act as conduits for moving objects from one thread to another. These mechanisms contain sufficient synchronization to protect the integrity of their internal data structures, but the objects being passed in and out belong to the application, not the work queue, and the application is responsible for the thread-safety of these objects. You can make these domain objects thread-safe (making them immutable is often the easiest and most reliable way to do so), but there is often another option: make them effectively immutability.

Effectively immutable objects are those which are not necessarily immutable by design - they may have mutable state - but which the program treats as if they were immutable after they are published where they might be accessed by other threads. In other words, once you put a mutable object into a shared data structure, where other threads might then have access to it, make sure that it is not modified again by any thread. The judicious use of Immutability and effective immutability limit the range of potentially incorrect concurrent actions by limiting mutability to a few core classes that can be strenuously unit-tested.

Listing 1 shows an example of how effective immutability can greatly simplify testing. The client code submits a request to a work manager, in this case an Executor, to factor a large number. The calculation is represented as a Callable<BigInteger[]>, and the Executor returns a Future<BigInteger[]> representing the calculation. The client code then waits on the Future for the result.

The FactorTask class is immutable, and therefore thread-safe, so no additional testing is required to prevent unwanted concurrent interactions. But FactorTask returns an array, and arrays are mutable. Shared mutable state needs to be guarded with synchronization, but because the application code is structured so that once the array of BigIntegers is returned by the FactorTask its contents are never modified, the client and task code can "piggyback" on the synchronization implicit in the Executor framework and do not need to provide additional synchronization when accessing the array of factors. If it were possible that any thread might modify the contents of the array of factors after it was created, this technique would not work.

    ExecutorService exec = ...
 class FactorTask implements Callable<BigInteger[]> {
        private final BigInteger number;

        public FactorTask(BigInteger number) {
            this.number = number;
        }

        public BigInteger[] call() throws Exception {
            return factorNumber(number);
        }
    }    

    Future<BigInteger[]> future = exec.submit(new FactorTask(number));
    // do some stuff
    BigInteger[] factors = future.get();

This technique can be combined with nearly all the concurrency mechanisms in the class library, including Executor, BlockingQueue, and ConcurrentMap - by only passing effectively immutable objects to these facilities (and returning effectively immutable objects from callbacks), you can avoid much of the complexity of creating and testing thread-safe classes.

Testing concurrent building blocks

Once you've isolated concurrent interactions to a handful of components, you can focus your testing efforts on those components. Since testing concurrent code is difficult, you should expect to spend more time designing and executing concurrent tests than you do for sequential ones.

The factors below are some of the concepts to consider when designing and running tests for concurrent classes. They, as well as others, are covered in much greater detail in the Testing chapter of Java Concurrency in Practice.

Tests are probabilistic. Because the failures you are searching for are probabilistic ones, your test cases are (at best) probabilistic as well. To give yourself the maximum chance of running across the right conditions to trigger a failure, you should plan to run concurrent tests for much longer.
Explore more of the state space. Running tests for a longer time is not going to find the problem if you are simply retrying the same inputs and the same initial state over and over again. You want to explore more of the state space, which with concurrent programs, includes temporal considerations as well. For example, if testing insertion and removal in a queue, you'll want to explore all the relative timings and orderings with which the two operations might be initiated.
Explore more interleavings. The scheduler may preempt a thread at any time, but most of the time short synchronized blocks will run to completion without preemption. This limits the likelihood that race conditions will be disclosed, as other (potentially undersynchronized) code is less likely to run while another thread is in the middle of a synchronized block. Tools like ConTest can randomly introduce yield() calls into synchronized blocks to explore more possible interleavings.
Match thread count to the platform. If you run as many threads as you have processors, threads will never be preempted by the scheduler, reducing the number of potential interactions between active and waiting threads. Similarly, if you run many more threads than you have processors, you reduce the number of potential interactions between active threads. Tailoring thread count so that the number of runnable threads at any time is a small multiple of the processor count will often result in a more interesting variety of interleavings.
Avoid introducing timing or synchronization artifacts. Tests for concurrent data structures often involve having some threads inserting elements while other threads remove them, and asserting things like "everything that went in came out", "nothing that didn't go in came out", and "everything came out in the right order." The obvious way to code such tests involves maintaining data structures shared across the test threads, which will themselves require synchronization. But if the test program does its own synchronization, it may perturb the timing or scheduling with which the component being tested runs, masking potential negative interactions.

All this sounds like a lot of work, and it is. But by limiting the scope of concurrent interactions to a few widely-used, well-tested components, you greatly reduce the amount of effort required to test an application. And, by reusing existing tested library components, such as the classes in the java.util.concurrent package, you further reduce your testing burden.

Testing Concurrent Programs

Structuring programs to limit concurrent interactions

Testing concurrent building blocks

Dig Deeper on Development tools for continuous software delivery

functional programming

Rust vs. C++: Differences and use cases

Concurrent programming in Go with channels and goroutines

Go programming language