How Python multiprocessing can boost performance
Python is a highly concise and expressive language that enables developers to accomplish complex tasks with very clear, minimal code. Its rich standard library, dynamic typing and intuitive syntax make it a favorite for rapid development and prototyping.
Python’s main drawback shows in CPU-bound workloads, where the Global Interpreter Lock (GIL) prevents true parallel execution of threads. However, this limitation can be overcome using the older, well-established technique of multiprocessing.
What about multithreading and async?
Multithreading, contrary to popular belief, does exist in Python, as does the ability to run async code. Both are useful for classes of problems that are IO-bound, such as Web services that download or send files, where async code can hand off the IO and continue.
However, because the GIL allows only one thread to execute Python bytecode at a time, CPU-bound tasks see no benefit from these approaches. Paradoxically, they can actually become slower under a multithreaded or async approach in Python, because of the added overhead of switching between threads.
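To make the IO-bound case concrete, here is a minimal sketch using a thread pool to overlap downloads; the URLs are placeholders and the worker function is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = ["https://example.com", "https://example.org"]  # placeholder URLs

def fetch(url):
    # The GIL is released while the thread blocks on the network,
    # so other threads can make progress in the meantime.
    with urlopen(url, timeout=10) as resp:
        return url, len(resp.read())

with ThreadPoolExecutor(max_workers=4) as pool:
    for url, size in pool.map(fetch, URLS):
        print(f"{url}: {size} bytes")
```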
How Python multiprocessing works
Multiprocessing in Python runs work in separate interpreter processes, so each process gets its own Python interpreter and thus its own GIL. That means CPU-bound tasks can truly run in parallel across cores, unlike multithreading where the GIL serializes Python bytecode and threads mainly help with I/O.
There are some important downsides to this approach:
- Processes have higher startup and memory overhead
- Inter-process communication (IPC) is much slower than in-process sharing
- Objects must be pickled (serialized) to cross process boundaries
Processes carry much higher overhead than threads, but they are the tool Python gives you for truly parallel CPU-bound work.
Spawning a process is as simple as creating a multiprocessing.Process(target=fn, args=...) and calling start(). On Windows and macOS, the default "spawn" start method launches a fresh interpreter and imports your module, so you must protect entry points with if __name__ == "__main__": and, on Windows, sometimes call freeze_support().
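A minimal sketch of that pattern, with an illustrative CPU-bound worker:

```python
import multiprocessing

def work(n):
    # CPU-bound function; the child runs it in its own interpreter
    # with its own GIL.
    print(sum(i * i for i in range(n)))

if __name__ == "__main__":  # required under the "spawn" start method
    p = multiprocessing.Process(target=work, args=(10_000_000,))
    p.start()
    p.join()  # wait for the child to finish
```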
Stopping a process cleanly is best done cooperatively. Have the child check an Event, a Queue or a work iterator that ends, then exit, and let the parent join() it. If you must bail out, terminate() sends a hard stop, although this may skip finally blocks and leave shared resources in a messy state. Better is graceful signaling and a time-bounded join(timeout) before resorting to termination. Calling join() in the parent is roughly analogous to await in the async world with which most programmers are familiar.
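A sketch of that cooperative shutdown pattern, assuming a worker that polls an Event between units of work:

```python
import multiprocessing
import time

def worker(stop_event):
    # Check the Event cooperatively and exit cleanly once it is set.
    while not stop_event.is_set():
        time.sleep(0.1)  # stand-in for a unit of real work

if __name__ == "__main__":
    stop = multiprocessing.Event()
    p = multiprocessing.Process(target=worker, args=(stop,))
    p.start()
    time.sleep(1)
    stop.set()          # graceful signal
    p.join(timeout=5)   # time-bounded wait
    if p.is_alive():
        p.terminate()   # hard stop, last resort: may skip finally blocks
        p.join()
```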
Process pools (multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor) are a higher-level abstraction that manages a fixed set of worker processes and queues tasks to them, saving the cost of repeated spawning. They expose convenient API functions such as map, starmap, submit and as_completed, and let you control max_workers, chunksize (to reduce scheduling overhead for many tiny tasks), initializer/initargs (to set up heavy state once) and maxtasksperchild (to mitigate leaks in long runs).
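A minimal sketch using ProcessPoolExecutor; the prime-counting workload and parameter values are illustrative:

```python
from concurrent.futures import ProcessPoolExecutor
import math

def is_prime(n):
    # CPU-bound trial division; runs inside a worker process.
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

if __name__ == "__main__":
    numbers = range(2, 50_000)
    # chunksize batches many small tasks into one IPC round trip.
    with ProcessPoolExecutor(max_workers=4) as pool:
        flags = pool.map(is_prime, numbers, chunksize=500)
        primes = [n for n, ok in zip(numbers, flags) if ok]
    print(len(primes), "primes found")
```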
Keep payloads small and pickle-friendly, avoid global mutable state and batch work to limit IPC overhead.
Inter-process communication
In Python’s multiprocessing model, each process runs in its own independent memory space. This means variables, objects and state are not shared directly between processes as they are in multithreading. This isolation enables processes to bypass the GIL and run truly in parallel.
That isolation also means those processes cannot naturally read or modify each other’s data. Inter-process communication (IPC) becomes necessary to coordinate work, share results and exchange information between these separate memory spaces.
Python provides several IPC mechanisms, such as Queue, Pipe, Value, Array and shared memory objects. While this extra step introduces some overhead, it’s the essential bridge that enables collaboration in a multiprocessing setup.
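For example, a Value holds a single number in shared memory that workers can update under its built-in lock; this counter sketch is illustrative:

```python
import multiprocessing

def increment(counter, times):
    for _ in range(times):
        # get_lock() guards the shared ctypes value against races.
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = multiprocessing.Value("i", 0)  # shared signed int
    procs = [multiprocessing.Process(target=increment, args=(counter, 1000))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 4000
```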
Pickling
Pickling is the process of serializing an object, or converting it into a byte stream that can be stored or transmitted, and later reconstructing it (unpickling) back into a live object. In multiprocessing, this is critical because data sent between processes must pass through the operating system’s IPC mechanisms, which can only handle raw bytes.
Python uses pickling to transform objects into this transferable form before sending them to another process, and then unpickles them on the receiving end. This is transparent to the developer, who simply provides the objects to send.
As a result, anything sent through a Queue, Pipe or other IPC channel must be “pickleable,” meaning it can be fully represented in serialized form. Certain objects, such as open file handles, sockets or lambda functions, cannot be pickled directly, and attempting to send them between processes will trigger errors. This requirement shapes how your multiprocessing code is structured, requiring you to design data in a form that can be cleanly serialized.
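A quick illustration of that boundary: plain data round-trips through pickle, while a lambda does not:

```python
import pickle

payload = {"task": "resize", "size": (800, 600)}  # plain data pickles fine
blob = pickle.dumps(payload)                      # object -> byte stream
print(pickle.loads(blob) == payload)              # True

try:
    pickle.dumps(lambda x: x)                     # lambdas are not pickleable
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print("cannot pickle:", exc)
```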
Queues and Pipes
Queues and pipes are the primary IPC tools in the multiprocessing module, designed to safely pass data between processes.
A Queue is a thread- and process-safe FIFO structure built on top of a pipe and locks, which enables multiple producers and consumers to send and receive messages without worrying about race conditions. It is simple to use for task distribution, result collection and other scenarios where order matters.
A Pipe, on the other hand, is a lower-level, two-way communication channel between exactly two endpoints. It is simple, fast and lightweight, but requires the programmer to manage concurrency manually if it is accessed from multiple processes. Both tools serialize (pickle) data transparently before sending, so they can transmit complex Python objects. However, they differ in abstraction level: Queues are higher-level and easier to use, while Pipes are more bare-bones and efficient for direct two-party communication.
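A minimal producer/consumer sketch with a Queue, using a None sentinel (an illustrative convention) to tell the worker to exit:

```python
import multiprocessing

def square(inbox, outbox):
    # Consume tasks until the None sentinel arrives.
    for n in iter(inbox.get, None):
        outbox.put(n * n)

if __name__ == "__main__":
    inbox, outbox = multiprocessing.Queue(), multiprocessing.Queue()
    p = multiprocessing.Process(target=square, args=(inbox, outbox))
    p.start()
    for n in range(5):
        inbox.put(n)   # each item is pickled on the way through
    inbox.put(None)    # sentinel: tell the worker to exit
    results = [outbox.get() for _ in range(5)]
    p.join()
    print(results)     # [0, 1, 4, 9, 16]
```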
Shared objects and managers
Managers provide a way for processes to share and manipulate Python objects without manually handling low-level synchronization or serialization.
A multiprocessing.Manager() starts a special server process that holds the actual object and gives each worker process a proxy to it. These proxies automatically handle all IPC, pickling and synchronization, so from the developer’s perspective the object behaves much like a normal Python object even though it exists in a separate process.
Managers are especially useful for sharing state across many processes without designing custom message-passing logic. However, because every access goes through a proxy and involves IPC, managers can be slower than local data or lower-level shared memory (Value/Array) for high-frequency updates. They are most useful where convenience, flexibility and simplicity outweigh raw speed, such as maintaining a shared cache, registry or coordination structure across multiple worker processes.
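A minimal sketch of such a shared cache held by a manager; the dict proxy and workload are illustrative:

```python
import multiprocessing

def record(shared, key, value):
    # The dict proxy forwards this mutation to the manager process.
    shared[key] = value

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        cache = manager.dict()  # proxy to a dict living in the manager process
        procs = [multiprocessing.Process(target=record, args=(cache, i, i * i))
                 for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(dict(cache))  # {0: 0, 1: 1, 2: 4, 3: 9}
```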
Conclusion
In general, many consider CPU-bound tasks the bane of Python, arguing that the language cannot handle them efficiently. However, if you think a little outside the box and design your data to be cleanly serializable, this need not be the case. Multiprocessing, despite being more heavyweight than threads, can handle such workloads efficiently.
David “Walker” Aldridge is a programmer with 40 years of experience in multiple languages and remote programming. He is also an experienced systems admin and infosec blue team member with interest in retrocomputing.