Moore’s Law and Python’s flawed logic
When language architects designed Python, they couldn’t conceive of a world where computers had more than one core.
In the 1980s and 90s, software engineers bet heavily on Moore’s Law, which observes that the number of transistors on an integrated circuit doubles roughly every two years. The assumed corollary was that processor speed would double too.
Taken to its logical conclusion, this rice-on-a-chessboard scenario would deliver effectively infinite CPU speeds in a relatively short period of time.
Guided by this logic, Python inventor Guido van Rossum architected all of Python’s multithreading capabilities around the fatally flawed assumption that computers of the future would have only a single, cheap, infinitely fast CPU.
Python’s Global Interpreter Lock (GIL) was built assuming all operations would occur on this single core. Computers, phones or micro-devices with more than one CPU simply weren’t a consideration.
Of course, increasingly fast CPUs were not the direction computer hardware took. Instead of making ever-faster CPUs, chip designers made them smaller and embedded multiple cores on a chip.
“Suddenly there was all this pressure about ‘do things in parallel’, and that’s where the solution we had in Python didn’t work,” said Python creator Guido van Rossum on the Lex Fridman podcast. “That’s the moment the GIL became infamous.”
That’s why, more than 30 years after Python was invented, a multithreaded Python app running on an enterprise server with 256 cores will leave 255 of those cores sitting idle.
Python can’t thread across cores.
Python apps can do multithreading. It’s just that those threads can’t run across cores. It all happens on a single, solitary CPU, no matter how many CPUs exist in the system.
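A rough way to see this on your own machine is to time the same CPU-bound work run back to back and then spread across threads. The sketch below is illustrative only: the function names (busy_sum, run_sequential, run_threaded) and the workload size are assumptions made for this example, and timings will vary by machine and interpreter build. On a standard CPython build with the GIL, the threaded run is typically no faster than the sequential one.

```python
import threading
import time


def busy_sum(n):
    """CPU-bound work: add up the integers below n in pure Python."""
    total = 0
    for i in range(n):
        total += i
    return total


def run_sequential(n, jobs):
    """Run the same job `jobs` times, one after another."""
    start = time.perf_counter()
    results = [busy_sum(n) for _ in range(jobs)]
    return results, time.perf_counter() - start


def run_threaded(n, jobs):
    """Run the same job on `jobs` threads and wait for all of them."""
    results = [None] * jobs

    def worker(slot):
        # Each thread writes only to its own slot in the list.
        results[slot] = busy_sum(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    N, JOBS = 1_000_000, 4  # hypothetical workload size
    _, sequential_seconds = run_sequential(N, JOBS)
    _, threaded_seconds = run_threaded(N, JOBS)
    print(f"sequential: {sequential_seconds:.2f}s")
    print(f"threaded:   {threaded_seconds:.2f}s")
```

Both runs compute identical results. Only the elapsed time is of interest, and it is printed rather than asserted because it depends on the hardware.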
Concurrency in Python
Python takes a unique approach to concurrency and parallelism.
Multithreaded Python programs don’t perform true parallel computing. Instead, they just create the illusion of parallelism.
To achieve what looks like parallelism, Python lets one thread run for a short time slice (5 milliseconds by default in CPython). It then interrupts that thread to allow another thread to run.
Multiple Python threads divvy up all the available scheduling slots on a single CPU. If the processor is fast enough, this creates the sensation that multiple threads are running in parallel, even though everything is serial.
Speaking on the logic that went into Python’s multithreading libraries when they were created, van Rossum says the idea was that “We’ll provide something that looks like threads, and as long as you only have a single CPU on your computer, which most computers at the time did, it feels just like threads.”
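The time slicing described in this section can be inspected from inside a program. CPython exposes the interval after which a running thread is asked to give way through sys.getswitchinterval() and sys.setswitchinterval(). The sketch below reads that value and changes it temporarily. The helper name with_switch_interval is hypothetical, introduced only for this example, and changing the interval is rarely necessary in practice.

```python
import sys


def with_switch_interval(seconds, func):
    """Call func() with a temporary thread switch interval, then restore it."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func()
    finally:
        # Always put the interpreter-wide setting back.
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    print("current switch interval (seconds):", sys.getswitchinterval())
    # Read the value back while a shorter, 1 ms interval is in effect.
    print("temporary interval (seconds):",
          with_switch_interval(0.001, sys.getswitchinterval))
```

A shorter interval makes threads take turns more often, at the cost of more switching overhead. It does not make them run at the same time.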
Python parallelism is an illusion
Python supports multithreading, but those threads all run serially on the same CPU. Nothing happens in parallel. Every multithreaded Python application is effectively bound to a single CPU.
The inability to thread across cores is a fundamental flaw in the architecture of the Python platform, one that was permanently baked in when the GIL was invented.
Developers have been working for thirty years trying to fix the GIL to no avail.
Even an embarrassing backwards compatibility break between Python 2 and Python 3 wasn’t enough to address the issue. It’s a fundamental flaw that Python may never be able to fix.
Architects for more modern programming languages like Java and C# understood from the days of their inception that the future of computer hardware was multi-core machines.
From day one, long before its official release in 1996, the Java language architects provided APIs that allowed programs to thread across any number of cores. What was inconceivable to the inventors of Python was one of the core concepts around which the Java Virtual Machine was designed.
Every modern programming language supports threading across cores. Python likely never will.
How do you fix Python’s GIL?
So what can Python developers do to address the flawed GIL and Python’s multithreading mistake? Not much.
Thousands of hours and millions of dollars have been invested in the creation of various incompatible Python-like libraries that try to address the mistakes of Python’s past.
These expensive endeavors have led to a great deal of fragmentation in the world of Python, where many Python applications deployed to production or used to run language models wouldn’t actually run on a standard installation of Python.
The fragmentation has created a real existential crisis in the Python community. After all, if the Python code you write is incompatible with the Python code other people write, and everyone’s code fails to run on the standard Python platform, can you really call any of that incompatible code ‘Python’?
Several projects exist that replace the GIL with a multithreaded core, but those projects actually make single-threaded Python apps run slower. That is a real problem given that Python applications already run 500%-1000% slower than a similarly coded Java application.
PEP 703, a recently introduced proposal, would make the GIL optional in future Python releases, but it is unclear whether that can be implemented without severely damaging Python’s interoperability with its most popular third-party libraries.
It’s unlikely libraries written for the standard GIL would work when integrated into a multi-threaded system. Removing the GIL from CPython would likely cause further fragmentation in an already heavily fragmented landscape.
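Whether a given interpreter is affected can be checked at runtime. CPython 3.13 and later offer an optional free-threaded build based on PEP 703, and they expose sys._is_gil_enabled() to report whether the GIL is currently active. The sketch below is a hedged example: the function name gil_status is hypothetical, and on older interpreters that lack the check it simply assumes the GIL is on.

```python
import sys
import sysconfig


def gil_status():
    """Report whether this interpreter was built without the GIL and whether it is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; it is 0 or missing otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() only exists on CPython 3.13+; assume the GIL is on if absent.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(check()) if check is not None else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


if __name__ == "__main__":
    status = gil_status()
    print("free-threaded build:", status["free_threaded_build"])
    print("GIL currently enabled:", status["gil_enabled"])
```

Even on a free-threaded build the GIL can be switched back on at runtime, for example when an extension module that does not declare support for running without it is imported. That is one concrete form of the compatibility concern raised above.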
Python and Project Mojo
One of the newest attempts to end the industry fragmentation caused by a confusing assortment of new, incompatible flavors of Python is yet another flavor of Python, named Mojo.
Project Mojo, spearheaded by some of the greatest minds in the Python community, is a superset of Python.
Can Mojo fix Python?
This promising new project hopes to:
- Add multi-core multithreading to Python
- Add true strong typing capabilities to Python
- Greatly improve the performance of Python
Fixing Python is a daunting task, and nobody knows if the project will actually be successful. But if everything goes according to plan, by 2026, through Project Mojo, Python developers will have all of the features, functionality and runtime performance that Java devs had when the JDK was released in 1996.
Python vs Java
If you’re a Python dev who can’t wait until 2026 for strong typing and multi-core threading support, there are other options.
One of Python’s strengths is its ability to invoke code written in other programming languages.
Most of the big data science libraries used from Python are just lightweight wrappers that call code written in other languages, such as C, C++ or Fortran, under the covers. It’s those other languages that do the real work.
If you are a software engineer and you do need true multithreading capabilities in your Python programs, you could write your threaded apps in Java, and then just have the Python code invoke the Java code when it’s time to do some heavy lifting.
Or you could just write your programs in Java to begin with. That might save you a lot of headaches.