How to address Python performance problems

Python is a great language for nonprogrammers who do mathematical and scientific work, even though its ease of use comes at some cost to performance. Here's why the usual performance criticism isn't fair.

Python often gets a bad rap in terms of performance. Critics tend to move the goal posts mid-discussion, either unintentionally or simply to get a rise out of Python developers. Here's a typical exchange:

Critic: Python's not as fast as Java.
Fan: Yes, it is. Look at this benchmark.
Critic: That doesn't count because it's single-threaded.
Fan: OK, look at this multithreaded benchmark.
Critic: There aren't enough CPU cores in that benchmark.
Fan: Fine. Look at this benchmark.
Critic: That doesn't count because it uses a C library.
Fan: So what?

This back-and-forth ignores or downplays real issues that determine how to solve problems and get the job done.

The fact is, Python is a great choice for noncomputer scientists who need to program for tasks in many mathematical and scientific areas. A wide variety of well-tested Python libraries are designed to achieve performance that rivals other programming languages.

Python vs. Java performance: Apples to oranges

Python is a dynamically typed language. It's often compared to Java, which is a statically typed language.

Dynamic typing means Python checks types repeatedly as the program executes. Java does most of its type checking at compile time. If a program involves any significant amount of type checking, it's unreasonable to compare Python's speed to that of a statically typed language. By design, the two do different things at runtime, even when they implement the same algorithm.

Diagram showing how Java and Python compile, type check and execute code.
Figure 1: Dynamically typed languages such as Python perform type checking at runtime, which can slow performance compared to statically typed languages such as Java.
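
As a rough illustration of the runtime checking described above, here is a minimal sketch. The function name add is hypothetical and chosen only for this example. The same function works for any operands that support "+", and a mismatch is reported only when the offending line actually runs.

```python
def add(a, b):
    # No types are declared. Python checks whether "+" is valid
    # for the actual operands each time this line executes.
    return a + b


print(add(2, 3))          # works for integers
print(add("py", "thon"))  # works for strings

try:
    add(2, "3")           # the mismatch is detected only at runtime
except TypeError as error:
    print(f"TypeError: {error}")
```

A Java compiler would reject the equivalent of the last call before the program ever ran. Python accepts the source and pays for the check on every call instead.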

At the same time, a Python programmer's claim that there is no performance penalty is nonsense. That's only true in cases where type checking has been minimized, which is not a typical real-world scenario.

A Python programmer should focus on the advantages of the dynamically typed language and why they feel the runtime performance cost is worth it. Some examples include the following:

  • Faster development.
  • Less verbose code.
  • Easier to learn.
  • Easier to consume data.

The importance -- and truth -- of all of these is open to debate. Proponents of statically typed languages can present their own list of arguments why their language is better. Python developers should focus on these differences, rather than execution speed, for a more apples-to-apples comparison.

Noncomputer scientists tend to find dynamically typed languages easier to learn and work with than statically typed languages. As long as performance is adequate, a dynamically typed language is a sensible tool to help with a job. For almost anyone who isn't a programmer, Python wins out over statically typed languages such as Java, C, C++ or C#.

If there is doubt about the power of dynamically typed languages, ask ChatGPT two questions:

  • How do I read a csv file in Java?
  • How do I read a csv file in Python?

Compare the explanations, and it's clear: Python is friendlier to nonprogrammers and beginners.
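
For reference, one plausible Python answer looks something like the sketch below. It uses only the standard library csv module. The sample data and the read_rows helper are made up for this example, and an in-memory string stands in for a file on disk so the code runs as is.

```python
import csv
import io

# Stand-in for a file on disk so the example is self-contained.
# With a real file (hypothetical name), use open("data.csv", newline="").
CSV_TEXT = "name,score\nAda,90\nLinus,85\n"


def read_rows(text):
    """Return each CSV row as a dict keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(CSV_TEXT)
for row in rows:
    print(row["name"], row["score"])
```

Note that the values come back as strings. Converting them to numbers is left to the caller.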

Multithreading and Python's global interpreter lock

The global interpreter lock (GIL) in Python enables the interpreter to easily and safely manage memory for Python objects.

However, the GIL causes performance problems for certain types of applications, because it only allows one thread to execute Python code at a time. Figure 2 illustrates the problem with the GIL in an application with two threads that run on a CPU with two cores.

Diagram showing how Python's global interpreter lock prevents use of multiple threads and multicore CPUs.
Figure 2: Python's global interpreter lock introduces performance problems when dealing with multiple threads and multicore CPUs.

The practical result of the GIL is that a multithreaded, CPU-bound Python program uses only about half the processing power of a CPU with two cores, and about a quarter of a CPU with four cores. Modern CPUs often have many more cores, which exacerbates the problem.
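
One way to observe this is to time the same CPU-bound work run sequentially and on two threads. The sketch below is illustrative only. The function names and loop size are arbitrary choices for this example, and actual timings depend on the machine and the interpreter build. On a CPython build with the GIL enabled, the threaded run is typically no faster than the sequential one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n):
    # Pure-Python, CPU-bound loop: the running thread holds the GIL.
    while n > 0:
        n -= 1
    return n


def run_sequential(n, tasks):
    """Run the loop `tasks` times in a row and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(tasks):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, tasks):
    """Run the loop on `tasks` threads at once and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=tasks) as pool:
        list(pool.map(count_down, [n] * tasks))
    return time.perf_counter() - start


if __name__ == "__main__":
    N, TASKS = 2_000_000, 2
    print(f"sequential: {run_sequential(N, TASKS):.2f}s")
    print(f"threaded:   {run_threaded(N, TASKS):.2f}s")
```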

Not all Python interpreters rely on a GIL, but the most popular Python interpreter, CPython, does. Many people argued to remove the GIL in Python 3.0 because there would be other breaking changes anyway. Here's what Python creator Guido van Rossum said back in 2007:

I'd welcome a set of patches into Py3k only if the performance for a single-threaded program (and for a multi-threaded but I/O-bound program) does not decrease.

Replacing the GIL with other mechanisms to handle reference counting, garbage collection and memory management tasks tends to slow down single-threaded CPU-bound programs. So Python still has the GIL, although there is an effort, PEP 703, to allow disabling it.

Overcoming the GIL to improve Python performance

There are a couple of ways Python programmers can work around the GIL problem.

One way is to run multiple processes instead of multiple threads. This allows multiple cores to execute Python code at the same time. However, there is usually overhead from redundant data loading and interprocess communication. Moreover, this can add to the complexity of the application, which undoes some of the advantage of a dynamically typed language.
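
A minimal sketch of the multiprocess approach, using the standard library concurrent.futures module, might look like the following. The prime-counting workload and both function names are invented for this example. Any CPU-bound function that can be pickled would do.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Count primes below limit with simple trial division (CPU-bound)."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def count_primes_in_parallel(limits, workers=2):
    # Each worker is a separate process with its own interpreter and GIL.
    # Arguments and results are pickled and sent between processes,
    # which is where the interprocess communication overhead comes from.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard matters on platforms that start workers by
    # re-importing this module (for example, Windows and macOS).
    print(count_primes_in_parallel([20_000, 20_000]))
```

The extra structure, such as the guard and the need for picklable top-level functions, is a small example of the added complexity noted above.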

Diagram showing how to use C libraries with Python for threads on multiple cores.
Figure 3: To avoid performance problems when using multiple cores, Python programs can use C libraries on threads for low-level numeric processing.

Another way Python programs can utilize multiple cores is to use C libraries on threads that are not limited by the GIL. This is where the criticism, "the real work is done in C," comes from.
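
To see the idea without installing anything, the sketch below uses hashlib from the standard library as a stand-in for a numeric C library. The hashlib module is implemented in C and, according to its documentation, releases the GIL while hashing inputs larger than 2,047 bytes, so threads can hash on separate cores. The helper names and block sizes are arbitrary choices for this example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(block):
    # The hashing runs in C with the GIL released for large inputs,
    # so several threads can do this work at the same time.
    return hashlib.sha256(block).hexdigest()


def digest_all(blocks, workers=4):
    """Hash every block on a pool of threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, blocks))


if __name__ == "__main__":
    blocks = [bytes([i]) * 10_000_000 for i in range(4)]
    for value in digest_all(blocks):
        print(value[:16])
```

The Python code only decides what to hash and collects the results. The heavy lifting happens outside the interpreter, which is the same division of labor a numeric library offers.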

Most application developers use libraries they didn't write, and don't know or care in what language the libraries are written. Python developers should be the same -- if a library helps do X better, then use it. The application-specific logic is still done in Python and easy for noncomputer scientists to understand.

Using a C library for low-level numeric processing is no different than using a Java library to implement low-level HTTP processing of an API server.

Something separate from the application code must perform the necessary, repeatable and difficult work, and the application code controls how that work is done. The Java HTTP library uses the JRE, which itself uses native methods written in C.

There are times when you must implement part of a Python application in C. It might not be ideal or easy, but it is a reasonable workaround for situations where the GIL is a bottleneck. Usually this can be a relatively simple implementation of a single algorithm, while the application complexity remains in Python code.

Python library options: NumPy and Pandas

There are many powerful Python C libraries that provide high performance for scientific applications that process large amounts of data in arrays or matrices. They work in C and therefore avoid the GIL thread limitation. NumPy and Pandas are two popular libraries.

NumPy creates and manages arrays, two-dimensional matrices and n-dimensional shapes. It can perform a vast number of operations on the arrays, including almost everything a scientific application needs, such as the following:

  • Basic arithmetic.
  • Logic functions.
  • Fast Fourier transforms.
  • Linear algebra.
  • Trigonometry.
  • Statistics.

Implementations are optimized, and multithreading can be used when appropriate. For large mathematical data sets, a Python application using NumPy can perform on par with equivalents written in other languages.
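
A few of the operations listed above can be sketched as follows. The sample values are made up for this example, and NumPy must be installed for it to run. In each case the loop over elements happens inside NumPy's compiled code rather than in Python.

```python
import numpy as np

samples = np.array([1.0, 2.0, 3.0, 4.0])

# Basic arithmetic is applied to every element without a Python loop.
scaled = samples * 2.0 + 1.0

# Statistics.
mean = samples.mean()

# Linear algebra: solve the system a @ x = b.
a = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(a, b)

# Fast Fourier transform of the samples.
spectrum = np.fft.fft(samples)

print(scaled, mean, x, spectrum)
```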

Pandas, built on NumPy, provides higher-level data manipulation functionality and a tabular view of data similar to a SQL table or Excel sheet. Its functionality imports data from various sources, including SQL, JSON, Excel and CSV files.

It also uses matplotlib to generate many types of visualizations of the data, such as bar, pie and line charts and histograms.

Together, Pandas and NumPy cover most of what Python developers need for large-scale data processing, with functionality and performance comparable to what they could achieve in any other language.

While NumPy is very powerful and efficient, there are enhancements for specific classes of applications. In many cases, the modules that provide additional functionality are built on top of NumPy or use a NumPy-like interface to minimize the learning curve. Additional functionalities include the following:

  • Using GPUs.
  • Deep learning.
  • Dask arrays for parallelization.
  • Additional data types.

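To get this performance, however, Python developers must correctly and precisely specify the data structure. One of the goals of dynamic typing in Python is that functions should work regardless of the type of data provided, as long as that data supports the operations the function performs. The flip side is that code can run correctly but slowly when the types are not what the library expects.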

Here's an actual example: Pandas read a CSV file and generated a column of values. It successfully performed many floating-point operations on the column, but quietly copied the column many times to a new array of floating-point values. Telling Pandas the column type when the data was loaded, and that the column data could be modified in place, resulted in a 100x performance improvement.
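
The original code from that example is not shown here, so the following is only a sketch of the first half of the fix: declaring a column's type at load time with the dtype parameter of read_csv. The column names and data are invented, and Pandas must be installed for it to run.

```python
import io

import pandas as pd

# Invented sample data. An in-memory string stands in for a CSV file.
CSV_TEXT = "sensor,reading\na,1\nb,2\nc,3\n"

# Let Pandas infer the column type: whole numbers are read as integers.
inferred = pd.read_csv(io.StringIO(CSV_TEXT))

# Declare the column type up front so it is loaded as 64-bit floats
# and later floating-point math needs no conversion.
declared = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"reading": "float64"})

print(inferred["reading"].dtype)
print(declared["reading"].dtype)
```

Whether a declared type yields anything like the speedup described above depends on the data and the operations involved, so it is worth measuring both versions on real input.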

Python is a great choice for programmers, novice or pro, who need code to help with certain jobs, such as those in mathematical and scientific fields. Numerous well-tested libraries have been optimized to work around performance issues and deliver results that rival what other programming languages achieve.
