
9 tips to improve Python performance

Python performance gets a bad rap compared with languages such as Java. Use these tips to identify and fix problems in your Python code to tweak its performance.

Optimized apps and websites start with well-built code. The truth, however, is that you don't need to worry about performance in 90% of your code, and probably 100% of it in many scripts. It doesn't matter whether an ETL script takes one second or one minute if it runs only once or nightly.

It does matter, though, if a user is forced to wait for a sluggish app to complete a task or a webpage to show results. Even then, it's likely only a small portion of the codebase is to blame.

The biggest performance wins usually come from planning for performance before coding even begins, not after slow performance occurs. That said, there are many ways that app developers can address code performance issues.

The following nine tips specifically target Python performance, although several could apply to other languages as well:

  1. Select correct data types.
  2. Know standard functions, methods and libraries.
  3. Find performance-focused libraries.
  4. Understand the different comprehensions.
  5. Use generator functions, patterns and expressions.
  6. Consider how to process large data.
  7. Run profiles to identify problematic code.
  8. Consider CPython alternatives.
  9. Focus on meaningful improvements.

Select correct data types

Use the best data type for collections. It's easy to reach for a list wherever there is a collection. You can use a list almost anywhere instead of a set or tuple, and lists can do more than either.

However, some operations are faster with a set or tuple, and both types typically use less memory than lists. To select the best type to use for a collection, you must first understand the data with which you are working and the operations you want to perform.

Using timeit, we can see that testing membership with a set can be significantly faster than with a list:

> python testtimeit_data_type.py
Execution time(with list): 7.966896300087683 seconds
Execution time(with set): 4.913181399926543 seconds
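
The script behind those numbers isn't shown here, but a minimal sketch of this kind of comparison might look like the following (the collection size, lookup value and repetition count are arbitrary):

import timeit

values_list = list(range(10_000))
values_set = set(values_list)

# Time membership tests against the same data stored as a list and as a set
list_time = timeit.timeit(lambda: 9_999 in values_list, number=100_000)
set_time = timeit.timeit(lambda: 9_999 in values_set, number=100_000)

print(f"Execution time (with list): {list_time} seconds")
print(f"Execution time (with set): {set_time} seconds")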

Sometimes it's faster to create a temporary set or tuple from a list. For example, to find common values in two lists, it might be faster to create two sets and use the set intersection() method. It depends on the data length and the operations, so it's best to test with your expected data and operations.
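
As a sketch of that idea, assuming two lists of order IDs, the set-based version might look like this:

orders = [1001, 1002, 1003, 1004]
shipped = [1003, 1004, 1005]

# Temporary sets make each membership check constant time instead of a list scan
common = set(orders).intersection(shipped)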

Know standard functions, methods and libraries

Understand Python's standard functionality. Standard modules are optimized and almost always faster than code you write yourself. Many Python functions are rarely needed, and it's easy to forget they exist. You might remember set intersection(), but will you remember difference() or isdisjoint() if you need them?

Scan Python documentation occasionally, or when you have a performance problem, to know what functionality is available. When you start a new project, carefully read those sections that are relevant to your own work.

If you only have time to study one module, make it itertools -- and consider installing more-itertools (which is not in the standard library) if you think you can use its functionality. Some of the itertools functionality might not look useful at first, but don't be surprised if you later work on something and remember what you saw in itertools that could help. It's good to know what's available even if you don't see an immediate use.
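
For example, itertools can replace hand-written nested loops for common patterns; this sketch uses two of its functions on illustrative data:

from itertools import chain, islice

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Flatten the nested lists without writing nested loops
flattened = list(chain.from_iterable(matrix))

# Lazily take only the first five values without building the full list
first_five = list(islice(chain.from_iterable(matrix), 5))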

Find performance-focused libraries

If you're working on something big, there's a good chance someone already created a well-performing library that can help. You might need multiple libraries to provide functionality for different areas of your project, which could include some or all of the following:

  • Scientific computing.
  • Vision.
  • Machine learning.
  • Reporting.
  • Integrations.

Pay attention to release dates, documentation, support and community. Older, unmaintained libraries might no longer perform as well as newer alternatives, and you might need help from someone familiar with a given library to achieve the desired performance.

Multiple libraries often provide the same sets of functionality. To determine which one to select, create a quick test for each using data that's realistic for your needs. You might find one is much easier to use, or another provides better out-of-the-box performance.

Pandas

Pandas is a common data analysis library and a natural starting point for beginners. It's worth learning for two reasons: Other libraries use it or provide compatible interfaces, and you can find the most help and examples for it.

Polars

Alternatives to pandas, such as Polars, provide better performance for many operations. Polars has a different syntax and might present a learning curve, but it is worth a look when you start a new large data project or need to address performance problems in an existing one.
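
As a brief sketch of that different syntax, assuming a hypothetical sales.csv file with product_id and amount columns:

import polars as pl

df = pl.read_csv("./sales.csv")

# Polars uses expression-based syntax instead of pandas-style indexing
large_orders = df.filter(pl.col("amount") > 100).select(["product_id", "amount"])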

Dask

If you regularly work with very large data, you should also think about parallel computing. This might require changes to how you organize data and perform calculations. It also requires a lot of effort to program correctly for multiple cores or multiple machines, an area in which Python lags compared with other languages.

Dask -- and other libraries, such as Polars -- handle the complexity of getting the most out of your cores or machines. If you are familiar with NumPy or pandas, Dask presents a minimal learning curve for parallelizing across cores. Even if you don't use Dask, understanding the functionality it provides and how to use it can help you prepare to work with large data.
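
A minimal sketch of that pandas-like interface, assuming a set of CSV files too large to fit in memory (the file pattern and column names are illustrative):

import dask.dataframe as dd

# Lazily reads many files as partitions; nothing is loaded into memory yet
df = dd.read_csv("./sales-*.csv")

# Work is split across cores only when .compute() is called
totals = df.groupby("product_id")["amount"].sum().compute()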

Understand the different comprehensions

This is a common Python performance tip: List comprehensions are faster than for loops.

My test resulted in these timings, which are impressive.

>python timeit_comprehension.py
Execution time (with for): 5.180072800023481 seconds
Execution time (with list comprehension): 2.5427665999159217 seconds

But all I did was create a new list with values calculated from the original, which is what such tips typically show:

new_list = [val*2 for val in orig_list]

The relevant question is this: What will I do with the list? Any real-world performance gain from list comprehension probably comes from using the optional predicate (filter), from nesting comprehensions, or from using a generator expression instead of a list comprehension.

This example flattens a matrix with a nested comprehension, which typically outperforms nested loops:

flattened = [x for row in matrix for x in row]

This one uses a nested comprehension and a filter:

names = [
    employee.name
    for manager in managers
    for employee in employees
    if employee.manager_id == manager.id
]

References to list comprehension are most common, but set comprehension, dictionary comprehension, and generator expressions work the same way. Choose the correct one for the data type you want to create.

The following example is similar to the matrix example above, but it returns a set instead of a list, which is an easy way to get the unique values.

unique = {x for row in matrix for x in row}

The only difference between list comprehension and set comprehension is that set comprehension uses curly brackets {} instead of square brackets [].

Use generator functions, patterns and expressions

A generator is a good way to reduce memory while iterating over a large collection. If you work with large collections, you should know how to write generator functions and use the generator pattern for an iterable.
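
As a sketch, a generator function yields one item at a time instead of building the entire collection in memory (the file name and the process() function are placeholders):

def read_large_file(path):
    """Yield one stripped line at a time rather than loading the whole file into a list."""
    with open(path) as handle:
        for line in handle:
            yield line.strip()

# Only one line is held in memory at a time
for line in read_large_file("./big_log.txt"):
    process(line)  # process() stands in for your own logic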

Also, learn how to use generator expressions, which are like list comprehensions. The following code sums the square of all values in a matrix without creating a list of the squares.

total = sum(x**2 for row in matrix for x in row)

Consider how to process large data

Performance with large data and large files deserves a complete and separate discussion. That said, there are a few things to think about before you start a big data project. Be prepared to make the following decisions:

  • Pick a data library.
  • Process data in chunks.
  • Determine if you can ignore some data.
  • Specify the data type.
  • Use different file types.

Data libraries

For very large data, you will almost certainly use specialized libraries, such as NumPy for scientific calculations, pandas for data analysis, or Dask for parallelization and distributed computing. Search the internet and you'll likely find a library for any other large data needs, as well as alternatives for these.

Chunk data

If your data is too big to fit in RAM, you probably need to process it in chunks. This ability is built into Pandas, like so:

import pandas
from pprint import pprint

chunk_readers = pandas.read_csv("./data.csv", chunksize=2000)
for chunk in chunk_readers:
    for index, record in chunk.iterrows():
        pprint(record)

Other modules such as Dask and Polars have their own methods to chunk or partition data.

Ignore data

Data files often have much more data than you need. Pandas' read_csv has a usecols argument that lets you specify the columns you need and ignore the rest:

# only keep named columns
data_frame = pandas.read_csv("./sales.csv", usecols=["product_id", "zip_code"])

# only keep columns at indexes 1, 8 and 20
data_frame = pandas.read_csv("./population.csv", usecols=[1, 8, 20])

This can significantly reduce the memory required to process the data. A .csv file might be too large for RAM, but if you only load the columns you need you might avoid chunking.

Comprehensions are another way to remove columns and rows so that you work with only the data you need. For this to work, however, the entire file must be read or chunked, and both the original and the comprehension containers must exist at the same time. If there is a significant number of rows to ignore, it's better to remove them as you iterate through chunks.
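
For example, filtering rows chunk by chunk keeps only the rows of interest in memory (the column name and condition are illustrative):

import pandas

matching_chunks = []
for chunk in pandas.read_csv("./sales.csv", chunksize=2000):
    # Keep only the rows of interest from each chunk
    matching_chunks.append(chunk[chunk["zip_code"] == "10001"])

sales_subset = pandas.concat(matching_chunks)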

Specify data types

Another way to save memory is to specify the type of data once and use the smallest type needed. This can also improve speed, as it keeps data in the format that is fastest for your calculations and eliminates the need to convert data each time it is used. For example, a person's age fits easily in eight bits, so we can tell pandas to use int8 instead of the default int64 for that column:

data_frame = pandas.read_csv("./people.csv", dtype={"age": "int8"})

It is usually best to set the data type during load, but sometimes that is not possible -- for example, pandas fails to convert a non-integer float, such as 18.5, to an int8. In that case, you can convert the entire column after the data frame is loaded. Pandas has numerous ways to replace or modify columns and handle errors in the data:

data_frame['age'] = data_frame['age'].astype('int8', errors='ignore')

DataFrame.astype and pandas.to_numeric can perform different types of conversions. If those won't work for your data, you might need to implement your own conversion in a loop or comprehension.
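
For example, pandas.to_numeric can coerce unparseable values and, where possible, downcast to a smaller integer type; a sketch assuming the age column loaded above:

import pandas

# Coerce unparseable values to NaN and, when possible, downcast to the
# smallest integer type that fits (columns containing NaN stay float)
data_frame["age"] = pandas.to_numeric(data_frame["age"], errors="coerce", downcast="integer")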

Use different file types

Most cases require you to work with common file types such as .csv. If you need to save intermediate files during your process, it can be helpful to use other formats.

Apache Arrow provides Python bindings through the PyArrow module. It integrates with NumPy, pandas and Python objects, and provides ways to read and write data sets in additional file formats. These formats are smaller, and PyArrow can read and write them faster than Python .csv functions. PyArrow also has additional functionality for data analysis, which you might not need now, but it's good to know it's available.
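
A minimal sketch of saving an intermediate pandas data frame as Parquet with PyArrow and reading it back (the file names are illustrative):

import pandas
import pyarrow as pa
import pyarrow.parquet as pq

data_frame = pandas.read_csv("./sales.csv")

# Convert to an Arrow table and write a compact, columnar Parquet file
table = pa.Table.from_pandas(data_frame)
pq.write_table(table, "./sales.parquet")

# Reading the Parquet file back is typically much faster than re-parsing the .csv
data_frame = pq.read_table("./sales.parquet").to_pandas()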

As stated earlier, pandas is a popular data analysis library. Polars supports multiple file formats, multiple cores and larger-than-memory data files. Dask partitioning handles chunking, as previously mentioned, for larger-than-memory data files, and Dask best practices recommend the more efficient Parquet file format.

Run profiles to identify problematic code

When you have performance problems, it's best to profile your code rather than guess at where to focus optimization efforts. A general approach to improve performance includes the following steps:

  1. Create a minimal, reproducible use case that is slower than desired.
  2. Run a profile.
  3. Improve the functions with the highest percall values.
  4. Repeat step 2 until you achieve the desired level of performance.
  5. Run a real use case (not minimal) without profiling.
  6. Repeat step 1 until you achieve the desired level of performance in step 5.

With experience you'll come to understand, or at least get a feel for, the best place to focus. It may not be the code with the highest percall value. Sometimes the fix is to modify a single calculation inside a loop. Other times you might need to eliminate loops or perform calculations/comprehensions outside of the loop. Maybe the answer is to reorganize the data, or use Dask for parallelization.
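
A minimal sketch of steps 2 and 3 using the standard library's cProfile and pstats modules (the entry-point name run_slow_use_case is a placeholder for your own code):

import cProfile
import pstats

# Profile the minimal, reproducible use case and save the raw statistics
cProfile.run("run_slow_use_case()", "profile.out")

# Show the functions that account for the most time, including percall values
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)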

Consider CPython alternatives

CPython is the most commonly used implementation of Python. All common libraries work with CPython, and it implements the language specification completely.

Other implementations exist, but can cause problems including the following:

  • Python code might act differently in edge cases.
  • C API modules might not work or work significantly slower.
  • Implementation-specific features won't work if you must run CPython in the future.

With those caveats, there are still cases where an alternate implementation might be best. Performance will almost never be better if calculations heavily depend on a C API module such as NumPy. Consider alternatives to CPython, such as the following:

  • Jython. Runs in a JVM and allows easy integration with Java classes and objects.
  • IronPython. Designed for .NET and enables easy integration with the .NET ecosystem and C# libraries.
  • PyPy. Uses a just-in-time (JIT) compiler and runs significantly faster than CPython unless the C API is involved.

These and other CPython alternatives are worth consideration if you have a very limited need of common modules, especially C API modules. Jython and IronPython are good choices if your application has significant dependencies on existing Java or .NET functionality.

Focus on meaningful improvements

There are other commonly suggested tips and tricks to improve Python performance, such as the following:

  • Avoid dot notation, such as math.sqrt() or myObj.foo().
  • Use efficient string manipulation, such as the join() method instead of concatenation in a loop.
  • Employ multiple assignments, for example a,b = 1,2.
  • Use the @functools.lru_cache decorator.

If you create timeit tests you'll see a huge improvement when ineffective practices are compared to ideal ones. (By the way, you should learn how to use timeit.) However, these tips only make a noticeable difference in your program if both of the following conditions apply:

  1. Coding was already done the "wrong way," such as within a large loop or comprehension.
  2. These operations take a significant amount of time with each iteration.

If each iteration of a loop takes a single second, you probably won't notice a tiny fraction of a second saved by performing two assignments in one statement.
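
To see the scale involved, a timeit comparison of the dot-notation tip might look like this sketch (sizes and repetition counts are arbitrary, and numbers vary by machine):

import math
import timeit

values = list(range(10_000))

def with_dots():
    return [math.sqrt(v) for v in values]

def with_local_name():
    sqrt = math.sqrt  # bind the attribute lookup to a local name once
    return [sqrt(v) for v in values]

print("dot notation: ", timeit.timeit(with_dots, number=1_000))
print("local binding:", timeit.timeit(with_local_name, number=1_000))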

Consider where your Python optimization efforts matter the most. You might get more value if you focus on readability and consistency rather than performance. If your project standards are to combine assignments or eliminate dots, then do it. Otherwise, stick with the standards until you run performance tests (timeit or profile) and find a problem.
