The dangers of Python import and how enterprises can be safe

The Python import statement carries a security risk that developers and enterprises need to watch out for. Here's how it works and why there's no easy fix.

Walker Aldridge, Lairds Computer Services

Published: 21 Jul 2023

There's a not-so-hidden danger when using Python that you need to be prepared to deal with.

All modern software development languages are modular, which means developers can break larger sections of code into smaller, more manageable pieces. This lets them reuse units of code, typically grouped into libraries. These libraries are often not written in-house, but are open source collections created to perform common tasks, such as graphing, database connectivity or array calculations.

For units of code to work with other code units' methods and properties, they must reference those components. Modern languages, including Java and Python, implement this requirement in the form of an import statement. This forms the backbone of all modern enterprise code.

However, in Python, there is a danger associated with using the import statement. Let's explore why this is, and come to grips with the problem that has no easy answer.

What is the problem?

In most modern languages, importing a file does not run anything by itself. To use the import, you must either create an instance of a class from the library or call a static method directly. Either way, code within the import executes only because your code asked it to.

Python is the exception to this rule. When you import a file with Python, the interpreter immediately executes every top-level statement in that file. Only the bodies of functions and methods wait until something calls them.

Many developers use this Python import feature to their advantage. For example, they write code into the top-level body of a Python file to test their code or validate any assumptions that the library holds for dependencies.

Python is fundamentally a scripting language, and the ability to write and run logic directly from the body of a file is a compelling feature. However, this same feature, by its nature, executes code when nothing has been explicitly called to execute -- and that's potentially a big problem.
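The behavior described above can be seen in a minimal sketch. The module name `demo_side_effect`, its `EVENTS` list and the `import_demo_module` helper are all hypothetical names invented for this illustration; the sketch writes a small module to a temporary directory and imports it without calling anything inside it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module source: one function plus one top-level statement.
MODULE_SOURCE = '''
EVENTS = []

def greet():
    EVENTS.append("greet called")
    return "Hello, world!"

# Top-level statement: runs during import, nobody has to call it.
EVENTS.append("top-level code ran at import")
'''


def import_demo_module():
    """Write the module to a temporary directory, import it and return it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_side_effect.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return importlib.import_module("demo_side_effect")
        finally:
            sys.path.remove(tmp)


module = import_demo_module()

# Nothing in the module has been called, yet the top-level append already ran.
print(module.EVENTS)  # ['top-level code ran at import']
```

The body of `greet` does not run until it is called, but the final `append` has already happened by the time the import returns. An attacker's statement placed at the same level would run in exactly the same way.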

What is the danger?

Python's ability to execute code at import presents three levels of risk to an enterprise: accidental risk, deliberate risk and external risk.

Accidental risk is when a developer modifies or adds something to the top-level body of an import, possibly to test something, and forgets to remove that specific code after it's done. This could use unnecessary resources, populate log files with nonsensical error messages, or run tests that contain infinite loops or similar issues. These issues would be hard to track down as the code itself is not explicitly called anywhere.

Deliberate risk is where a disgruntled developer modifies a top-level body of an import with malicious intent. This would enable them to, for example, run cryptomining routines, export confidential information outside of the network, install a backdoor Trojan into the network or even simply crash the code at an intentionally hard-to-find spot.

External risk is perhaps the most insidious risk. Here, none of the developers do anything wrong, accidental or otherwise. Instead, a malicious third party manages to compromise an external library that the enterprise uses. Malicious actors could then install backdoors into the company's network to compromise data, mine cryptocurrency or launch a ransomware attack.

Such poisoned libraries are increasingly used as an attack vector against companies. In languages such as Java or C#, a poisoned library still relies on the host program calling into it before its code can do anything at all.

In Python, all they need is an import action.

How does Python address this?

To address this problem, Python offers a convention rather than a requirement: any code that a developer wants to run only when the file is executed as an application, and not at import, goes inside a special guarded section.

This is implemented with an if statement that checks whether __name__ equals "__main__". Here is a simple example of how this works:

def some_method():
    return "Hello, world!"

print("This will always be printed")

if __name__ == "__main__":
    print("This will only be printed if this is run directly")

If the file is imported, it prints only, "This will always be printed." If it is executed directly, it prints, "This will always be printed," and then, "This will only be printed if this is run directly."

The problem with this approach is that it depends entirely on honesty: nothing forces the author of a file to put code inside the guard. In a normal development environment, it works well enough, with the only worry being accidents. It provides no defense whatsoever against malicious actors, whether external or internal.
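To make the limits of the guard concrete, here is a small sketch. It uses the standard library's `runpy.run_path` to execute the same source twice, once with `__name__` set to a module name (as an importer would see it) and once with `__name__` set to `"__main__"`. The file name `guard_demo.py`, the `log` list and the `run_as` helper are hypothetical names chosen for illustration.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical file contents: one unguarded statement and one guarded block.
SOURCE = '''
log = ["always runs"]

if __name__ == "__main__":
    log.append("guarded block ran")
'''


def run_as(run_name):
    """Execute SOURCE with __name__ set to run_name and return its log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp, "guard_demo.py")
        path.write_text(SOURCE)
        namespace = runpy.run_path(str(path), run_name=run_name)
    return namespace["log"]


as_import = run_as("guard_demo")  # __name__ as an importing program sees it
as_script = run_as("__main__")    # __name__ when the file is run directly

print(as_import)  # ['always runs']
print(as_script)  # ['always runs', 'guarded block ran']
```

The guard only skips what the author chose to place inside it. The unguarded statement runs in both cases, which is why the convention helps against accidents but not against someone who deliberately writes code outside the guard.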

Why is this not a problem in other languages?

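Other modern languages, such as Java or C#, by design, never automatically execute code on import. There, an import only makes names from the library available to the code that follows. For anything in the library to run, a developer must either initialize a class from the import or call a static method within the import.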

This raises the barrier significantly for malicious code to exploit the system.

Is the problem overstated?

The severity of the problem depends largely upon an enterprise's internal practices, such as unit tests and source control. Unfortunately, such standard practices offer little help against a malicious actor.

Consider a simple payload that opens a shell and connects it to a port that allows remote access. The attacker finds a second-level import and adds the payload to the bottom of the file. Or, better still, places it between methods.

That's all it takes to exploit the Python import statement problem.

The code would not be large, and it would take minimal resources. It would be simple to hide that in a large commit of code, and it would have no impact on unit tests. As far as everyone is concerned, everything would be normal -- until the shell is accessed and malicious code executes.

At the enterprise level, this problem is not overstated.

Can you mitigate Python import danger?

The fact that Python's import automatically executes top-level code is a real issue for enterprise use.

Organizations should proactively include static code analysis tools that flag the existence of top-level code. They also should employ AI-based routines that recognize when seldom-used libraries are surreptitiously edited.
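As a rough illustration of the first suggestion, the sketch below uses Python's standard `ast` module to list top-level statements that are something other than imports, function or class definitions, docstrings, constant assignments or the `__main__` guard. Which statements count as harmless is an assumption made for this example, and `find_top_level_code` and its helpers are hypothetical names, not part of any existing tool.

```python
import ast

# Statement types treated as plain declarations in this sketch (an assumption).
ALLOWED = (ast.Import, ast.ImportFrom, ast.FunctionDef,
           ast.AsyncFunctionDef, ast.ClassDef)


def is_main_guard(node):
    """True for `if __name__ == "__main__":` written in the usual form."""
    if not isinstance(node, ast.If):
        return False
    test = node.test
    return (isinstance(test, ast.Compare)
            and isinstance(test.left, ast.Name)
            and test.left.id == "__name__"
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and len(test.comparators) == 1
            and isinstance(test.comparators[0], ast.Constant)
            and test.comparators[0].value == "__main__")


def is_harmless(node):
    """True for docstrings, constant assignments and allowed declarations."""
    if isinstance(node, ALLOWED) or is_main_guard(node):
        return True
    if isinstance(node, ast.Expr) and isinstance(node.value, ast.Constant):
        return isinstance(node.value.value, str)  # docstring-style string
    return isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant)


def find_top_level_code(source):
    """Return (line number, statement type) for each statement to review."""
    return [(node.lineno, type(node).__name__)
            for node in ast.parse(source).body
            if not is_harmless(node)]


SAMPLE = '''import os

def some_method():
    return "Hello, world!"

print("This will always be printed")

if __name__ == "__main__":
    print("This will only be printed if this is run directly")
'''

findings = find_top_level_code(SAMPLE)
print(findings)  # [(6, 'Expr')]
```

The source is only parsed, never executed, so the check is safe to run on untrusted files. It is deliberately incomplete: class bodies, decorators and default-argument expressions also run at import time and are not inspected here, so a real tool would need to go further.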

These steps only mitigate the risk, however; they do not eliminate it. This Python import danger is an attack vector that every maintainer of a Python-centric codebase should bear in mind.
