Improve Python Performance with Cython

Introduction

According to the TIOBE index (https://www.tiobe.com/tiobe-index/), Python is now the most popular programming language. It is easy to learn and has extensive library support. Although Python is very popular, it does not excel in performance. We will discuss a couple of reasons why Python programs execute more slowly than programs written in languages such as C or C++. This blog post shows how you can improve Python performance with Cython. In the example below, Cython speeds up execution about 30 times, with only modest changes to the code.

Low Python performance

Potential performance problems

· Dynamic Typing: Python is a dynamically typed language, which means that the interpreter determines variable types at runtime. This flexibility comes at a cost, because the interpreter must check types and handle dynamic type conversions during execution.

· Interpretation vs. Compilation: Python is an interpreted language. CPython compiles the source code to bytecode, which the interpreter then executes one instruction at a time. This interpretation introduces overhead compared with languages like C and C++, whose programs are compiled ahead of time into machine code and then executed directly.

· Memory Management: Python manages memory automatically (CPython uses reference counting plus a garbage collector), which adds overhead to memory allocation and deallocation. In C and C++, developers have more direct control over memory management. They allocate and deallocate memory manually, which can result in more efficient memory usage.

· Global Interpreter Lock (GIL): In CPython, the default and most widely used implementation of Python, the Global Interpreter Lock (GIL) is a mechanism that allows only one thread to execute Python bytecode at a time. As a result, CPU-bound Python threads cannot run in parallel on multiple cores, whereas C and C++ threads can.

· Optimization Opportunities: Low-level languages like C and C++ provide more opportunities for manual optimization, such as loop unrolling, inline assembly, and fine-tuning memory access patterns.
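The standard dis module can make the first two points concrete. The sketch below shows the bytecode CPython generates for a tiny illustrative function, add (a hypothetical name, not part of the article's code). The same generic addition instruction handles ints, floats and strings, so the interpreter must inspect the operand types every time the instruction runs. The exact opcode names differ between Python versions: BINARY_ADD before 3.11, BINARY_OP from 3.11 on.

```python
import dis


def add(a, b):
    return a + b


# One function and one piece of bytecode, applied to three operand types.
# The type check happens at run time, on every call.
print(add(2, 3), add(2.5, 0.5), add("ab", "cd"))  # 5 3.0 abcd

# Print the bytecode instructions that the interpreter executes for add().
dis.dis(add)
```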

Ideal world

In an ideal world, we would like to use Python syntax with the execution speed of C or C++. Cython comes close: it offers Python-like code at close to C speed. This article explains what Cython is and how you can use it in existing Python programs. Cython also offers language interoperability, which lets you use C and C++ libraries in Python programs. This article focuses on performance improvement in Python with Cython.

What is Cython?

Cython is a superset of the Python programming language. It lets developers write Python code that is compiled into C extensions.

The Cython compiler translates Cython code into C code that wraps the compiled code for use from Python. A C compiler then compiles this C code into an extension module. An extension module is a shared library, and you import it into Python code with the import statement. Because the compiled code avoids much of the interpreter's overhead, this can significantly improve runtime performance.

Cython can also wrap independent C or C++ code into Python-importable modules, built as shared libraries. It has native support for most of the C++ language. On top of Python, the Cython language supports calling C functions and declaring C data types on variables and class attributes.

Cython is designed to make writing C extensions for Python easy, so developers can use it to speed up Python code execution. To use Cython, you have to install Python and a C compiler.

Two variants of Cython for Performance Improvement

Cython supports two syntax variants: Pure Python mode and Cython mode.

1. Pure Python: Pure Python syntax allows static Cython type declarations in ordinary Python code. Pure Python code is written and stored in a *.py file, which remains valid Python and can also run without compilation.

2. Cython: The Cython cdef syntax makes type declarations easily readable from a C/C++ perspective. Cython code is stored in a *.pyx file.

In Pure Python we declare an integer as:

i: cython.int

In Cython we declare an integer as:

cdef int i

Cython uses the standard C type names, such as int, long and double.

Additional data types, such as the fixed-width integer types from stdint.h, are declared in the Cython repository on GitHub:

https://github.com/cython/cython/blob/master/Cython/Includes/libc/stdint.pxd
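A cdef int is a fixed-width C integer, not Python's arbitrary-precision int. The sketch below uses the standard library's ctypes module, not Cython, to illustrate the difference. It uses the fixed-width type c_int32 so that the result does not depend on the platform. ctypes silently truncates out-of-range values. In Cython code, overflowing a signed C integer follows C rules (undefined behavior) and should be avoided.

```python
import ctypes

# Sizes in bytes of some C types.
print(ctypes.sizeof(ctypes.c_int))    # platform dependent, usually 4
print(ctypes.sizeof(ctypes.c_int64))  # 8, like int64_t from stdint.h

big = 2**31  # one more than the largest 32-bit signed integer

# A Python int simply grows as needed...
print(big + 1)  # 2147483649

# ...but a 32-bit C integer cannot hold this value; ctypes truncates it.
print(ctypes.c_int32(big).value)  # -2147483648
```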

Cython Setup

Python distributions such as Anaconda, Canopy and Sage pre-install Cython. The easiest way to install Cython is with pip.

$ pip install cython

To build the shared libraries, you must install a C/C++ compiler. For this article, we worked on a Windows system with Microsoft Visual Studio. Check that the C/C++ compiler is in the system PATH. The Cython build uses the installed C compiler to build the library files.

An example with prime numbers

https://github.com/enjoy-to-code/prime

First, we start with the implementation in plain Python. Here is the primes_python implementation (primes.py):

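def primes_python(nb_primes):
    """Return a list with the first nb_primes prime numbers."""
    p = []  # primes found so far
    n = 2   # current candidate
    while len(p) < nb_primes:
        # Is n divisible by any prime found so far?
        for i in p:
            if n % i == 0:
                break
        else:
            # No break occurred, so n is prime.
            p.append(n)
        n += 1
    return p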

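This function builds a list p containing the first nb_primes prime numbers. For each candidate n, it checks whether n is divisible by one of the primes found so far. The else clause of the for loop runs only when the loop ends without break, that is, when n is prime. In that case, the function appends n to p. It returns the list once it contains nb_primes primes.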
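The for...else construct is the least familiar part of this function. The minimal sketch below isolates the same test in a hypothetical helper, is_prime_given, which is not part of the article's code. The else clause runs only when the loop finishes without hitting break. As in primes_python, the test is correct only when the list holds every prime below n, or at least up to the square root of n.

```python
def is_prime_given(n, known_primes):
    """Trial division of n by known_primes, using the same for...else pattern."""
    for i in known_primes:
        if n % i == 0:
            break        # a divisor was found: the else clause is skipped
    else:
        return True      # the loop finished without break: no divisor found
    return False


known = [2, 3, 5, 7]  # enough to test every n below 121
print([n for n in range(8, 16) if is_prime_given(n, known)])  # [11, 13]
```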

Improve performance with Pure Python

The next step is to implement the function in Pure Python. The main change is adding data type information to the function's variables. Because a C array has a fixed size, we also limit the number of primes to 2500. This is how the function looks in Pure Python:

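import cython


def primes_pure_python(nb_primes: cython.int):
    i: cython.int
    p: cython.int[2500]  # fixed-size C array when compiled

    if nb_primes > 2500:
        nb_primes = 2500

    if not cython.compiled:  # only when running as regular Python
        p = [0] * 2500       # make p behave almost like a C array

    len_p: cython.int = 0  # current number of primes stored in p
    n: cython.int = 2

    while len_p < nb_primes:
        # Is n divisible by any prime found so far?
        for i in p[:len_p]:
            if n % i == 0:
                break
        else:
            # No break occurred, so n is prime.
            p[len_p] = n
            len_p += 1
        n += 1

    # Copy the result into a Python list.
    result_as_list = [prime for prime in p[:len_p]]
    return result_as_list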

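We import the cython module and use the Pure Python style to declare the local variables i, p, len_p and n. For this example, we use an array with a maximum size of 2500 integers, so the function returns at most 2500 primes. When the module runs uncompiled, cython.compiled is False. In that case, p is replaced by a Python list so that the same code still works in the interpreter.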

Improve Python performance with Cython

The next version of the function is the Cython implementation, which uses cdef declarations:

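def primes_cython(int nb_primes):
    cdef int n, i, len_p
    cdef int[2500] p  # fixed-size C array

    if nb_primes > 2500:
        nb_primes = 2500

    len_p = 0  # current number of primes stored in p

    n = 2
    while len_p < nb_primes:
        # Is n divisible by any prime found so far?
        for i in p[:len_p]:
            if n % i == 0:
                break
        else:
            # No break occurred, so n is prime.
            p[len_p] = n
            len_p += 1
        n += 1

    # Copy the result into a Python list.
    result_as_list = [prime for prime in p[:len_p]]
    return result_as_list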

Save the Cython file as primes_cython.pyx. The Cython compiler translates this file to C, and the C compiler then builds a shared library: a *.pyd file whose name starts with primes_cython. This .pyd file is the extension module that we import into the primes.py file.

We must create a setup.py file to compile the primes_cython.pyx file. Here is the setup.py file:

from Cython.Build import cythonize
from setuptools import setup

setup(
    ext_modules=cythonize(
        "primes_cython.pyx", annotate=True
    ),
)

Save this file as setup.py. The next step is to build the primes_cython library in place:

$ python setup.py build_ext --inplace

Because we pass the annotate=True option, the Cython compiler also creates an annotated HTML file (primes_cython.html). This file highlights the lines of Cython code that still interact with the Python interpreter: the more yellow a line, the more Python interaction. Click a line to see the C code generated for it. The compiler also writes the generated C file, primes_cython.c, to the same directory. Even for this small function, that file contains 3071 lines of code.

On Windows, the shared library has the *.pyd file extension (on Linux and macOS it is *.so). The primes.py file loads the library with the import statement:

import primes_cython

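You can also compile the Pure Python version with the Cython compiler. Save the Pure Python version as primes_pure_python.py. Then create a new setup.py file that passes primes_pure_python.py to cythonize() instead of primes_cython.pyx, or add the file to the existing cythonize() call. Build it in the same way to create the primes_pure_python shared library. When the .py file and the compiled library are in the same directory, Python imports the compiled library.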
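Before timing, it helps to confirm that import picked up the compiled library and not the plain .py file. The sketch below is one way to check this. The helper is_extension_module is a hypothetical name. It compares the module's file name with importlib.machinery.EXTENSION_SUFFIXES, the list of file suffixes for compiled extension modules on the current platform (including .pyd on Windows and .so elsewhere).

```python
import importlib.machinery
import types


def is_extension_module(module: types.ModuleType) -> bool:
    """Return True if the module was loaded from a compiled extension file."""
    path = getattr(module, "__file__", None)
    if path is None:  # built-in modules have no file
        return False
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


if __name__ == "__main__":
    import json
    print(is_extension_module(json))  # False: json is a pure-Python package
```

After building, `import primes_pure_python` followed by `print(is_extension_module(primes_pure_python))` should print True if the compiled version is in use.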

Performance measurements

Here is the main program in primes.py, which executes all three versions of the prime function:

from datetime import datetime

import primes_cython       # extension module built from primes_cython.pyx
import primes_pure_python  # Pure Python version, compiled with Cython

# primes_python() is defined earlier in this file.
MAX = 2_500

print("Python version")
start_time = datetime.now()
primes_python(MAX)
end_time = datetime.now()
print('Duration in Python: {}'.format(end_time - start_time))

print("Pure Python version")
start_time = datetime.now()
primes_pure_python.primes_pure_python(MAX)
end_time = datetime.now()
print('Duration in Pure Python: {}'.format(end_time - start_time))

print("Cython version")
start_time = datetime.now()
primes_cython.primes_cython(MAX)
end_time = datetime.now()
print('Duration in Cython: {}'.format(end_time - start_time))
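A single datetime.now() measurement of a few milliseconds is easily disturbed by other activity on the machine. The sketch below makes the measurement more robust with the standard timeit module: it repeats each call and keeps the fastest run. The helper name best_time is hypothetical. The plain Python primes_python function is repeated so that the block runs on its own.

```python
import timeit


def primes_python(nb_primes):
    """Return a list with the first nb_primes prime numbers (same as above)."""
    p = []
    n = 2
    while len(p) < nb_primes:
        for i in p:
            if n % i == 0:
                break
        else:
            p.append(n)
        n += 1
    return p


def best_time(func, arg, repeat=5, number=1):
    """Return the fastest time in seconds for a single call of func(arg)."""
    # Each of the `repeat` runs calls func(arg) `number` times.
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    MAX = 2_500
    print(f"Python: {best_time(primes_python, MAX):.6f} s")
    # After building the extension modules, time them the same way:
    # import primes_cython
    # print(f"Cython: {best_time(primes_cython.primes_cython, MAX):.6f} s")
```

The timeit documentation recommends taking the minimum rather than the mean. Higher values are usually caused by other processes interfering with the measurement, not by the code itself.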

We ran the program on an Intel i7 processor with 16 GB of memory. The program output is as follows:

Python version
Duration in Python: 0:00:00.139008
Pure Python version
Duration in Pure Python: 0:00:00.006001
Cython version
Duration in Cython: 0:00:00.004513

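In this run, the compiled Pure Python version is about 23 times faster than the interpreted Python version, and the Cython version is about 30 times faster. The exact factors vary from run to run, because each version runs only once and takes just a few milliseconds.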
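The speedup factors follow directly from the durations above. The short calculation below copies the numbers from this single run and makes the arithmetic explicit. It also shows that the Cython version is about 1.3 times faster than the compiled Pure Python version.

```python
# Durations from the run above, in seconds.
durations = {"Python": 0.139008, "Pure Python": 0.006001, "Cython": 0.004513}

baseline = durations["Python"]
for name, seconds in durations.items():
    # Speedup relative to the interpreted Python version.
    print(f"{name:<12} {seconds:.6f} s   x{baseline / seconds:.1f}")
# Python       0.139008 s   x1.0
# Pure Python  0.006001 s   x23.2
# Cython       0.004513 s   x30.8

# Cython compared with the compiled Pure Python version.
print(f"Cython vs Pure Python: x{durations['Pure Python'] / durations['Cython']:.1f}")  # x1.3
```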

Conclusion

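Implementing as many computation-heavy tasks as possible in the Cython layer is a good idea. As this example shows, converting an existing function to Cython mostly means adding type declarations and, where useful, replacing Python lists with C arrays, so the change is easy.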

The complete source code can be downloaded from my GitHub page:

https://github.com/enjoy-to-code/prime

More information on how to improve Python performance with Cython can be found in the book “Modern Multiparadigm Software Architectures and Design Patterns with Examples and Applications in C++, C# and Python”, which Daniel Duffy and I will publish in 2024.