Python: Distributing a custom extension

This is a follow-up to the previous article, where we built a shared object in C that could be imported as a Python module to invoke the underlying C functions.

This article tackles the next challenge: how to distribute that shared object so that it can be imported without having to keep the .so file in the working directory of every Python script that wants to use it. Solving this would eliminate the need to copy-paste the same .so files everywhere.

Chuck it in /usr/lib?

Since we’ve just created a shared object file, it might seem intuitive to put it in /usr/lib and let Linux take it from there, but this won’t work. The import statement does not search the dynamic linker’s library directories when it looks for a module, so files placed in /usr/lib or /usr/local/lib won’t be considered at all.

Normally, if you were working with C, the dynamic linker ld-linux.so would resolve your custom shared objects as long as they are placed in a standard location like /usr/lib or /usr/local/lib, or in a directory listed in $LD_LIBRARY_PATH. This applies both to libraries linked at build time and to libraries loaded by name at runtime with dlopen().

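Python’s import system works differently. An extension module is still loaded with dlopen() under the hood, but only after Python has located the file itself by searching the directories in sys.path, so chucking the .so file in /usr/lib won’t make it importable from any Python script.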
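To see what the import system searches instead of the linker's directories, the interpreter can be asked directly. This is a minimal sketch using only the standard library; describe_import_search is a hypothetical helper name, and the values it returns depend on the interpreter and platform, so no output is shown.

```python
import importlib.machinery
import sys


def describe_import_search():
    """Return the directories and filename suffixes used to find extension modules."""
    return {
        # Directories searched, in order, by the import statement.
        "search_path": list(sys.path),
        # Filename endings recognised as compiled extension modules.
        "extension_suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
    }


if __name__ == "__main__":
    info = describe_import_search()
    print("Directories searched by import:")
    for entry in info["search_path"]:
        print("  ", entry or "(current directory)")
    print("Recognised extension suffixes:", info["extension_suffixes"])
```

A .so file is only importable if it sits in one of the listed directories and its name ends with one of the listed suffixes.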

So what does Python do?

The answer really depends on your distro, so the answer here will be Debian-specific.

Quick recap on Python installations on Debian

Normally, Python comes pre-installed on Debian systems, and this installation is managed by the apt package manager.

The executable would be located in /usr/bin.

Otherwise, if Python was compiled from source, the executable would most likely be installed into /usr/local/bin.

If you have the same program (in this case, python) installed into both directories, you’d use the $PATH environment variable to pick which one should be the default.

I’m working in a throwaway VM, so I can just use the Python installation that comes with the OS. In this case, the Python installation will have a dist-packages directory at /usr/lib/python{major version}/dist-packages, for example /usr/lib/python3/dist-packages.

dist-packages vs site-packages

Python .so extensions are installed into this directory when they come from Debian’s apt package manager.

Unfortunately this is just a Debian-specific convention: other distros will instead have a directory such as:

/usr/lib/python{major version}.{minor version}/site-packages

On Debian, the system Python does not use a site-packages directory, and dist-packages takes its place.
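Rather than hard-coding either directory name, the running interpreter can report where it keeps third-party packages. This is a minimal sketch using the standard library's site and sysconfig modules; package_dirs is a hypothetical helper name, and the result differs by distro and Python version, so no output is shown.

```python
import site
import sysconfig


def package_dirs():
    """Return the package directories this interpreter is configured to use."""
    return {
        # Default install location for platform-specific files such as .so extensions.
        "platlib": sysconfig.get_paths()["platlib"],
        # Global package directories that are added to sys.path at startup.
        "site_packages": site.getsitepackages(),
    }


if __name__ == "__main__":
    dirs = package_dirs()
    print("platlib:", dirs["platlib"])
    for path in dirs["site_packages"]:
        print("package directory:", path)
```

Running this with the apt-managed interpreter and again with one compiled from source is a quick way to see which naming convention each uses.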

$PYTHONPATH

Managing different Python installations or multiple distributions can quickly become messy. Fortunately, there’s an environment variable, $PYTHONPATH, that works much like $PATH but tells Python where to look for additional modules/extensions, such as the shared object we created earlier.

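By default, $PYTHONPATH is unset and Python relies on its built-in search paths, so we can point it at any directory we like for our custom shared objects. Keep in mind that the directories it lists are searched before the built-in package directories, so a module placed there can shadow an installed module with the same name. This should be “good enough” for testing purposes.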

For example, add the following line to .bashrc:

export PYTHONPATH=/opt/python-extensions

Python started from a new shell session will then be able to find any custom shared object placed in /opt/python-extensions.
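One way to confirm that the variable is picked up is to start a child interpreter with PYTHONPATH set and inspect its sys.path. This is a minimal sketch; child_sys_path is a hypothetical helper name, and the directory passed to it is whatever directory was chosen for the extensions.

```python
import json
import os
import subprocess
import sys


def child_sys_path(extra_dir):
    """Start a fresh interpreter with PYTHONPATH=extra_dir and return its sys.path."""
    # Copy the current environment and override only PYTHONPATH.
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    target = "/opt/python-extensions"
    found = target in child_sys_path(target)
    print(target, "is on sys.path" if found else "is NOT on sys.path")
```

If the directory is reported as missing, the interpreter was probably started with a flag such as -E or -I, which tells Python to ignore the PYTHON* environment variables.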

cron does not read .bashrc, so if a Python script needs to run from cron, the environment variable can be set inline on the crontab entry, e.g.

00 02 * * 1-5 PYTHONPATH=/opt/python-extensions python3 /path/to/python-script.py

Conclusion

$PYTHONPATH is a simple and effective solution for testing purposes, but in a real production environment, it might be safer to rely on a configuration management tool like Ansible to deliver the shared object files into the default package directory for the specific distro.