Python

Python packages and modules have their own special build system. This documentation covers everything you’ll need to know in order to write a Spack build recipe for a Python library.

Terminology

In the Python ecosystem, there are a number of terms that are important to understand.

PyPI

The Python Package Index, where most Python libraries are hosted.

sdist

Source distributions, distributed as tarballs (.tar.gz) and zip files (.zip). Contain the source code of the package.

bdist

Built distributions, distributed as wheels (.whl). Contain the pre-built library.

wheel

A binary distribution format common in the Python ecosystem. This file is actually just a zip file containing specific metadata and code. See the documentation for more details.

build frontend

Command-line tools used to build and install wheels. Examples include pip, build, and installer.

build backend

Libraries used to define how to build a wheel. Examples include setuptools, flit, poetry, hatchling, meson, and pdm.

Downloading

The first step in packaging a Python library is to figure out where to download it from. The vast majority of Python packages are hosted on PyPI, which is preferred over GitHub for downloading packages. Search for the package name on PyPI to find the project page. The project page is usually located at:

https://pypi.org/project/<package-name>

On the project page, there is a “Download files” tab containing download URLs. Whenever possible, we prefer to build Spack packages from source. If PyPI only has wheels, check to see if the project is hosted on GitHub and see if GitHub has source distributions. The project page usually has a “Homepage” and/or “Source code” link for this. If the project is closed-source, it may only have wheels available. For example, py-azureml-sdk is closed-source and can be downloaded from:

https://pypi.io/packages/py3/a/azureml_sdk/azureml_sdk-1.11.0-py3-none-any.whl

Once you’ve found a URL to download the package from, run:

$ spack create <url>

to create a new package template.

PyPI vs. GitHub

Many packages are hosted on PyPI, but are developed on GitHub or another version control system hosting service. The source code can be downloaded from either location, but PyPI is preferred for the following reasons:

  1. PyPI contains the bare minimum number of files needed to install the package.

    You may notice that the tarball you download from PyPI does not have the same checksum as the tarball you download from GitHub. When a developer uploads a new release to PyPI, it doesn’t contain every file in the repository, only the files necessary to install the package. PyPI tarballs are therefore smaller.

  2. PyPI is the official source for package managers like pip.

    Let’s be honest, pip is much more popular than Spack. If the GitHub tarball contains a file not present in the PyPI tarball that causes a bug, the developers may not realize this for quite some time. If the bug was in a file contained in the PyPI tarball, users would notice the bug much more quickly.

  3. A GitHub release may be a beta version.

    When a developer releases a new version of a package on GitHub, it may not be intended for most users. Until that release also makes its way to PyPI, it should be assumed that the release is not yet ready for general use.

  4. The checksum for a GitHub release may change.

    Unfortunately, some developers have a habit of patching releases without incrementing the version number. This results in a change in tarball checksum. Package managers like Spack that use checksums to verify the integrity of a downloaded tarball grind to a halt when the checksum for a known version changes. Most of the time, the change is intentional and contains a needed bug fix. However, sometimes the change indicates a download source that has been compromised, and a tarball that contains a virus. If this happens, you must contact the developers to determine which is the case. PyPI is nice because it makes it physically impossible to re-release the same version of a package with a different checksum.

The only reason to use GitHub instead of PyPI is if PyPI only has wheels or if the PyPI sdist is missing a file needed to build the package. If this is the case, please add a comment above the url explaining this.

PyPI

Since PyPI is so commonly used to host Python libraries, the PythonPackage base class has a pypi attribute that can be set. Once set, pypi will be used to define the homepage, url, and list_url. For example, the following:

homepage = "https://pypi.org/project/setuptools/"
url      = "https://pypi.org/packages/source/s/setuptools/setuptools-49.2.0.zip"
list_url = "https://pypi.org/simple/setuptools/"

is equivalent to:

pypi = "setuptools/setuptools-49.2.0.zip"

If a package has a different homepage listed on PyPI, you can override it by setting your own homepage.
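For example, a sketch building on the setuptools snippet above (the homepage shown here is illustrative):

pypi = "setuptools/setuptools-49.2.0.zip"
homepage = "https://setuptools.pypa.io"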

Description

The top of the PyPI project page contains a short description of the package. The “Project description” tab may also contain a longer description of the package. Either of these can be used to populate the package docstring.

Dependencies

Once you’ve determined the basic metadata for a package, the next step is to determine the build backend. PythonPackage uses pip to install the package, but pip requires a backend to actually build the package.

To determine the build backend, look for a pyproject.toml file. If there is no pyproject.toml file and only a setup.py or setup.cfg file, you can assume that the project uses setuptools. If there is a pyproject.toml file, see if it contains a [build-system] section. For example:

[build-system]
requires = [
    "setuptools>=42",
    "wheel",
]
build-backend = "setuptools.build_meta"

This section does two things: the requires key lists build dependencies of the project, and the build-backend key defines the build backend. All of these build dependencies should be added as dependencies to your package:

depends_on("py-setuptools@42:", type="build")

Note that py-wheel is already listed as a build dependency in the PythonPackage base class, so you don’t need to add it unless you need to specify a specific version requirement or change the dependency type.
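If you do need to override it, a one-line sketch (the version requirement shown is illustrative):

depends_on("py-wheel@0.36:", type="build")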

See PEP 517 and PEP 518 for more information on the design of pyproject.toml.

Depending on which build backend a project uses, there are various places that run-time dependencies can be listed. Most modern build backends support listing dependencies directly in pyproject.toml. Look for dependencies under the following keys:

  • requires-python under [project]

    This specifies the version of Python that is required.

  • dependencies under [project]

    These packages are required for building and installation. You can add them with type=("build", "run").

  • [project.optional-dependencies]

    This section includes keys with lists of optional dependencies needed to enable those features. You should add a variant that optionally adds these dependencies. This variant should be False by default.

Some build backends may have additional locations where dependencies can be found.
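As an illustration of these keys, a hypothetical pyproject.toml containing:

[project]
requires-python = ">=3.8"
dependencies = ["requests>=2.25"]

[project.optional-dependencies]
docs = ["sphinx"]

could be translated, as a sketch, into the following directives:

depends_on("python@3.8:", type=("build", "run"))
depends_on("py-requests@2.25:", type=("build", "run"))

variant("docs", default=False, description="Enable documentation extras")
depends_on("py-sphinx", when="+docs", type=("build", "run"))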

distutils

Before the introduction of setuptools and other build backends, Python packages had to rely on the built-in distutils library. Distutils is missing many of the features that setuptools and other build backends offer, and users are encouraged to use setuptools instead. In fact, distutils was deprecated in Python 3.10 and removed in Python 3.12. Because of this, pip actually replaces all imports of distutils with setuptools. If a package uses distutils, you should instead add a build dependency on setuptools. Also check for a requirements.txt file that may list additional dependencies of the project.
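In that case, the recipe simply declares setuptools as a build dependency:

depends_on("py-setuptools", type="build")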

setuptools

If the pyproject.toml lists setuptools.build_meta as a build-backend, or if the package has a setup.py that imports setuptools, or if the package has a setup.cfg file, then it uses setuptools to build. Setuptools is a replacement for the distutils library, and has almost the exact same API. In addition to pyproject.toml, dependencies can be listed in the setup.py or setup.cfg file. Look for the following arguments:

  • python_requires

    This specifies the version of Python that is required.

  • setup_requires

    These packages are usually only needed at build-time, so you can add them with type="build".

  • install_requires

    These packages are required for building and installation. You can add them with type=("build", "run").

  • extras_require

    These packages are optional dependencies that enable additional functionality. You should add a variant that optionally adds these dependencies. This variant should be False by default.

  • tests_require

    These are packages that are required to run the unit tests for the package. These dependencies can be specified using the type="test" dependency type. However, the PyPI tarballs rarely contain unit tests, so there is usually no reason to add these.
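For example, a hypothetical setup.py containing:

setup(
    python_requires=">=3.7",
    setup_requires=["cython"],
    install_requires=["numpy>=1.17"],
    extras_require={"plot": ["matplotlib"]},
)

could be translated, as a sketch, into:

depends_on("python@3.7:", type=("build", "run"))
depends_on("py-cython", type="build")
depends_on("py-numpy@1.17:", type=("build", "run"))

variant("plot", default=False, description="Enable plotting support")
depends_on("py-matplotlib", when="+plot", type=("build", "run"))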

See https://setuptools.pypa.io/en/latest/userguide/dependency_management.html for more information on how setuptools handles dependency management. See PEP 440 for documentation on version specifiers in setuptools.

flit

There are actually two possible build backends for flit: flit and flit_core. If you see either of these in the pyproject.toml, add the corresponding build dependency to your package. With flit, all dependencies are listed directly in the pyproject.toml file. Older versions of flit used to store this information in a flit.ini file, so check for this too.

In addition to the default pyproject.toml keys listed above, older versions of flit may use the following keys:

  • requires under [tool.flit.metadata]

    These packages are required for building and installation. You can add them with type=("build", "run").

  • [tool.flit.metadata.requires-extra]

    This section includes keys with lists of optional dependencies needed to enable those features. You should add a variant that optionally adds these dependencies. This variant should be False by default.
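For example, a hypothetical older flit project with:

[tool.flit.metadata]
requires = ["requests >=2.6"]

would translate, as a sketch, into:

depends_on("py-requests@2.6:", type=("build", "run"))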

See https://flit.pypa.io/en/latest/pyproject_toml.html for more information.

poetry

Like flit, poetry also has two possible build backends: poetry and poetry_core. If you see either of these in the pyproject.toml, add the corresponding build dependency to your package. With poetry, all dependencies are listed directly in the pyproject.toml file. Dependencies are listed in a [tool.poetry.dependencies] section, and use a custom syntax for specifying the version requirements. Note that ~= works differently in poetry than in setuptools and flit for versions that start with a zero.
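For example, a hypothetical [tool.poetry.dependencies] section:

[tool.poetry.dependencies]
python = "^3.8"
requests = "^2.25"

uses poetry's caret syntax, where ^2.25 means >=2.25,<3.0. A sketch of the translation:

depends_on("python@3.8:3", type=("build", "run"))
depends_on("py-requests@2.25:2", type=("build", "run"))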

hatchling

If the pyproject.toml lists hatchling.build as the build-backend, it uses the hatchling build system. Hatchling uses the default pyproject.toml keys to list dependencies.

See https://hatch.pypa.io/latest/config/dependency/ for more information.

meson

If the pyproject.toml lists mesonpy as the build-backend, it uses the meson build system. Meson uses the default pyproject.toml keys to list dependencies.

See https://meson-python.readthedocs.io/en/latest/tutorials/introduction.html for more information.

pdm

If the pyproject.toml lists pdm.pep517.api as the build-backend, it uses the PDM build system. PDM uses the default pyproject.toml keys to list dependencies.

See https://pdm.fming.dev/latest/ for more information.

wheels

Some Python packages are closed-source and are distributed as Python wheels. For example, py-azureml-sdk downloads a .whl file. This file is simply a zip file, and can be extracted using:

$ unzip *.whl

The zip file will not contain a setup.py, but it will contain a METADATA file which contains all the information you need to write a package.py build recipe. Check for lines like:

Requires-Python: >=3.5,<4
Requires-Dist: azureml-core (~=1.11.0)
Requires-Dist: azureml-dataset-runtime[fuse] (~=1.11.0)
Requires-Dist: azureml-train (~=1.11.0)
Requires-Dist: azureml-train-automl-client (~=1.11.0)
Requires-Dist: azureml-pipeline (~=1.11.0)
Provides-Extra: accel-models
Requires-Dist: azureml-accel-models (~=1.11.0); extra == 'accel-models'
Provides-Extra: automl
Requires-Dist: azureml-train-automl (~=1.11.0); extra == 'automl'

Requires-Python is equivalent to python_requires and Requires-Dist is equivalent to install_requires. Provides-Extra is used to name optional features (variants) and a Requires-Dist with extra == 'foo' will list any dependencies needed for that feature.
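As a sketch, the metadata above could be translated into directives like the following (variant names are a judgment call):

depends_on("python@3.5:3", type=("build", "run"))
depends_on("py-azureml-core@1.11.0:1.11", type=("build", "run"))

variant("automl", default=False, description="Enable AutoML support")
depends_on("py-azureml-train-automl@1.11.0:1.11", when="+automl", type=("build", "run"))

Here ~=1.11.0 means >=1.11.0,<1.12.0, which maps to the Spack range @1.11.0:1.11.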

Passing arguments to setup.py

The default install phase should be sufficient to install most packages. However, the installation instructions for a package may suggest passing certain flags to the build backend or the setup.py call. The PythonPackage class has three techniques for doing this.

Config settings

These settings are passed to PEP 517 build backends. For example, the py-scipy package allows you to specify the name of the BLAS/LAPACK library you want pkg-config to search for:

depends_on("py-pip@22.1:", type="build")

def config_settings(self, spec, prefix):
    return {
        "blas": spec["blas"].libs.names[0],
        "lapack": spec["lapack"].libs.names[0],
    }

Note

Config settings only work for packages that define a build-backend in pyproject.toml, and they are only supported by pip 22.1+, which in turn requires Python 3.7+. For packages that still support Python 3.6 and older, install_options should be used instead.

Global options

These flags are added directly after setup.py when pip runs python setup.py install. For example, the py-pyyaml package has an optional dependency on libyaml that can be enabled like so:

def global_options(self, spec, prefix):
    options = []
    if spec.satisfies("+libyaml"):
        options.append("--with-libyaml")
    else:
        options.append("--without-libyaml")
    return options

Note

Direct invocation of setup.py is deprecated, and this technique forces pip to use a deprecated installation procedure. It should only be used in packages that don’t define a build-backend in pyproject.toml or packages that still support Python 3.6 and older.

Install options

These flags are added directly after install when pip runs python setup.py install. For example, the py-pyyaml package allows you to specify the directories to search for libyaml:

def install_options(self, spec, prefix):
    options = []
    if spec.satisfies("+libyaml"):
        options.extend([
            spec["libyaml"].libs.search_flags,
            spec["libyaml"].headers.include_flags,
        ])
    return options

Note

Direct invocation of setup.py is deprecated, and this technique forces pip to use a deprecated installation procedure. It should only be used in packages that don’t define a build-backend in pyproject.toml or packages that still support Python 3.6 and older.

Testing

PythonPackage provides a couple of options for testing packages both during and after the installation process.

Import tests

Just because a package successfully built does not mean that it built correctly. The most reliable test of whether the package was correctly installed is to attempt to import all of the modules that get installed. To get a list of modules, run the following commands in the source directory:

$ python
>>> import setuptools
>>> setuptools.find_packages()
['numpy', 'numpy._build_utils', 'numpy.compat', 'numpy.core', 'numpy.distutils', 'numpy.doc', 'numpy.f2py', 'numpy.fft', 'numpy.lib', 'numpy.linalg', 'numpy.ma', 'numpy.matrixlib', 'numpy.polynomial', 'numpy.random', 'numpy.testing', 'numpy.core.code_generators', 'numpy.distutils.command', 'numpy.distutils.fcompiler']

Large, complex packages like numpy will return a long list of packages, while other packages like six will return an empty list, since py-six installs only a single six.py file. In Python packaging lingo, a “package” is a directory containing files like:

foo/__init__.py
foo/bar.py
foo/baz.py

whereas a “module” is a single Python file.

The PythonPackage base class automatically detects these package and module names for you. If, for whatever reason, the module names detected are wrong, you can provide the names yourself by overriding import_modules like so:

import_modules = ["six"]

Sometimes the list of module names to import depends on how the package was built. For example, the py-pyyaml package has a +libyaml variant that enables the build of a faster optimized version of the library. If the user chooses ~libyaml, only the yaml library will be importable. If the user chooses +libyaml, both the yaml and yaml.cyaml libraries will be available. This can be expressed like so:

@property
def import_modules(self):
    modules = ["yaml"]
    if self.spec.satisfies("+libyaml"):
        modules.append("yaml.cyaml")
    return modules

These tests often catch missing dependencies and non-RPATHed libraries. Make sure not to add modules/packages containing the word “test”, as these likely won’t end up in the installation directory, or may require test dependencies like pytest to be installed.

Instead of defining import_modules explicitly, you can define skip_modules, which lists only the modules to be excluded from the automatically detected set. If a skipped module has submodules, they are skipped as well. For example, to exclude the plotting modules from the automatically detected import_modules set ["nilearn", "nilearn.surface", "nilearn.plotting", "nilearn.plotting.data"]:

skip_modules = ["nilearn.plotting"]

This will set import_modules to ["nilearn", "nilearn.surface"].

Import tests can be run during the installation using spack install --test=root or at any time after the installation using spack test run.

Unit tests

The package may have its own unit or regression tests. Spack can run these tests during the installation by adding test methods that run after the install phase.

For example, py-numpy adds the following as a check to run after the install phase:

@run_after("install")
@on_package_attributes(run_tests=True)
def install_test(self):
    with working_dir("spack-test", create=True):
        python("-c", "import numpy; numpy.test('full', verbose=2)")

These checks run only when testing is enabled during the installation (i.e., spack install --test=root).

Note

Additional information is available on install phase tests.

Setup file in a sub-directory

Many C/C++ libraries provide optional Python bindings in a subdirectory. To tell pip which directory to build from, you can override the build_directory attribute. For example, if a package provides Python bindings in a python directory, you can use:

build_directory = "python"

PythonPackage vs. packages that use Python

There are many packages that make use of Python, but a package that depends on Python is not necessarily a PythonPackage.

Choosing a build system

First of all, you need to select a build system. spack create usually does this for you, but if for whatever reason you need to do this manually, choose PythonPackage if and only if the package contains one of the following files:

  • pyproject.toml

  • setup.py

  • setup.cfg

Choosing a package name

Selecting the appropriate package name is a little more complicated than choosing the build system. By default, spack create will prepend py- to the beginning of the package name if it detects that the package uses the PythonPackage build system. However, there are occasionally packages that use PythonPackage that shouldn’t start with py-. For example:

  • awscli

  • aws-parallelcluster

  • busco

  • easybuild

  • httpie

  • mercurial

  • scons

  • snakemake

The thing these packages have in common is that they are command-line tools that just so happen to be written in Python. Someone who wants to install mercurial with Spack isn’t going to realize that it is written in Python, and they certainly aren’t going to assume the package is called py-mercurial. For this reason, we manually renamed the package to mercurial.

Likewise, there are occasionally packages that don’t use the PythonPackage build system but should still be prepended with py-. For example:

  • py-genders

  • py-py2cairo

  • py-pygobject

  • py-pygtk

  • py-pyqt

  • py-pyserial

  • py-sip

  • py-xpyb

These packages are primarily used as Python libraries, not as command-line tools. You may see C/C++ packages that have optional Python language-bindings, such as:

  • antlr

  • cantera

  • conduit

  • pagmo

  • vtk

Don’t prepend these kinds of packages with py-. When in doubt, think about how the package will be used. Is it primarily a Python library that will be imported in other Python scripts? Or is it a command-line tool or a C/C++/Fortran program with optional Python modules? The former should be prepended with py-, while the latter should not.

extends vs. depends_on

As mentioned in the Packaging Guide, extends and depends_on are very similar, but extends ensures that the extension and extendee share the same prefix in views. This allows the user to import a Python module without having to add that module to PYTHONPATH.

Additionally, extends("python") adds a dependency on the package python-venv. This improves isolation from the system, whether it’s during the build or at runtime: user and system site packages cannot accidentally be used by any package that extends("python").

As a rule of thumb: if a package does not install any Python modules of its own, and merely puts a Python script in the bin directory, then there is no need for extends. If the package installs modules in the site-packages directory, it requires extends.
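As a minimal sketch, consider a hypothetical C++ package that provides optional Python bindings:

class Bar(CMakePackage):
    """Hypothetical C++ library with optional Python bindings."""

    variant("python", default=False, description="Build Python bindings")

    # The bindings install modules into site-packages, so the package
    # extends python whenever the variant is enabled.
    extends("python", when="+python")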

Executing python during the build

Whenever you need to execute a Python command or pass the path of the Python interpreter to the build system, it is best to use the global variable python directly. For example:

@run_before("install")
def recythonize(self):
    python("setup.py", "clean")  # use the `python` global

As mentioned in the previous section, extends("python") adds an automatic dependency on python-venv, which is a virtual environment that guarantees build isolation. The python global always refers to the correct Python interpreter, whether the package uses extends("python") or depends_on("python").

Alternatives to Spack

PyPI has hundreds of thousands of packages that are not yet in Spack, and pip may be a perfectly valid alternative to using Spack. The main advantage of Spack over pip is its ability to compile non-Python dependencies. It can also build cythonized versions of a package or link to an optimized BLAS/LAPACK library like MKL, resulting in calculations that run orders of magnitude faster. Spack does not offer a significant advantage over other Python management systems for installing and using tools like flake8 and sphinx. But if you need packages with non-Python dependencies like numpy and scipy, Spack will be very valuable to you.

Anaconda is another great alternative to Spack, and comes with its own conda package manager. Like Spack, Anaconda is capable of compiling non-Python dependencies. Anaconda contains many Python packages that are not yet in Spack, and Spack contains many Python packages that are not yet in Anaconda. The main advantage of Spack over Anaconda is its ability to choose a specific compiler and BLAS/LAPACK or MPI library. Spack also has better platform support for supercomputers, and can build optimized binaries for your specific microarchitecture.

External documentation

For more information on Python packaging, see:

  • https://packaging.python.org/

For more information on build and installation frontend tools, see:

  • pip: https://pip.pypa.io/

  • build: https://pypa-build.readthedocs.io/

  • installer: https://installer.readthedocs.io/

For more information on build backend tools, see:

  • setuptools: https://setuptools.pypa.io/

  • flit: https://flit.pypa.io/

  • poetry: https://python-poetry.org/

  • hatchling: https://hatch.pypa.io/latest/

  • meson-python: https://meson-python.readthedocs.io/

  • pdm: https://pdm.fming.dev/latest/