Python packages and modules have their own special build system.
PythonPackage base class provides the following phases that
can be overridden:
These are all standard
setup.py commands and can be found by running:
$ python setup.py --help-commands
By default, only the
install phases are run:
build- build everything needed to install
install- install everything from build directory
If for whatever reason you need to run more phases, simply modify your
phases list like so:
phases = ['build_ext', 'install', 'bdist']
Each phase provides a function
<phase> that runs:
$ python -s setup.py --no-user-cfg <phase>
Each phase also has a
<phase_args> function that can pass arguments to
this call. All of these functions are empty except for the
function, which passes
--prefix=/path/to/installation/prefix. There is
also some additional logic specific to setuptools and eggs.
If you need to run a phase that is not a standard
you’ll need to define a function for it like so:
phases = ['configure', 'build', 'install'] def configure(self, spec, prefix): self.setup_py('configure')
Python packages can be identified by the presence of a
This file is used by package managers like
pip to determine a
package’s dependencies and the version of dependencies required, so if
setup.py file is not accurate, the package will not build properly.
For this reason, the
setup.py file should be fairly reliable. If the
setup.py disagree on something, the
file should be considered to be the truth. As dependencies are added or
removed, the documentation is much more likely to become outdated than
Finding Python packages¶
The vast majority of Python packages are hosted on PyPI - The Python
pip only supports packages hosted on PyPI, making
it the only option for developers who want a simple installation.
Search for “PyPI <package-name>” to find the download page. Note that
some pages are versioned, and the first result may not be the newest
version. Click on the “Latest Version” button to the top right to see
if a newer version is available. The download page is usually at:
The top of the PyPI downloads page contains a description of the package. The first line is usually a short description, while there may be a several line “Project Description” that follows. Choose whichever is more useful. You can also get these descriptions on the command-line using:
$ python setup.py --description $ python setup.py --long-description
Package developers use
setup.py to upload new versions to PyPI.
setup method often passes metadata like
homepage to PyPI.
This metadata is displayed on the left side of the download page.
Search for the text “Homepage” under “Project links” to find it. You
should use this page instead of the PyPI page if they differ. You can
also get the homepage on the command-line by running:
$ python setup.py --url
You may have noticed that Spack allows you to add multiple versions of the same package without adding multiple versions of the download URL. It does this by guessing what the version string in the URL is and replacing this with the requested version. Obviously, if Spack cannot guess the version correctly, or if non-version-related things change in the URL, Spack cannot substitute the version properly.
Once upon a time, PyPI offered nice, simple download URLs like: https://pypi.python.org/packages/source/n/numpy/numpy-1.13.1.zip
As you can see, the version is 1.13.1. It probably isn’t hard to guess what URL to use to download version 1.12.0, and Spack was perfectly capable of performing this calculation.
However, PyPI switched to a new download URL format: https://pypi.python.org/packages/c0/3a/40967d9f5675fbb097ffec170f59c2ba19fc96373e73ad47c2cae9a30aed/numpy-1.13.1.zip#md5=2c3c0f4edf720c3a7b525dacc825b9ae
As you can imagine, it is impossible for Spack to guess what URL to use to download version 1.12.0 given this URL. There is a solution, however. PyPI offers a new hidden interface for downloading Python packages that does not include a hash in the URL: https://pypi.io/packages/source/n/numpy/numpy-1.13.1.zip
This URL redirects to the files.pythonhosted.org URL. The general syntax for this pypi.io URL is: https://pypi.io/packages/source/<first-letter-of-name>/<name>/<name>-<version>.<extension>
Please use the pypi.io URL instead of the pypi.python.org URL. If both
.zip versions are available,
.tar.gz is preferred.
If some releases offer both
.zip versions, but some
.zip versions, use
PyPI vs. GitHub¶
Many packages are hosted on PyPI, but are developed on GitHub and other version control systems. The tarball can be downloaded from either location, but PyPI is preferred for the following reasons:
PyPI contains the bare minimum of files to install the package.
You may notice that the tarball you download from PyPI does not have the same checksum as the tarball you download from GitHub. When a developer uploads a new release to PyPI, it doesn’t contain every file in the repository, only the files necessary to install the package. PyPI tarballs are therefore smaller.
PyPI is the official source for package managers like
Let’s be honest,
pipis much more popular than Spack. If the GitHub tarball contains a file not present in the PyPI tarball that causes a bug, the developers may not realize this for quite some time. If the bug was in a file contained in the PyPI tarball, users would notice the bug much more quickly.
GitHub release may be a beta version.
When a developer releases a new version of a package on GitHub, it may not be intended for most users. Until that release also makes its way to PyPI, it should be assumed that the release is not yet ready for general use.
The checksum for a GitHub release may change.
Unfortunately, some developers have a habit of patching releases without incrementing the version number. This results in a change in tarball checksum. Package managers like Spack that use checksums to verify the integrity of a download tarball grind to a halt when the checksum for a known version changes. Most of the time, the change is intentional, and contains a needed bug fix. However, sometimes the change indicates a download source that has been compromised, and a tarball that contains a virus. If this happens, you must contact the developers to determine which is the case. PyPI is nice because it makes it physically impossible to re-release the same version of a package with a different checksum.
There are some reasons to prefer downloading from GitHub:
The GitHub tarball may contain unit tests
As previously mentioned, the PyPI tarball contains the bare minimum of files to install the package. Unless explicitly specified by the developers, it will not contain development files like unit tests. If you desire to run the unit tests during installation, you should use the GitHub tarball instead.
Spack does not yet support
spack checksumwith PyPI URLs
These commands work just fine with GitHub URLs. This is a minor annoyance, not a reason to prefer GitHub over PyPI.
If you really want to run these unit tests, no one will stop you from submitting a PR for a new package that downloads from GitHub.
Build system dependencies¶
There are a few dependencies common to the
PythonPackage build system.
PythonPackage needs Python at build-time to run
python setup.py build && python setup.py install. Python is also
needed at run-time if you want to import the module. Due to backwards
incompatible changes between Python 2 and 3, it is very important to
specify which versions of Python are supported. If the documentation
mentions that Python 3 is required, this can be specified as:
depends_on('python@3:', type=('build', 'run')
If Python 2 is required, this would look like:
depends_on('python@:2', type=('build', 'run')
If Python 2.7 is the only version that works, you can use:
depends_on('firstname.lastname@example.org:2.8', type=('build', 'run')
The documentation may not always specify supported Python versions.
Another place to check is in the
setup.py file. Look for a line
python_requires. An example from
More commonly, you will find a version check at the top of the file:
if sys.version_info[:2] < (2, 7) or (3, 0) <= sys.version_info[:2] < (3, 4): raise RuntimeError("Python version 2.7 or >= 3.4 required.")
This can be converted to Spack’s spec notation like so:
depends_on('email@example.com:2.8,3.4:', type=('build', 'run'))
Originally, the Python language had a single build system called distutils, which is built into Python. Distutils provided a common framework for package authors to describe their project and how it should be built. However, distutils was not without limitations. Most notably, there was no way to list a project’s dependencies with distutils. Along came setuptools, a non-builtin build system designed to overcome the limitations of distutils. Both projects use a similar API, making the transition easy while adding much needed functionality. Today, setuptools is used in around 75% of the Python packages in Spack.
Since setuptools isn’t built-in to Python, you need to add it as a dependency. To determine whether or not a package uses setuptools, search the file for an import statement like:
from setuptools import setup
Some packages are designed to work with both setuptools and distutils, so you may find something like:
try: from setuptools import setup except ImportError: from distutils.core import setup
This uses setuptools if available, and falls back to distutils if not. In this case, you would still want to add a setuptools dependency, as it offers us more control over the installation.
Unless specified otherwise, setuptools is usually a build-only dependency. That is, it is needed to install the software, but is not needed at run-time. This can be specified as:
Compared to compiled languages, interpreted languages like Python can be quite a bit slower. To work around this, some Python developers rewrite computationally demanding sections of code in C, a process referred to as “cythonizing”. In order to build these package, you need to add a build dependency on cython:
Look for references to “cython” in the
setup.py to determine
whether or not this is necessary. Cython may be optional, but
even then you should list it as a required dependency. Spack is
designed to compile software, and is meant for HPC facilities
where speed is crucial. There is no reason why someone would not
want an optimized version of a library instead of the pure-Python
When you install a package with
pip, it reads the
file in order to determine the dependencies of the package.
If the dependencies are not yet installed,
pip downloads them
and installs them for you. This may sound convenient, but Spack
cannot rely on this behavior for two reasons:
Spack needs to be able to install packages on air-gapped networks.
If there is no internet connection,
pipcan’t download the package dependencies. By explicitly listing every dependency in the
package.py, Spack knows what to download ahead of time.
Duplicate installations of the same dependency may occur.
Spack supports activation of Python extensions, which involves symlinking the package installation prefix to the Python installation prefix. If your package is missing a dependency, that dependency will be installed to the installation directory of the same package. If you try to activate the package + dependency, it may cause a problem if that package has already been activated.
For these reasons, you must always explicitly list all dependencies.
Although the documentation may list the package’s dependencies,
often the developers assume people will use
pip and won’t have to
worry about it. Always check the
setup.py to find the true
If the package relies on
distutils, it may not explicitly list its
dependencies. Check for statements like:
try: import numpy except ImportError: raise ImportError("numpy must be installed prior to installation")
Obviously, this means that
py-numpy is a dependency.
If the package uses
setuptools, check for the following clues:
These packages are required for installation.
These packages are optional dependencies that enable additional functionality. You should add a variant that optionally adds these dependencies.
These are packages that are required to run the unit tests for the package. These dependencies can be specified using the
In the root directory of the package, you may notice a
requirements.txt file. It may look like this file contains a list
of all of the package’s dependencies. Don’t be fooled. This file is
used by tools like Travis to install the pre-requisites for the
package… and a whole bunch of other things. It often contains
dependencies only needed for unit tests, like:
It can also contain dependencies for building the documentation, like
sphinx. If you can’t find any information about the package’s
dependencies, you can take a look in
requirements.txt, but be sure
not to add test or documentation dependencies.
Setuptools is a bit of a special case. If a package requires setuptools
at run-time, how do they express this? They could add it to
install_requires, but setuptools is imported long before this and
needed to read this line. And since you can’t install the package
without setuptools, the developers assume that setuptools will already
be there, so they never mention when it is required. We don’t want to
add run-time dependencies if they aren’t needed, so you need to
determine whether or not setuptools is needed. Grep the installation
directory for any files containing a reference to
pkg_resources. Both modules come from
pkg_resources is particularly common in scripts in
Passing arguments to setup.py¶
The default build and install phases should be sufficient to install most packages. However, you may want to pass additional flags to either phase.
You can view the available options for a particular phase with:
$ python setup.py <phase> --help
Each phase provides a
<phase_args> function that can be used to
pass arguments to that phase. For example,
def build_args(self, spec, prefix): args =  # From NumPy 1.10.0 on it's possible to do a parallel build. if self.version >= Version('1.10.0'): # But Parallel build in Python 3.5+ is broken. See: # https://github.com/spack/spack/issues/7927 # https://github.com/scipy/scipy/issues/7112 if spec['python'].version < Version('3.5'): args = ['-j', str(make_jobs)] return args
PythonPackage provides a couple of options for testing packages.
Just because a package successfully built does not mean that it built correctly. The most reliable test of whether or not the package was correctly installed is to attempt to import all of the modules that get installed. To get a list of modules, run the following command in the source directory:
$ python >>> import setuptools >>> setuptools.find_packages() ['numpy', 'numpy._build_utils', 'numpy.compat', 'numpy.core', 'numpy.distutils', 'numpy.doc', 'numpy.f2py', 'numpy.fft', 'numpy.lib', 'numpy.linalg', 'numpy.ma', 'numpy.matrixlib', 'numpy.polynomial', 'numpy.random', 'numpy.testing', 'numpy.core.code_generators', 'numpy.distutils.command', 'numpy.distutils.fcompiler']
Large, complex packages like
numpy will return a long list of
packages, while other packages like
six will return an empty list.
py-six installs a single
six.py file. In Python packaging lingo,
a “package” is a directory containing files like:
foo/__init__.py foo/bar.py foo/baz.py
whereas a “module” is a single Python file. Since
only returns packages, you’ll have to determine the correct module
names yourself. You can now add these packages and modules to the
package like so:
import_modules = ['six']
When you run
spack install --test=root py-six, Spack will attempt
to import the
six module after installation.
These tests most often catch missing dependencies and non-RPATHed libraries. Make sure not to add modules/packages containing the word “test”, as these likely won’t end up in installation directory.
The package you want to install may come with additional unit tests. By default, Spack runs:
$ python setup.py test
if it detects that the
setup.py file supports a
You can add additional build-time or install-time tests by overriding
installtest, respectively. For example,
def install_test(self): with working_dir('..'): python('-c', 'import numpy; numpy.test("full", verbose=2)')
Setup file in a sub-directory¶
In order to be compatible with package managers like
pip, the package
is required to place its
setup.py in the root of the tarball. However,
not every Python package cares about
pip or PyPI. If you are installing
a package that is not hosted on PyPI, you may find that it places its
setup.py in a sub-directory. To handle this, add the directory containing
setup.py to the package like so:
build_directory = 'source'
Alternate names for setup.py¶
As previously mentioned, packages need to call their setup script
in order to be compatible with package managers like
pip. However, some
come with multiple setup scripts, one for a serial build and another for a
parallel build. You can override the default name to use like so:
def setup_file(self): return 'setup-mpi.py' if '+mpi' in self.spec else 'setup.py'
PythonPackage vs. packages that use Python¶
There are many packages that make use of Python, but packages that depend
on Python are not necessarily
Choosing a build system¶
First of all, you need to select a build system.
spack create usually
does this for you, but if for whatever reason you need to do this manually,
PythonPackage if and only if the package contains a
Choosing a package name¶
Selecting the appropriate package name is a little more complicated
than choosing the build system. By default,
spack create will
py- to the beginning of the package name if it detects
that the package uses the
PythonPackage build system. However, there
are occasionally packages that use
PythonPackage that shouldn’t
py-. For example:
The thing these packages have in common is that they are command-line
tools that just so happen to be written in Python. Someone who wants
mercurial with Spack isn’t going to realize that it is
written in Python, and they certainly aren’t going to assume the package
py-mercurial. For this reason, we manually renamed the
Likewise, there are occasionally packages that don’t use the
PythonPackage build system but should still be prepended with
These packages are primarily used as Python libraries, not as command-line tools. You may see C/C++ packages that have optional Python language-bindings, such as:
Don’t prepend these kind of packages with
py-. When in doubt,
think about how this package will be used. Is it primarily a Python
library that will be imported in other Python scripts? Or is it a
command-line tool, or C/C++/Fortran program with optional Python
modules? The former should be prepended with
py-, while the
latter should not.
extends vs. depends_on¶
This is very similar to the naming dilemma above, with a slight twist.
As mentioned in the Packaging Guide,
depends_on are very similar, but
the ability to activate the package. Activation involves symlinking
everything in the installation prefix of the package to the installation
prefix of Python. This allows the user to import a Python module without
having to add that module to
When deciding between
depends_on, the best rule of
thumb is to check the installation prefix. If Python libraries are
prefix/lib/python2.7/site-packages (where 2.7 is the
MAJOR.MINOR version of Python you used to install the package), then
you should use
extends. If Python libraries are installed elsewhere
or the only files that get installed reside in
extends, as symlinking the package wouldn’t be useful.
Alternatives to Spack¶
PyPI has hundreds of thousands of packages that are not yet in Spack,
pip may be a perfectly valid alternative to using Spack. The
main advantage of Spack over
pip is its ability to compile
non-Python dependencies. It can also build cythonized versions of a
package or link to an optimized BLAS/LAPACK library like MKL,
resulting in calculations that run orders of magnitude faster.
Spack does not offer a significant advantage to other python-management
systems for installing and using tools like flake8 and sphinx.
But if you need packages with non-Python dependencies like
numpy and scipy, Spack will be very valuable to you.
Anaconda is another great alternative to Spack, and comes with its own
conda package manager. Like Spack, Anaconda is capable of compiling
non-Python dependencies. Anaconda contains many Python packages that
are not yet in Spack, and Spack contains many Python packages that are
not yet in Anaconda. The main advantage of Spack over Anaconda is its
ability to choose a specific compiler and BLAS/LAPACK or MPI library.
Spack also has better platform support for supercomputers. On the
other hand, Anaconda offers Windows support.