CMTH PyML Stack

Description

Because we have a small number of workstations with GPUs that can be used for calculations, we have put together a Python stack with a set of libraries that may be useful for people experimenting with machine learning. If you are interested in using it, I suggest checking with me to see whether your workstation supports it. I believe the software in the module will work on any workstation, but only some will be able to use GPU acceleration.
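One way to see whether a given workstation can use GPU acceleration is to ask PyTorch directly. The following is a minimal sketch, assuming the module is loaded; the function name gpu_status is only illustrative.

```python
def gpu_status():
    """Return a short description of whether PyTorch can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "pytorch is not importable (is the module loaded?)"
    if not torch.cuda.is_available():
        return "pytorch is available, but no usable CUDA GPU was found"
    # Device 0 is the first GPU visible to this process.
    return "CUDA GPU available: " + torch.cuda.get_device_name(0)


print(gpu_status())
```

If this reports that no GPU was found, the rest of the stack should still run on the CPU.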

CUDA libraries are a little picky about compiler and library versions, so this module contains Python compiled against (and depending on) gcc/5.3.0. This is a full SciPy stack. It contains Python 3.6.4 as well as recent versions of the following packages and tools:

  • numpy
  • scipy
  • matplotlib
  • cython
  • pandas
  • ipython
  • sympy
  • nose
  • jupyter
  • numba
  • bokeh
  • scikit-learn
  • pytorch
  • mxnet
  • keras
  • fastai
  • CUDA toolkit (8.0)
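
To see which versions of the packages listed above are actually present, something like the following can be used. This is a sketch only: the list of import names is an assumption (for example scikit-learn imports as sklearn and pytorch as torch), and package_versions is a hypothetical helper, not part of the module.

```python
import importlib

# Import names assumed for the packages listed above; adjust as needed.
PACKAGES = [
    "numpy", "scipy", "matplotlib", "Cython", "pandas", "IPython", "sympy",
    "nose", "numba", "bokeh", "sklearn", "torch", "mxnet", "keras", "fastai",
]


def package_versions(names):
    """Map each import name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Not every package exposes __version__, so fall back to "unknown".
        versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling package_versions(PACKAGES) with the module loaded returns a dictionary that can be printed or compared between workstations.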

User instructions

The module can be loaded as:

module load cmth-pyml-stack/2018.01

Note: the gcc/5.3.0 module must be loaded first. If you need a different version of a package, or need to compile a package yourself, this module can serve as the base for a virtual environment, providing a recent Python version and a compatible CUDA installation.
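
As a rough sketch of using the module as a base, a virtual environment that can still see the stack's packages can be created with the standard library venv module. The function name create_stack_venv and the target path in the usage note are hypothetical. This is equivalent to running python3 -m venv --system-site-packages on the command line.

```python
import venv
from pathlib import Path


def create_stack_venv(target, with_pip=True):
    """Create a virtual environment at target that also sees the stack's packages."""
    builder = venv.EnvBuilder(
        system_site_packages=True,  # keep numpy, pytorch etc. from the module visible
        with_pip=with_pip,          # bootstrap pip so other versions can be installed
    )
    builder.create(str(target))
    return Path(target)
```

For example, create_stack_venv(Path.home() / "envs" / "myenv") run with the module's python3 gives an environment in which packages installed with pip take precedence over those in the module.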

License

Python is distributed under the Python Software Foundation (PSF) License. From version 2.2 on this is GPL compatible.

The CUDA toolkit has its own license.

Admin notes

Compilation was all done on a workstation with a bind mount. The additional dependencies previously installed for the SciPy stack (libssl-dev, libsqlite3-dev and tk-dev) were probably also needed here, but they were already installed. Python was compiled first, using gcc/5.3.0, as follows:

ml purge
ml gcc/5.3.0
./configure \
    --prefix=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 \
    --enable-optimizations > configure.log
# Build only once configure has finished; stdout and stderr both go to make.log
make -j4 &> make.log
make install

The install was synchronized with the main server and made into a module before proceeding with the rest.

The CUDA toolkit and bugfix patch were installed with:

export PERL5LIB=.
sh cuda_8.0.61_375.26_linux-run --override
sh cuda_8.0.61.2_linux-run

with the directory /common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 selected for the installation, and subdirectory cuda-8.0-samples for the samples. For this to work correctly the module template also sets the CUDA_HOME environment variable when loaded.
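
A quick check that CUDA_HOME is set and points at a toolkit might look like the sketch below. It assumes the usual toolkit layout, with the nvcc compiler at bin/nvcc under CUDA_HOME; check_cuda_home is a hypothetical helper name.

```python
import os
from pathlib import Path


def check_cuda_home(environ=os.environ):
    """Describe whether CUDA_HOME is set and contains bin/nvcc (assumed layout)."""
    cuda_home = environ.get("CUDA_HOME")
    if not cuda_home:
        return "CUDA_HOME is not set (is the module loaded?)"
    nvcc = Path(cuda_home) / "bin" / "nvcc"
    if not nvcc.is_file():
        return "CUDA_HOME is set to {}, but {} was not found".format(cuda_home, nvcc)
    return "CUDA toolkit found at {}".format(cuda_home)


print(check_cuda_home())
```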

The various “basic” python packages were installed with:

PYTHONUSERBASE=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 pip3 install --upgrade packagename

This was done for

  • pip
  • wheel
  • numpy
  • scipy
  • matplotlib
  • cython
  • pandas
  • ipython
  • sympy
  • nose
  • jupyter
  • numba
  • bokeh
  • scikit-learn

PyTorch was then installed as follows:

PYTHONUSERBASE=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl
PYTHONUSERBASE=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 pip3 install torchvision

I also wanted to install TensorFlow, but it seems to pull in a package that breaks pip, so it was omitted. Users can install it in a virtual environment if needed.

MXNet was installed via the mxnet-cu80 package. This unfortunately downgrades numpy, but hopefully doesn’t break anything. The fastai package also pulled in quite a number of other packages.
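
Since some of these installs change the versions of packages that were already present, it may be worth asking pip whether the installed set is still consistent. The sketch below runs pip check with the current interpreter; it was not part of the original installation procedure, and pip_check is a hypothetical helper name.

```python
import subprocess
import sys


def pip_check():
    """Run 'pip check' for the current interpreter; return (exit code, output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    return result.returncode, result.stdout


code, output = pip_check()
print(output)
```

A non-zero exit code means pip found missing or incompatible requirements, which are listed in the output.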