CMTH PyML Stack
Description
We have a small number of workstations with GPUs that can be used for calculations, so we have put together a Python stack with a set of libraries that may be useful for people experimenting with machine learning. If you are interested in using it, I suggest checking with me to see whether your workstation supports it. I believe the software in the module will still work on any workstation, but only some will be able to use GPU acceleration.
CUDA libraries are a little picky about compiler and library versions, so this module contains a Python build compiled against (and depending on) gcc/5.3.0. This is a full SciPy stack. It contains Python 3.6.4 as well as recent versions of the following packages and tools:
- numpy
- scipy
- matplotlib
- cython
- pandas
- ipython
- sympy
- nose
- jupyter
- numba
- bokeh
- scikit-learn
- pytorch
- mxnet
- keras
- fastai
- CUDA toolkit (8.0)
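To see which of these packages are actually importable in your session, and at what version, a small script along the following lines can help. This is a minimal sketch: the import names in `IMPORT_NAMES` (for example `sklearn` for scikit-learn and `torch` for pytorch) are the conventional ones and are assumed here rather than checked against this module, and `installed_versions` is a hypothetical helper name.

```python
import importlib

# Conventional import names for the packages listed above (assumed, not
# verified against this module). Some differ from the package name.
IMPORT_NAMES = [
    "numpy", "scipy", "matplotlib", "Cython", "pandas", "IPython", "sympy",
    "nose", "jupyter", "numba", "bokeh", "sklearn", "torch", "mxnet",
    "keras", "fastai",
]


def installed_versions(names=IMPORT_NAMES):
    """Map each import name to its version string.

    The value is None if the import fails, and "unknown" if the module
    imports but does not expose a __version__ attribute.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Covers missing packages as well as packages that fail on import.
            versions[name] = None
            continue
        versions[name] = str(getattr(module, "__version__", "unknown"))
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print("{:12s} {}".format(name, version or "not importable"))
```

Run with the module loaded, this prints one line per package, so a missing or broken package shows up as "not importable" instead of stopping the script.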
User instructions
The module can be loaded as:
module load cmth-pyml-stack/2018.01
Note: the gcc/5.3.0 module must be loaded first. If you need a different version of a package, or need to compile a package yourself, this module can serve as the base for a virtual environment, providing a recent Python version and a compatible CUDA installation.
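If you do want a virtual environment on top of the module, one option is Python's standard `venv` module with system site packages enabled, so that the environment can still see the stack's packages. The sketch below assumes the module is already loaded and that its packages live in the interpreter's normal site-packages directory; `create_stack_env` and any path passed to it are hypothetical.

```python
import os
import venv


def create_stack_env(env_dir, with_pip=True):
    """Create a virtual environment that can also see the base interpreter's packages.

    Returns the path to the environment's activate script (POSIX layout).
    """
    env_dir = os.path.abspath(os.path.expanduser(env_dir))
    # system_site_packages=True keeps the base interpreter's site-packages
    # visible inside the environment; packages installed with the
    # environment's own pip are placed under env_dir.
    venv.create(env_dir, system_site_packages=True, with_pip=with_pip)
    return os.path.join(env_dir, "bin", "activate")
```

Calling `create_stack_env("~/venvs/ml")` (an example path) and then sourcing the returned activate script gives an environment in which `pip install` places packages under the environment directory. This is roughly what `python3 -m venv --system-site-packages ~/venvs/ml` does from the shell.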
Source
License
Python is distributed under the Python Software Foundation (PSF) License. From version 2.2 on this is GPL compatible.
The CUDA toolkit has its own license.
Admin notes
Compilation was all done on a workstation with a bind mount. It is likely that the additional dependencies previously installed for the scipy stack were also needed, but they were already present: libssl-dev, libsqlite3-dev and tk-dev.
First, Python was compiled with gcc/5.3.0 as follows:
ml purge
ml gcc/5.3.0
./configure \
--prefix=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 \
--enable-optimizations > configure.log
make -j4 &> make.log
make install
The install was synchronized with the main server and made into a module before proceeding with the rest.
The CUDA toolkit and bugfix patch were installed with:
export PERL5LIB=.
sh cuda_8.0.61_375.26_linux-run --override
sh cuda_8.0.61.2_linux-run
with the directory /common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 selected for the installation, and the subdirectory cuda-8.0-samples for the samples. For this to work correctly, the module template also sets the CUDA_HOME environment variable when loaded.
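Whether the module has set CUDA_HOME as expected can be checked from Python. The following is a minimal sketch, assuming the usual toolkit layout in which the compiler driver sits at `bin/nvcc` under the install directory; `cuda_home_status` is a hypothetical helper, not part of the module.

```python
import os


def cuda_home_status(environ=None):
    """Report the CUDA_HOME setting and whether nvcc is found beneath it."""
    if environ is None:
        environ = os.environ
    cuda_home = environ.get("CUDA_HOME")
    if not cuda_home:
        return {"cuda_home": None, "nvcc_found": False}
    # Assumed layout: <CUDA_HOME>/bin/nvcc
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    return {"cuda_home": cuda_home, "nvcc_found": os.path.isfile(nvcc)}


if __name__ == "__main__":
    print(cuda_home_status())
```

Passing a plain dictionary instead of the real environment makes the function easy to try out without loading the module.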
The various “basic” python packages were installed with:
PYTHONUSERBASE=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 pip3 install --upgrade packagename
This was done for
- pip
- wheel
- numpy
- scipy
- matplotlib
- cython
- pandas
- ipython
- sympy
- nose
- jupyter
- numba
- bokeh
- scikit-learn
Pytorch was then installed as follows:
PYTHONUSERBASE=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl
PYTHONUSERBASE=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 pip3 install torchvision
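A quick smoke test after an install like this is to confirm that `torch` imports and can see a GPU. The sketch below uses `torch.cuda.is_available()` and `torch.cuda.device_count()`; the helper name `gpu_status` is hypothetical, and on a workstation without a supported GPU it is expected to report that CUDA is unavailable rather than fail.

```python
def gpu_status():
    """Summarise whether torch is importable and whether it can use CUDA."""
    try:
        import torch
    except ImportError:
        return {"torch_importable": False, "cuda_available": False, "device_count": 0}
    cuda_available = bool(torch.cuda.is_available())
    # Only ask for a device count when CUDA is actually usable.
    device_count = torch.cuda.device_count() if cuda_available else 0
    return {
        "torch_importable": True,
        "cuda_available": cuda_available,
        "device_count": device_count,
    }


if __name__ == "__main__":
    print(gpu_status())
```

The same check is useful for users who want to know whether their own workstation can use GPU acceleration once the module is loaded.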
I also wanted to install tensorflow, but its installation seems to pull in a package that breaks pip, so it was omitted. Users can install it in a virtual environment if needed.
MXNet was installed via the mxnet-cu80 package. This unfortunately downgrades the numpy version, but hopefully doesn't break anything.
The fastai package also pulled in quite a number of other packages.