.. index:: CMTH PyML Stack
.. _CMTH PyML Stack:

CMTH PyML Stack
================

Description
-----------

As we have a small number of workstations with GPUs that can be used for calculations, we have put together a Python stack with a set of libraries that may be useful for people experimenting with machine learning. If you are interested in using it, I suggest checking with me whether your workstation supports it. The software in the module should work on any workstation, but only some will be able to use GPU acceleration.
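
Whether a given workstation is in the GPU-capable group can be checked roughly as follows. This is only a sketch: ``check_gpu`` is a hypothetical helper, not part of the module, and it assumes the module is loaded, that ``nvidia-smi`` is installed wherever an NVIDIA driver is, and that PyTorch's ``torch.cuda.is_available()`` is an adequate final test.

.. code-block:: bash

    # Hypothetical helper: report whether this workstation can use GPU acceleration.
    # Assumes cmth-pyml-stack is loaded, so python3 can import torch.
    check_gpu() {
        # No driver utility usually means no NVIDIA driver is installed.
        if ! command -v nvidia-smi >/dev/null 2>&1; then
            echo "cpu-only: nvidia-smi not found"
            return 1
        fi
        # 'nvidia-smi -L' lists the GPUs the driver can see; treat failure as "no GPU".
        if ! nvidia-smi -L >/dev/null 2>&1; then
            echo "cpu-only: nvidia-smi could not list any GPU"
            return 1
        fi
        # Finally ask PyTorch whether it can use CUDA on this machine.
        if python3 -c 'import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)' >/dev/null 2>&1; then
            echo "gpu: PyTorch can use CUDA"
            return 0
        fi
        echo "cpu-only: GPU found, but PyTorch cannot use CUDA"
        return 1
    }

    # Usage: check_gpu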

CUDA libraries are picky about compiler and library versions, so the Python in this module is compiled against (and depends on) gcc/5.3.0.
The module provides a full `SciPy stack <https://www.scipy.org/stackspec.html>`_: Python 3.6.4 together with the versions of the following packages and tools that were current when it was built:

- numpy
- scipy
- matplotlib
- cython
- pandas
- ipython
- sympy
- nose
- jupyter
- numba
- bokeh
- scikit-learn
- pytorch
- mxnet
- keras
- fastai
- CUDA toolkit (8.0)

User instructions
-----------------

The module can be loaded with:

.. code-block:: bash

    module load gcc/5.3.0
    module load cmth-pyml-stack/2018.01

Note that the gcc/5.3.0 module must be loaded first. If you need a different version of a package, or need to compile a package yourself, this module can serve as the base for a virtual environment, providing a recent Python version and a compatible CUDA installation.
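
A minimal sketch of that approach is below. It uses the standard-library ``venv`` module with ``--system-site-packages``, so the stack's own packages stay importable and only the packages you add or override are installed into the environment. ``make_pyml_venv`` and the example directory are hypothetical, and the module commands must be available in the shell that runs it.

.. code-block:: bash

    # Hypothetical helper: create a virtual environment layered on cmth-pyml-stack.
    make_pyml_venv() {
        local venv_dir="${1:?usage: make_pyml_venv DIR}"

        # Load the compiler first, then the stack, where the module command exists.
        if command -v module >/dev/null 2>&1; then
            module load gcc/5.3.0 || return 1
            module load cmth-pyml-stack/2018.01 || return 1
        fi

        # --system-site-packages keeps the stack's numpy, torch, etc. importable,
        # so only packages you add or override end up inside the venv.
        python3 -m venv --system-site-packages "$venv_dir" || return 1
        echo "created $venv_dir; activate with: source $venv_dir/bin/activate"
    }

    # Usage (the directory is only an example):
    #   make_pyml_venv "$HOME/venvs/pyml"
    #   source "$HOME/venvs/pyml/bin/activate"
    #   pip install <package>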

Source
------

- http://www.python.org
- https://www.scipy.org
- https://pypi.python.org
- http://www.pytorch.org
- https://developer.nvidia.com/cuda-80-ga2-download-archive

License
-------

Python is distributed under the PSF License (Python Software Foundation License), which is GPL-compatible from version 2.2 onwards.

The CUDA toolkit has its own `license <http://docs.nvidia.com/cuda/eula/index.html>`_.

Admin notes
-----------

Compilation was all done on a workstation with the bind mount. The additional dependencies installed previously for the SciPy stack (``libssl-dev``, ``libsqlite3-dev`` and ``tk-dev``) were probably also needed, but were already present.
First, Python was compiled with the gcc/5.3.0 module as follows:

.. code-block:: bash

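    ml purge
    ml gcc/5.3.0
    # --enable-optimizations turns on profile-guided optimization (slower build).
    # Each step runs in the foreground so it finishes before the next starts.
    ./configure \
        --prefix=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 \
        --enable-optimizations &> configure.log
    make -j4 &> make.log
    make install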
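
Whether the ``libssl-dev``, ``libsqlite3-dev`` and ``tk-dev`` headers were actually picked up can be checked after the build by importing the standard-library modules that depend on them (``ssl``, ``sqlite3`` and ``tkinter``). A minimal sketch, where ``check_optional_modules`` is a hypothetical name and the argument is the path of the freshly installed interpreter:

.. code-block:: bash

    # Hypothetical helper: check that the optional stdlib modules which need
    # libssl-dev, libsqlite3-dev and tk-dev at build time were actually built.
    check_optional_modules() {
        local py="$1" mod status=0
        # ssl <- libssl-dev, sqlite3 <- libsqlite3-dev, tkinter <- tk-dev
        for mod in ssl sqlite3 tkinter; do
            if "$py" -c "import $mod" >/dev/null 2>&1; then
                echo "$mod: ok"
            else
                echo "$mod: MISSING"
                status=1
            fi
        done
        return "$status"
    }

    # Usage, with the interpreter installed by "make install" above:
    #   check_optional_modules /common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01/bin/python3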

The install was synchronized with the main server and made into a module before proceeding with the rest.

The CUDA toolkit and its bugfix patch were installed with

.. code-block:: bash

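    # Lets the runfile installer find its bundled Perl module (InstallUtils.pm);
    # newer Perl versions no longer search the current directory by default.
    export PERL5LIB=.
    # --override skips the installer's compiler and library checks.
    sh cuda_8.0.61_375.26_linux-run --override
    # Bugfix patch, installed into the same directory.
    sh cuda_8.0.61.2_linux-run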

with the directory ``/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01`` selected as the installation path and its subdirectory ``cuda-8.0-samples`` for the samples. For this to work correctly, the module template also sets the ``CUDA_HOME`` environment variable when loaded.
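
After loading the module, a quick sanity check that ``CUDA_HOME`` points at a usable toolkit might look like the sketch below. It assumes (this page does not say so explicitly) that ``CUDA_HOME`` is set to the installation directory, so that ``nvcc`` sits in ``$CUDA_HOME/bin`` as in the usual runfile layout; ``check_cuda_home`` is a hypothetical name.

.. code-block:: bash

    # Hypothetical helper: confirm CUDA_HOME (set by the module) points at a
    # CUDA installation with an executable nvcc, and print its release line.
    check_cuda_home() {
        if [ -z "${CUDA_HOME:-}" ]; then
            echo "CUDA_HOME is not set; is cmth-pyml-stack/2018.01 loaded?" >&2
            return 1
        fi
        if [ ! -x "$CUDA_HOME/bin/nvcc" ]; then
            echo "no executable nvcc under $CUDA_HOME/bin" >&2
            return 1
        fi
        # For this module the line should mention release 8.0.
        "$CUDA_HOME/bin/nvcc" --version | grep -i 'release'
    }

    # Usage: check_cuda_home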

The various "basic" Python packages were installed with:

.. code-block:: bash

    PYTHONUSERBASE=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 pip3 install --upgrade packagename

This was done for

- pip
- wheel
- numpy
- scipy
- matplotlib
- cython
- pandas
- ipython
- sympy
- nose
- jupyter
- numba
- bokeh
- scikit-learn
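
The per-package installs above can be written as a single loop, sketched below. The prefix and package list are taken from this page; ``install_basic_packages`` and the ``DRY_RUN`` switch (which only prints the commands) are hypothetical additions. Keeping ``pip`` first in the list means the remaining packages are installed with the upgraded pip.

.. code-block:: bash

    # Prefix and package list as used on this page.
    PYML_PREFIX=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01
    PYML_PACKAGES=(pip wheel numpy scipy matplotlib cython pandas ipython
                   sympy nose jupyter numba bokeh scikit-learn)

    # Hypothetical helper: run the pip3 install for each package in turn,
    # stopping at the first failure.
    install_basic_packages() {
        local pkg
        for pkg in "${PYML_PACKAGES[@]}"; do
            # With DRY_RUN set, ${DRY_RUN:+echo} expands to "echo", so the
            # command is printed instead of executed.
            ${DRY_RUN:+echo} env PYTHONUSERBASE="$PYML_PREFIX" pip3 install --upgrade "$pkg" || return 1
        done
    }

    # Usage: DRY_RUN=1 install_basic_packages   # preview
    #        install_basic_packages             # run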

PyTorch was then installed as follows:

.. code-block:: bash

    PYTHONUSERBASE=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl
    PYTHONUSERBASE=/common/debian/9.1/Compiler/gcc/5.3/cmth-pyml-stack/cmth-pyml-stack-2018.01 pip3 install torchvision
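
The PyTorch wheel above is tied to this stack by its URL and filename: ``cu80`` in the path refers to CUDA 8.0, and the filename follows the PEP 427 convention ``{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl``, so ``cp36-cp36m-linux_x86_64`` means CPython 3.6 on 64-bit Linux. The hypothetical helper below extracts those tags, which may help when looking for an equivalent wheel for a different Python version.

.. code-block:: bash

    # Hypothetical helper: print the compatibility tags encoded in a wheel
    # filename (PEP 427: name-version[-build]-python-abi-platform.whl).
    wheel_tags() {
        local name="${1##*/}"      # drop any URL or directory prefix
        name="${name%.whl}"
        local -a parts
        IFS='-' read -r -a parts <<< "$name"
        local n=${#parts[@]}
        if (( n < 5 )); then
            echo "not a wheel filename: $1" >&2
            return 1
        fi
        # The last three fields are always the python, abi and platform tags.
        echo "python=${parts[n-3]} abi=${parts[n-2]} platform=${parts[n-1]}"
    }

    # Tag of the running interpreter, for comparison (cp36 for this stack):
    #   python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'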

I also wanted to install TensorFlow, but it appears to install a package that breaks pip, so it was omitted. Users who need it can install it in a virtual environment.

MXNet was installed via the ``mxnet-cu80`` package. Unfortunately this downgrades numpy, though hopefully without breaking anything.
The ``fastai`` package also pulled in quite a number of other packages.
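
A numpy downgrade like the one caused by ``mxnet-cu80`` can be spotted by recording numpy's version before and after an install. The sketch below is one way to do that; it assumes GNU ``sort -V`` for the version comparison (only approximate for pre-release versions), and the helper names are hypothetical. ``pip3 check`` then lists installed packages whose declared requirements are no longer met.

.. code-block:: bash

    # Hypothetical helper: succeed if version $1 sorts strictly before $2
    # (GNU sort -V; pre-release suffixes such as rc1 may not order correctly).
    version_lt() {
        [ "$1" != "$2" ] &&
            [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
    }

    # Hypothetical helper: version of numpy seen by the current python3.
    numpy_version() {
        python3 -c 'import numpy; print(numpy.__version__)'
    }

    # Usage around the MXNet install (with PYTHONUSERBASE set as above):
    #   before=$(numpy_version)
    #   pip3 install mxnet-cu80
    #   after=$(numpy_version)
    #   if version_lt "$after" "$before"; then
    #       echo "numpy downgraded: $before -> $after"
    #   fi
    #   pip3 check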