Filesystems

There are three main options for storing files on the cluster: /home, /data and /workspace. Understanding how they work and the policies in operation is important both for keeping your work safe and for obtaining good input/output performance when running calculations.

In addition to the options discussed below, you can also store files in /tmp. /tmp is local to each workstation and can be up to 30GB in size. However, as its name suggests, /tmp should only be used for storing temporary files. /tmp is erased by the operating system on a regular basis and without warning. In particular, if the root filesystem requires more space or the computer is rebooted, files in /tmp are likely to be deleted.

All workstations can read and write CDs and DVDs, and tools for burning discs are installed. USB drives are perhaps the best solution for external storage. These are automatically mounted when connected to a workstation while a user is logged in via Xfce. Some thought should be given to the format of the USB drive. Most USB drives come formatted as FAT32 (which can be read by Linux, OSX and Windows). FAT32 is limited: in particular, it cannot store individual files larger than 4GB, which is a major issue in computational science. Other formats exist which don’t have this limitation. Linux and Windows can both read and write to NTFS drives; I believe OSX has only limited support. Ext2, ext3 and (more recently) ext4 are Linux filesystems which can also handle large files but require third-party drivers for OSX and Windows. The best format to use depends upon which operating systems you wish to use the drive with and the file sizes you need to store.
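Before copying a directory to a FAT32 drive, it can be useful to check whether any files exceed the limit. The following is a minimal sketch, assuming GNU find; the function name oversized_for_fat32 is made up for illustration and the default limit is the FAT32 maximum file size of 2^32 - 1 bytes.

```shell
# oversized_for_fat32 DIR [MAX_BYTES]
# Print every regular file under DIR that is larger than MAX_BYTES.
# MAX_BYTES defaults to 4294967295 (2^32 - 1), the largest file FAT32 can hold.
# (Hypothetical helper, not an installed tool.)
oversized_for_fat32() {
    local dir=$1 max_bytes=${2:-4294967295}
    # The "c" suffix makes find compare exact byte counts, with no rounding.
    find "$dir" -type f -size +"${max_bytes}c" -print
}
```

For example, `oversized_for_fat32 ~/results` lists the files that would have to be split, compressed or copied to a drive with a different format. No output means everything fits.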

home

Your home directory is /home/username and is mounted from the NFS server (thomson or ramsay). This means that when you log into any of the CMTH or TYC workstations (which can be identified from the login screen), you will find the same environment with the same home directory and personal customisations.

Whilst the NFS server performs extremely well, heavy i/o operations (e.g. reading and/or writing files of the order of several GB) can severely degrade the performance of the cluster for all users. In practice, this is only an issue when running several such calculations at once.

Your home directory is subject to a quota. The default soft limit is 10GB; if you go above this, then you will have 7 days to reduce your file usage. The default hard limit is 10.2GB; exceeding this limit (or being above the soft limit for more than 7 days) will result in you no longer being able to log into the cluster. It may be possible to increase quota limits for sufficiently good reasons, though this can’t be done for everyone.

The quota limits are quite small. This is mainly due to limitations in how much data we can safely back up. Nevertheless, they should be sufficient for important files (e.g. personal settings, mail, papers, etc).

You can see your usage and quota limits by running:

$ quota
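If you are close to the limit, it helps to know which directories take up the most space. The following is a rough sketch using the standard du, sort and head tools; the function name largest_entries is made up for illustration.

```shell
# largest_entries [DIR] [COUNT]
# List the COUNT largest items directly under DIR (default: your home
# directory, 10 items), largest first. Sizes are in KiB.
# (Hypothetical helper, not an installed tool.)
largest_entries() {
    local dir=${1:-$HOME} count=${2:-10}
    # -s gives one total per item; -k reports KiB so a numeric sort works.
    # The second pattern picks up hidden files and directories as well.
    du -sk "$dir"/* "$dir"/.[!.]* 2>/dev/null | sort -rn | head -n "$count"
}
```

Running `largest_entries` with no arguments summarises your home directory; `largest_entries ~/project 5` would show the five largest items in a (hypothetical) project directory.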

A good way to save space and still keep important files is to compress them. The gzip and bzip2 tools can reduce filesize substantially and work especially well for text files. Alternatively, files can be archived to external media or stored in /workspace.
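As an illustration, the commands below compress a single text file with gzip and bundle a directory into one compressed archive with tar. This is only a sketch: the file and directory names are invented, and the demonstration works in a throwaway directory created by mktemp rather than on real data.

```shell
# Work in a throwaway directory with made-up example files.
demo_dir=$(mktemp -d)
seq 1 100000 > "$demo_dir/output.log"
wc -c < "$demo_dir/output.log"          # size in bytes before compression

# gzip replaces output.log with the smaller output.log.gz.
gzip "$demo_dir/output.log"
wc -c < "$demo_dir/output.log.gz"       # size in bytes after compression

# tar with -z bundles a whole directory into one gzip-compressed archive.
# The original directory is left in place; remove it once the archive is checked.
mkdir "$demo_dir/results"
seq 1 1000 > "$demo_dir/results/run1.dat"
tar -czf "$demo_dir/results.tar.gz" -C "$demo_dir" results
tar -tzf "$demo_dir/results.tar.gz"     # list the archive contents

# gunzip restores the original file when it is needed again.
gunzip "$demo_dir/output.log.gz"
```

bzip2 and bunzip2 are used in the same way as gzip and gunzip; bzip2 is typically slower but often produces smaller files.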

data

Every user has a directory located at /data/users/username which is NFS-mounted on all computers from maxwell. As it is an NFS-mounted drive, the same warnings about intensive i/o operations for /home also apply to /data. This is especially true as maxwell is intended primarily for computational workloads.

Your data directory is also subject to a quota: the default soft limit is 50GB and the default hard limit is 50.2GB.

/data is backed up. However, the backup system does not have space to handle hundreds of GB changing each day; /workspace is a better place to store such temporary and rapidly changing files.

Some research groups also have a directory under /data/groups/groupname, where groupname is the surname of the group leader, to which all members of that group have write access.

workspace

You have a directory at /workspace/username. This is local to each workstation, so any files you store there will only be accessible on that workstation. A symbolic link in your home directory (~/workspace) is provided for convenience. This filesystem is not subject to quota and is much larger than /home; /workspace is the space left on the local hard drive after room has been allocated for the operating system and network backup. /workspace is of the order of 130GB on older machines and well over 300GB on newer machines.

As accessing files on /workspace does not involve network communications, it is substantially faster than accessing files on /home or /data.

Important

The workspace directories are not backed up. If you have crucial data stored in the workspace on a particular machine and that machine fails, your data will be lost. Crucial data stored in /workspace should also be stored elsewhere.

/workspace is shared by all users, though most people only need to use /workspace on their own workstation. Please be considerate. /workspace is monitored and heavy users are asked and expected to clean up their files on a regular basis.

If you accidentally delete the link from your home directory to your workspace directory, you can recreate it using:

$ ln -s /workspace/username ~/workspace

As /workspace is local to each machine, you will see a different workspace directory if you log onto a different workstation. This is particularly important when submitting batch jobs using Slurm. Since you cannot predict in advance where your batch job will run, you cannot rely on the contents of the workspace directory. The first part of your job script should copy the necessary data files from your home directory (visible everywhere) into the workspace directory (visible only on the machine running the job), and the last part should copy the results back to your home directory and clean up the workspace.
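The copy-in, run, copy-out pattern described above might be sketched as follows. This is an illustration rather than a recommended template: the function name stage_and_run, the job name, the project paths and the program ./my_program are all hypothetical, and it assumes your login name is available in $USER. SLURM_JOB_ID is set by Slurm inside a running job, so the final block only runs when the script is executed as a batch job.

```shell
#!/bin/bash
#SBATCH --job-name=workspace-demo   # hypothetical job name

# stage_and_run INPUT_DIR RESULTS_DIR SCRATCH_ROOT COMMAND [ARGS...]
# Copy INPUT_DIR to a private directory under SCRATCH_ROOT, run COMMAND
# there, copy everything back to RESULTS_DIR and remove the scratch copy.
stage_and_run() {
    local input_dir=$1 results_dir=$2 scratch_root=$3
    shift 3
    local status=0 scratch

    # A unique directory per job avoids clashes with your other jobs.
    scratch=$(mktemp -d "$scratch_root/job.XXXXXX") || return 1

    # 1. Copy the input files from network storage to the local disk.
    cp -r "$input_dir"/. "$scratch"/ || { rm -rf "$scratch"; return 1; }

    # 2. Run the calculation inside the scratch directory.
    ( cd "$scratch" && "$@" ) || status=$?

    # 3. Copy the results back; only clean up if the copy succeeded.
    if mkdir -p "$results_dir" && cp -r "$scratch"/. "$results_dir"/; then
        rm -rf "$scratch"
    else
        echo "Copy failed; results left in $scratch on $(hostname)" >&2
        status=1
    fi
    return "$status"
}

# Only run when executed by Slurm. All paths and the program are placeholders.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    stage_and_run "$HOME/project/input" "$HOME/project/results" \
        "/workspace/$USER" ./my_program input.dat
fi
```

Results are copied back even if the calculation fails, so partial output can be inspected, and the scratch directory is kept if the copy itself fails so that nothing is lost silently.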

common

The directory /common is also mounted from the NFS server on all workstations. This is a read-only filesystem and contains programs and libraries that are not part of a standard Linux distribution but have been installed specially for the cluster. Please see here for more details.