HPC

The High Performance Computing (HPC) cluster is a central resource available at MAX IV for users and staff. It is a small cluster compared to what one would find at dedicated supercomputing centers. It is maintained in cooperation with LUNARC (Lund University Computing Center), and the cluster has a very similar architecture to LUNARC's systems.

There are currently two sub-clusters, nicknamed the “online” and “offline” clusters. The names refer to their intended usage.

  • The online cluster is intended for data analysis during the time of data collection.
  • The offline cluster is a small subset that can be used outside beamtime by both staff and users.

Anyone with a MAX IV account can use the clusters. It is not necessary to apply for any special account, though access for non-staff may be limited to active proposal periods. For access problems or requests for additional access, contact Thomas Eriksson –

Starting information for “Dummies”

If you are comfortable with a Linux prompt, just need to do simple things, do not want to read the longer information below, and do not want to bother other users by taking up limited resources on the frontends, then the following is all you need:

# login using ssh (use MAX IV login-name)
ssh -X usrnam@clu0-fe-1 # step 1
# You are now at the computing cluster frontend. This machine has around
# 20 cores and 60 GB of RAM and so it can comfortably serve several users
# simultaneously. You can do here whatever you are used to do at your
# laptop. But if you are planning to do something larger, e.g. use
# software that can occupy all CPUs or take a large amount of memory (> 20 GB)
# (watch out! this is quite easy to do with Matlab), it is strongly advised
# to hop onto one of the compute nodes. This will give you more resources
# without affecting other users. To do so, start an "interactive" session.
interactive -t 06:00:00 # step 2a (6 hours, single core)

# you can work now (!), you may find your data in
cd /data/visitors/(beamline)/(proposal)/(visit)
# where you use your beamline name, proposal and visit number.

# if you want more CPUs use -n option (useful e.g. for Matlab)
# if you want more RAM use --mem option (you are getting around 1.5 GB per CPU)
interactive -n 8 --mem 20GB -t 06:00:00 # step 2b (4 cores, i.e. 8 hyperthreads, 20GB RAM)
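To make the data path layout above concrete, here is a small sketch that builds the path from its parts; the beamline, proposal and visit values below are illustrative — substitute your own:

```shell
# Illustrative values -- replace with your own beamline, proposal and visit
beamline=biomax
proposal=prn0001
visit=20160622
datadir=/data/visitors/$beamline/$proposal/$visit
echo "$datadir"   # -> /data/visitors/biomax/prn0001/20160622
```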

You may be wondering why there is not much software available, and why the software and libraries seem old. In that case you need to understand the basics of the modular software installation. See:

  • LUNARC User Documentation is the best reference
  • Some basic module system commands:

    module avail                    # show available modules
    module spider modulename        # look for an installed module
    module add modulename           # load module
    module spider exact-modulename  # get info about module
    module list                     # list loaded modules
    module remove modulename        # unload module
    module purge                    # unload all modules
  • Note: There are also frontends with a Linux virtual desktop that may better fit your needs: clu0-fe-1, clu0-gn-0, offline-fe1

Home directories

Each user has their own cluster-dedicated home directory (~), common to the frontends and nodes (this is permanent storage, possibly with backup). In addition, users' mxn home directories are mounted on the frontends and nodes for convenience.

# compare
ls ~
ls /mxn/home/usrnam

Node local storage

$TMPDIR=/local/slurmtmp.$SLURM_JOB_ID

Note: this variable is set only in sbatch scripts, not in interactive mode.
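A sketch of how the node-local scratch can be used in a batch script — copy input data in, work locally, and copy results back before the job ends, since $TMPDIR is removed afterwards. The program name and data paths here are placeholders:

```shell
#!/bin/bash
#SBATCH -t 01:00:00
#SBATCH -J j_scratch_example

# $TMPDIR points at fast node-local storage (/local/slurmtmp.$SLURM_JOB_ID).
# It is cleaned up when the job ends, so copy results back before exiting.
cp /data/visitors/beamline/proposal/visit/input.h5 "$TMPDIR"/
cd "$TMPDIR"
my_analysis_program input.h5 -o result.h5   # placeholder program name
cp result.h5 /data/visitors/beamline/proposal/visit/
```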

Storage

Beamline/scientific data storage is mounted in /data/visitors/(beamline)/(proposal)/(visit)

ls /data/visitors/biomax/prn0001/20160622

BioMax buffer storage is temporarily mounted at /mxn/biomax-eiger-dc-1.

Using software at MAX IV cluster

Software installation on the MAX IV HPC cluster is identical to LUNARC Aurora. A hierarchical environment modules scheme is used in order to provide a rich and unified software environment for scientific applications. We refer to the LUNARC User Documentation for detailed and precise information.

Getting information

The MAX IV cluster uses SLURM (Simple Linux Utility for Resource Management).

# view information about nodes and partitions
sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up 7-00:00:00      1  drain cn7
all*         up 7-00:00:00      7   idle cn[0-6]
gpu          up 7-00:00:00      1   idle gn0
# view information about jobs in the scheduling queue
squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
...

Stop job

# cancel/stop job
scancel JOBID

Submitting a job

See: How to use a job submission system at LUNARC

# start a job from the batch file 'j_CDImap.sh' - see Lunarc documentation or an example here
sbatch j_CDImap.sh
# or to run interactive bash in node cn6
interactive --nodelist=cn6
# add the "-p v100" option to request a V100 GPU node
interactive -p v100
# According to the LUNARC documentation it is strongly recommended to "purge"
# all modules after entering the interactive session
module purge

Deprecated method:
srun --nodelist=cn6 --pty bash

Other useful commands:

# reserve a CPU node
salloc -N 1
# Note: after this command you are logged into the first allocated node

# reserve a whole GPU node
salloc -p v100 --exclusive

Preparing a batch script

See a detailed tutorial within LUNARC documentation.
Below is just a quick and dirty example asking exclusively for nodes cn8 and cn9. There is a maximum of 48 tasks per node.
j_CDImap.sh

#!/bin/bash
#
# job time, change for what your job requires
#SBATCH -t 00:10:00
#
# job name
#SBATCH -J j_CDImap
#
#SBATCH --exclusive
#SBATCH -N 2
#SBATCH --ntasks-per-node=48
#SBATCH --nodelist=cn8,cn9
# filenames stdout and stderr - customise, include %j
#SBATCH -o process_%j.out
#SBATCH -e process_%j.err
# write this script to stdout-file - useful for scripting errors
cat $0
# load the modules required for you program - customise for your program
module purge
module add foss/2018a h5py/2.7.1-Python-2.7.14
# run the program
# customise for your program name and add arguments if required
mpirun -n 96 python /mxn/nanomax/sw/CDIsuite/XRFCDImapping.py --path=/data/nanomax/prn20161125/ --file=GIA_sxw.h5 --scan=12 --scratch=$TMPDIR

Get statistics on completed jobs

Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc. See below for two examples.

To get statistics on completed jobs by jobID:

sacct -j jobid --format=JobID,JobName,MaxRSS,Elapsed
# To view the same information for all jobs of a user
sacct -u usrnam --format=JobID,JobName,MaxRSS,Elapsed