HPC Basics

The High Performance Computing (HPC) cluster is a central resource available at MAX IV for users and staff.  It is a small cluster compared to what one would find at dedicated supercomputing centers.

There are currently two sub-clusters, nicknamed the “online” and the “offline” cluster. The names refer to their intended usage:

  • The online cluster is dedicated to data analysis during beamtime. The front-end machine is clu0-fe-1. The online cluster is only accessible from within MAX IV, including over the maxiv_guest wifi network and at the beamline workstations.
  • The offline cluster is a small cluster that can be used outside beamtime by both staff and users. The front end is offline-fe1. The offline cluster is not accessible from the beamline computers, but it can be reached remotely via the VPN.

The HPC cluster is maintained in cooperation with LUNARC (Lund University Computing Center) and therefore has a very similar architecture.

Anyone with a MAX IV account (including DUO accounts) can use the clusters. It is not necessary to apply for any special account, though access for non-staff may be limited to active proposal periods. For access problems or additional access needs, contact Thomas Eriksson.

Starting information for “Dummies”

If you are comfortable with a Linux prompt, only need to do simple things, do not want to read the longer information below, and do not want to inconvenience other users by taking up the limited resources on the frontends, then this is all you need to read. See also access via Remote Desktop.

The online cluster frontend is clu0-fe-1. The offline frontend is offline-fe1 (actually an alias for clu1-fe-0).

# login using ssh (use MAX IV login-name)
ssh -X usrnam@clu0-fe-1 # step 1

# You are now at the computing cluster frontend. This machine has around
# 20 cores and ~60 GB of RAM, so it can comfortably serve several users
# simultaneously. You can do here whatever you are used to doing on your
# laptop. But if you are planning to do something larger, e.g. run
# software that can occupy all CPUs or use a large amount of memory (> 20 GB)
# (watch out! this is quite easy to do with Matlab), it is strongly advised
# to move to one of the computing nodes. This will give you more resources
# without affecting other users. To do so, start an "interactive" session.

interactive -c 2 -t 06:00:00 # step 2a (6 hours, single core, i.e. 2 hyperthreads)

# you can work now (!), you may find your data in
cd /data/visitors/(beamline)/(proposal)/(visit)
# where you use your beamline name, proposal and visit number.

# if you want more CPUs use -n option (useful e.g. for Matlab)
# if you want more RAM use --mem option (you are getting around 1.5 GB per logical CPU, i.e. hyperthread)
interactive -n 4 -c 2 --mem 20GB -t 06:00:00 # step 2b (4 cores, i.e. 8 hyperthreads, 20GB RAM)

You may be wondering why there does not seem to be much software available, and why the software and libraries look old. In that case you need to understand the basics of the modular software installation. See:

  • LUNARC User Documentation is the best reference
  • Some basic module system commands (a short example session follows below):

module list                       # list loaded modules
module avail                      # show available modules that can be loaded directly
module spider modulename          # look for an installed module
module add modulename             # load module
module load modulename            # load module (same as add)
module spider exact-modulename    # get info about a specific module
module remove modulename          # unload module
module purge                      # unload all loaded modules

Note: there are also frontends with a Linux virtual desktop that may better fit your needs: clu0-fe-1, clu0-gn-0, offline-fe1
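As a quick illustration, a typical session for finding and loading a module might look like the sketch below. The module names are taken from the batch script example further down this page and are only placeholders; use module spider to see which versions are actually installed.

module purge                              # start from a clean environment
module spider h5py                        # list available h5py versions
module spider h5py/2.7.1-Python-2.7.14    # show what has to be loaded before this module
module load foss/2018a                    # load the required toolchain first
module load h5py/2.7.1-Python-2.7.14      # then load the module itself
module list                               # verify what is loaded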

Home directories

Each user has their own cluster-dedicated home directory (~), which is shared across the frontends and nodes (this is permanent storage with backup). In addition, the users' mxn-home/visitors directories are mounted on the frontends and nodes for convenience.

# compare
ls ~
ls /mxn/visitors/username

Node local storage

$TMPDIR=/local/slurmtmp.$SLURM_JOB_ID

Note: the variable is only set in sbatch scripts, not in interactive sessions. The directory itself exists in both cases.
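A common pattern is to stage input data onto the node-local scratch, process it there, and copy the results back to permanent storage before the job ends. Below is a minimal sketch of such an sbatch script; the data path is the example path used elsewhere on this page, and my_program/input.h5 are placeholders, not an actual MAX IV workflow.

#!/bin/bash
#SBATCH -t 01:00:00
#SBATCH -J local_scratch_example

# stage input data onto the fast node-local scratch (placeholder path and file)
cp /data/visitors/biomax/prn0001/20160622/input.h5 $TMPDIR/

# run the analysis against the local copy (placeholder program)
cd $TMPDIR
my_program input.h5 -o results.h5

# copy the results back before the job finishes - the node-local
# directory is typically cleaned up when the job ends
cp results.h5 /data/visitors/biomax/prn0001/20160622/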

Storage

Beamline/scientific data storage is mounted in /data/visitors/(beamline)/(proposal)/(visit)

ls /data/visitors/biomax/prn0001/20160622

Using software at MAX IV cluster

Software installation on the MAX IV HPC cluster is identical to that on LUNARC Aurora. A hierarchical environment modules scheme is used to provide a rich and unified software environment for scientific applications. We refer to the LUNARC User Documentation for detailed and precise information.
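To illustrate what “hierarchical” means in practice: many application modules only become visible to module avail after a compiler or toolchain module has been loaded, while module spider searches the whole hierarchy. A minimal sketch, assuming the foss/2018a toolchain used elsewhere on this page:

module purge               # unload everything
module avail               # mainly compilers and toolchains are visible at this point
module load foss/2018a     # load a toolchain
module avail               # applications and libraries built with this toolchain now also appear
module spider h5py         # spider finds modules anywhere in the hierarchy, loaded toolchain or not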

Getting information

The MAX IV cluster uses SLURM (Simple Linux Utility for Resource Management).

# view information about nodes and partitions
sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up 7-00:00:00      1  drain cn7
all*         up 7-00:00:00      7   idle cn[0-6]
gpu          up 7-00:00:00      1   idle gn0
# view information about jobs in the scheduling queue
squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
...
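The full queue can be long; to list only your own jobs, squeue can be restricted to a user (usrnam is the same placeholder login name as above):

# view only the jobs belonging to one user
squeue -u usrnam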

Useful commands

See How to use a job submission system at LUNARC

# start a job from the batch file 'j_CDImap.sh' - see the LUNARC documentation or the example below
sbatch j_CDImap.sh
# or to run an interactive bash session on node cn6
interactive --nodelist=cn6
# add the "-p v100" option to request a V100 GPU node
interactive -p v100
# According to the LUNARC documentation it is strongly recommended to "purge"
# all modules after entering the interactive session
module purge

Deprecated method:
srun --nodelist=cn6 --pty bash
# cancel/stop job
scancel JOBID
# reserve a CPU node
salloc -N 1
# Note: after this command you are logged into the first allocated node


# reserve a whole GPU node
salloc -p v100 --exclusive

Preparing a batch script

See a detailed tutorial within LUNARC documentation.
Below is a quick-and-dirty example that asks for exclusive use of nodes cn8 and cn9. There is a maximum of 48 tasks per node.
j_CDImap.sh

#!/bin/bash
#
# job time, change for what your job requires
#SBATCH -t 00:10:00
#
# job name
#SBATCH -J j_CDImap
#
#SBATCH --exclusive
#SBATCH -N 2
#SBATCH --ntasks-per-node=48
#SBATCH --nodelist=cn8,cn9
# filenames stdout and stderr - customise, include %j
#SBATCH -o process_%j.out
#SBATCH -e process_%j.err
# write this script to the stdout file - useful for tracking down scripting errors
cat $0
# load the modules required for your program - customise for your program
module purge
module add foss/2018a h5py/2.7.1-Python-2.7.14
# run the program
# customise for your program name and add arguments if required
mpirun -n 96 python /mxn/nanomax/sw/CDIsuite/XRFCDImapping.py --path=/data/nanomax/prn20161125/ --file=GIA_sxw.h5 --scan=12 --scratch=$TMPDIR

Get statistics on completed jobs

Once your job has completed, you can get additional information that was not available during the run, such as run time and memory usage. See below for two examples.

To get statistics on a completed job by job ID:

sacct -j jobid --format=JobID,JobName,MaxRSS,Elapsed
# To view the same information for all jobs of a user
sacct -u usrnam --format=JobID,JobName,MaxRSS,Elapsed
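By default sacct only considers recent jobs (from the start of the current day); to look further back, restrict the query to a time window with --starttime. The date below is only a placeholder:

# list all of a user's jobs started since a given date (placeholder date)
sacct -u usrnam --starttime 2016-06-01 --format=JobID,JobName,MaxRSS,Elapsed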