The High Performance Computing (HPC) cluster is a central resource available at MAX IV for users and staff. It is a small cluster compared to what one would find at dedicated supercomputing centers. It is maintained in cooperation with LUNARC (Lund University Computing Center), and the cluster has a very similar architecture to LUNARC's systems.
There are currently two sub-clusters, nicknamed the “online” and “offline” clusters. The names refer to their intended usage:
- The online cluster is intended for data analysis during the time of data collection.
- The offline cluster is a small subset that can be used outside beamtime by both staff and users.
Anyone with a MAX IV account can use the clusters. It is not necessary to apply for any special account, though access for non-staff may be limited to active proposal periods. For access problems or requests for additional access, contact Thomas Eriksson –
Starting information for “Dummies”
If you are happy with a Linux prompt, just need to do simple things, do not want to read the long info below, and do not want to bother other users by taking limited resources on the frontends, then you just need to read this:
```shell
# login using ssh (use your MAX IV login-name)
ssh -X usrnam@clu0-fe-1                    # step 1
# You are now at the computing cluster frontend. This machine has around
# 20 cores and 60 GB of RAM, so it can comfortably serve several users
# simultaneously. You can do here whatever you are used to doing on your
# laptop. But if you are planning to do something larger, e.g. run
# software that can occupy all CPUs or take a large amount of memory (> 20 GB)
# (watch out! this is quite easy with Matlab), it is strongly advised to hop
# onto one of the computing nodes. This gives you more resources without
# affecting other users. To do so, start an "interactive" session.
interactive -t 06:00:00                    # step 2a (6 hours, single core)
# you can work now (!), you may find your data in
cd /data/visitors/(beamline)/(proposal)/(visit)
# where you use your beamline name, proposal and visit number.
# if you want more CPUs use the -n option (useful e.g. for Matlab)
# if you want more RAM use the --mem option (you get around 1.5 GB per CPU)
interactive -n 8 --mem 20GB -t 06:00:00    # step 2b (4 cores, i.e. 8 hyperthreads, 20 GB RAM)
```
You may be wondering why there is not much software available, and why the software and libraries seem old. In that case you need to understand the basics of the modular software installation. See:
- The LUNARC User Documentation is the best reference.
- Some basic module system commands:
```shell
module avail                    # show available modules
module spider modulename        # look for an installed module
module add modulename           # load a module
module spider exact-modulename  # get info about a module
module list                     # list loaded modules
module remove modulename        # unload a module
module purge                    # unload all modules
```
- Note: There are also frontends with a Linux virtual desktop that may better fit your needs: clu0-fe-1, clu0-gn-0, offline-fe1
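As a concrete illustration, a typical session with the module system might look like the following. The module names and versions here are purely illustrative; use `module spider` on the cluster to see what is actually installed:

```shell
# Hypothetical example session - exact module names and versions will differ.
module purge                        # start from a clean environment
module spider Python                # find which Python versions are installed
module add GCC/10.3.0 Python/3.9.5  # load a compiler toolchain and Python (illustrative versions)
module list                         # verify what is currently loaded
```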
Each user has their own cluster-dedicated home directory (~), shared between the frontends and the nodes (this is permanent storage, possibly with backup). In addition, users' mxn-home directories are mounted on the frontends and nodes for convenience.
```shell
# compare
ls ~
ls /mxn/home/usrnam
```
Node local storage
Note: the $TMPDIR variable is set only in sbatch scripts, not in interactive mode.
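A minimal sketch of using node-local storage in a batch script, assuming the scratch variable is $TMPDIR as used in the batch example later on this page. The paths and the program name `myprogram` are placeholders:

```shell
#!/bin/bash
#SBATCH -t 00:30:00
#SBATCH -J j_scratch_demo
# $TMPDIR points at node-local storage and is only set in sbatch jobs.
# Copy input there, work locally, then copy results back to shared storage.
cp /data/visitors/(beamline)/(proposal)/(visit)/input.h5 $TMPDIR/
cd $TMPDIR
myprogram input.h5 -o result.h5     # "myprogram" is a placeholder
cp result.h5 /data/visitors/(beamline)/(proposal)/(visit)/
```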
Beamline/scientific data storage is mounted in /data/visitors/(beamline)/(proposal)/(visit)
BioMax buffer storage is temporarily mounted at /mxn/biomax-eiger-dc-1.
Using software at MAX IV cluster
Software installation at the MAX IV HPC cluster is identical to LUNARC Aurora. A hierarchical environment-modules scheme is used in order to provide a rich and unified software environment for scientific applications. We refer to the LUNARC User Documentation for useful and precise information.
Getting information
The MAX IV cluster uses SLURM (Simple Linux Utility for Resource Management).
```shell
# view information about nodes and partitions
sinfo
all*    up   7-00:00:00      1  drain  cn7
all*    up   7-00:00:00      7  idle   cn[0-6]
gpu     up   7-00:00:00      1  idle   gn0
# view information about jobs located in the scheduling queue
squeue
JOBID  PARTITION  NAME  USER  ST  TIME  NODES  NODELIST(REASON)
...
```
```shell
# cancel/stop a job
scancel JOBID
```
Submitting a job
```shell
# start a job from the batch file 'j_CDImap.sh' - see the LUNARC documentation or the example below
sbatch j_CDImap.sh
# or run an interactive bash session on node cn6
interactive --nodelist=cn6
# add the "-p v100" option if you request a V100 GPU node
interactive -p v100
# According to the LUNARC documentation it is strongly recommended to "purge"
# all modules after entering the interactive session
module purge
# Deprecated method:
srun --nodelist=cn6 --pty bash
```
Other useful commands:
```shell
# reserve a CPU node
salloc -N 1
# Note: after this command you are logged into the first allocated node
# reserve a whole GPU node
salloc -p v100 --exclusive
```
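As a hedged example of what a GPU-node reservation might look like in practice (the exact output depends on the node configuration, and `nvidia-smi` assumes NVIDIA drivers are installed on the node):

```shell
salloc -p v100 --exclusive   # reserve the whole GPU node
squeue -u $USER              # confirm that the allocation is running
nvidia-smi                   # on the GPU node: list the available GPUs
exit                         # release the node when finished
```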
Preparing a batch script
See a detailed tutorial within LUNARC documentation.
Below is just a quick-and-dirty example asking exclusively for nodes cn8 and cn9. There is a maximum of 48 tasks per node.
```shell
#!/bin/bash
#
# job time, change for what your job requires
#SBATCH -t 00:10:00
#
# job name
#SBATCH -J j_CDImap
#
#SBATCH --exclusive
#SBATCH -N 2
#SBATCH --tasks-per-node=48
#SBATCH --nodelist=cn8,cn9
# filenames for stdout and stderr - customise, include %j
#SBATCH -o process_%j.out
#SBATCH -e process_%j.err

# write this script to the stdout file - useful for tracing scripting errors
cat $0

# load the modules required for your program - customise for your program
module purge
module add foss/2018a h5py/2.7.1-Python-2.7.14

# run the program
# customise for your program name and add arguments if required
mpirun -n 96 python /mxn/nanomax/sw/CDIsuite/XRFCDImapping.py --path=/data/nanomax/prn20161125/ --file=GIA_sxw.h5 --scan=12 --scratch=$TMPDIR
```
Get statistics on completed jobs
Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc. See below for two examples.
To get statistics on completed jobs by jobID
```shell
sacct -j jobid --format=JobID,JobName,MaxRSS,Elapsed
# To view the same information for all jobs of a user
sacct -u usrnam --format=JobID,JobName,MaxRSS,Elapsed
```
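sacct supports many more format fields; among the commonly useful ones are State, ExitCode and Elapsed, and the `--starttime` option restricts the query to recent jobs. A small sketch (the date is illustrative):

```shell
# Show completion state and exit code in addition to memory and time
sacct -j jobid --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS
# Restrict the listing to a user's jobs started after a given date
sacct -u usrnam --starttime=2024-01-01 --format=JobID,JobName,State,Elapsed
```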