HPC Cluster
An HPC cluster is a computing system made up of multiple interconnected computers, called nodes, that work
together to provide high-performance computing capabilities.
Components of an HPC Cluster
- Cluster: Includes servers, networks, and storage.
- Partition: A collection of nodes.
- Node (server): Contains multiple processors.
- Processor: Contains multiple cores, e.g., a CPU or a GPU.
- Core: Receives instructions and performs calculations.
Nodes
- Login nodes: The secure entry point to the cluster, where users can submit jobs, manage files, and interact with the cluster.
- Compute nodes: Perform the actual computation tasks requested by users.
- Storage nodes: Provide high-capacity, high-performance storage for user data and applications, and are connected to
the compute nodes via a high-speed network.
- Data transfer nodes: Dedicated to high-speed data transfer between the cluster and external networks or storage systems.
When to use the HPC Cluster
- The computation needs much more memory than a personal computer has.
- The same program needs to be run many times.
- The program takes too long to run serially, but it can run faster in parallel.
Using R in the HPC Cluster
Module
A module refers to a software package or application that can be loaded and unloaded dynamically, allowing users to easily switch between different versions of the same software or to use different software packages without conflicts.
Load R to the HPC Cluster
R is treated as a module on the HPC cluster. You can check the available versions of R using
[ID@login ~]$ module avail
Then, load R and the gcc compiler on the cluster.
[ID@login ~]$ module load gcc/11.3.0
[ID@login ~]$ module load r/4.2.2
If you want the cluster to load R automatically every time you log in to the cluster, you may use
[ID@login ~]$ module initadd gcc/11.3.0
[ID@login ~]$ module initadd r/4.2.2
Install R Packages
Open R in the Cluster
Install needed packages
> install.packages("RequiredPackages")
Quit R
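The same installation can also be run without opening an interactive R session, which can be convenient on a login node. The sketch below is one possible way to do it, under several assumptions: `data.table` is only a stand-in package name, the personal library path `$HOME/R/library` is an arbitrary choice, and the CRAN mirror URL is the generic cloud mirror rather than anything specific to this cluster.

```shell
# Hypothetical package name used for illustration; substitute your own.
pkg="data.table"

# R_LIBS_USER is the standard R variable for a personal library.
# The default path below is an assumption; any writable directory works.
export R_LIBS_USER="${R_LIBS_USER:-$HOME/R/library}"
mkdir -p "$R_LIBS_USER"

# Build the R call once so it can be inspected before it is run.
r_call="install.packages('${pkg}', lib = Sys.getenv('R_LIBS_USER'), repos = 'https://cloud.r-project.org')"
echo "$r_call"

# Run it only if R is available (i.e. the R module has been loaded).
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e "$r_call"
else
  echo "Rscript not found; load the R module first." >&2
fi
```

Setting `repos` explicitly avoids the interactive mirror prompt that `install.packages` would otherwise show.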
Submit R jobs
On the HPC cluster, users cannot interact with compute nodes directly.
Thus, a submission script is needed to schedule jobs.
A submission script can either be written on the user’s own PC and then transferred to the cluster
or written directly on the login node using an editor such as Vim or Nano.
An Example of Submission Script
The job scheduler commonly used on HPC clusters is the Slurm Workload Manager.
If you have an R script named ‘my.R’, you may write the following commands to
a file named ‘simul.slurm’.
#!/bin/bash
#SBATCH --partition=general ## Specify the partition
#SBATCH --nodes=1 ## Number of nodes
#SBATCH --cpus-per-task=1 ## Number of CPU cores requested per task
#SBATCH --time=12:00:00 ## Wall-time limit (hours:minutes:seconds)
#SBATCH --mail-type=END ## Send an email when the job ends
#SBATCH --mail-user=xxxx@xxxx.com ## Address that receives the email
Rscript my.R
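If you would rather create the file from the login-node shell than open an editor, a here-document is one option. The sketch below writes the same example script to ‘simul.slurm’ and checks it for shell syntax errors; the partition name and e-mail address are the placeholders from the example and need to be replaced with values valid on your cluster.

```shell
# Write the submission script from the shell. The quoted 'EOF' keeps
# the text literal, so nothing inside is expanded by the login shell.
cat > simul.slurm <<'EOF'
#!/bin/bash
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=12:00:00
#SBATCH --mail-type=END
#SBATCH --mail-user=xxxx@xxxx.com
Rscript my.R
EOF

# Check the file for shell syntax errors without running it.
bash -n simul.slurm && echo "simul.slurm: syntax OK"
```

Note that `bash -n` only checks shell syntax; it does not validate the `#SBATCH` options, which bash treats as comments.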
Submit the job
Simply run the following command on the Cluster
[NetID@cn01 ~]$ sbatch simul.slurm
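When a job is submitted from a script, it can help to capture the job ID for later use with commands such as scancel. A minimal sketch, assuming the file ‘simul.slurm’ is in the current directory: `sbatch --parsable` prints only the job ID (followed by `;cluster` on multi-cluster setups), and the helper function `job_id_from_parsable` is a name made up here for illustration.

```shell
# Extract the job ID from `sbatch --parsable` output, which has the
# form "jobid" or "jobid;cluster".
job_id_from_parsable() {
  printf '%s\n' "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1; then
  job_id=$(job_id_from_parsable "$(sbatch --parsable simul.slurm)")
  echo "Submitted job ${job_id}"
  # Unless --output is set, Slurm writes the job's output to this file
  # in the directory the job was submitted from.
  echo "Output file: slurm-${job_id}.out"
else
  echo "sbatch not found; run this on the cluster." >&2
fi
```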
Commonly Used Slurm Commands
- Submit jobs
  - Jobs should be submitted via bash scripts.
[NetID@cn01 ~]$ sbatch <script_name>.slurm
For details, see the sbatch documentation.
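When the same program has to be run many times, one option is a Slurm job array, which submits all runs with a single sbatch call. The sketch below writes such a script; the file name ‘array.slurm’, the ten tasks, and the one-hour limit are arbitrary choices, and it assumes ‘my.R’ reads the task index from its command-line arguments (for example with `commandArgs(trailingOnly = TRUE)`).

```shell
# Write a job-array submission script (file name is an assumption).
cat > array.slurm <<'EOF'
#!/bin/bash
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00
#SBATCH --array=1-10 ## Run 10 tasks, numbered 1 to 10
#SBATCH --output=array_%A_%a.out ## %A = array job ID, %a = task index

# Slurm sets SLURM_ARRAY_TASK_ID for each task; pass it to the R script.
Rscript my.R "$SLURM_ARRAY_TASK_ID"
EOF

# Check the file for shell syntax errors without running it.
bash -n array.slurm && echo "array.slurm: syntax OK"
```

The script is submitted like any other, with `sbatch array.slurm`; each task then runs as a separate job with its own index and output file.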
- Cancel jobs
[NetID@cn01 ~]$ scancel <job_id>
  - Cancel multiple jobs whose squeue lines match a common pattern, such as a shared name or job ID prefix
[NetID@cn01 ~]$ squeue -u <user_id> | grep <common_names> | awk '{print $1}' | xargs -n 1 scancel
For details, see the scancel documentation.
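To see what the cancel pipeline does before pointing it at real jobs, it can be tried on sample text. The sketch below uses made-up squeue output (the job IDs, names, and node names are invented for illustration) and prints the scancel commands instead of running them.

```shell
# Made-up squeue output, for illustration only.
sample_queue='  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  10001   general  simul_a    netid  R       1:23      1 cn01
  10002   general  simul_b    netid PD       0:00      1 (Priority)
  10003   general    other    netid  R       5:00      1 cn02'

# grep keeps the lines that match the pattern; awk prints the first
# column, which is the job ID.
job_ids=$(printf '%s\n' "$sample_queue" | grep simul | awk '{print $1}')

# Dry run: "echo" shows each scancel command without executing it.
printf '%s\n' "$job_ids" | xargs -n 1 echo scancel
```

Because grep matches anywhere on the line, a pattern that also appears in a user name or node name would select more jobs than intended, so a dry run like this is a cheap safeguard. If all the jobs share an exact job name, `scancel -u <user_id> --name=<job_name>` is a simpler alternative.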
- Check current job queue
  - Check all jobs by all users
[NetID@cn01 ~]$ squeue
  - Check all jobs by a specific user
[NetID@cn01 ~]$ squeue -u <user_id>
For details, see the squeue documentation.
For details, see the scontrol documentation.
You can only control a running job or a job that ended within the last five minutes.
For details, see the sacctmgr documentation.
- Check job history
  - Check the details of your ended jobs
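Details of ended jobs are usually retrieved with Slurm's accounting command, sacct. The sketch below is one possible query, assuming job accounting is enabled on the cluster; the seven-day window and the selection of columns are arbitrary choices.

```shell
# Columns to display; all are standard sacct field names.
fields="JobID,JobName,Partition,State,Elapsed,MaxRSS,ExitCode"

# Fall back to `id -un` if USER is not set in the environment.
user_name="${USER:-$(id -un)}"

if command -v sacct >/dev/null 2>&1; then
  # List this user's jobs that were active during the last 7 days.
  sacct -u "$user_name" --starttime=now-7days --format="$fields"
else
  echo "sacct not found; run this on the cluster." >&2
fi
```

To look at a single job instead, `sacct -j <job_id> --format=...` takes the same field list.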
Parallel Computing
R is an interpreted language, meaning that code is executed line-by-line at runtime.
This can slow down the execution of loops compared to compiled languages like C or Fortran.
Moreover, R is a dynamically typed language, which means that the type of a variable can change during runtime.
This can cause additional overhead when looping over large data structures, as R needs to constantly check and update variable types.
Memory management in R can also contribute to slow loops, especially when dealing with large data structures
that require frequent copying and allocation of memory. Thus, parallel computing is useful when the
number of iterations in the loop is large. I will introduce how to do parallel computing in R on a personal computer in my next blog post.