6  Running Executables on HPC

Ok, we’ve gotten comfortable navigating around the HPC filesystem. Now how do we run executables on files?

Let’s talk about the two problems:

  1. How do we find executables on a cluster, and
  2. how do we load them up and run them?

6.1 A empty environment

Remember, we can poke around and see whether samtools is installed on the machine using which:

which samtools

If samtools is available, it will give the path. If it isn’t, you will have an empty response.

So if we don’t have samtools immediately available, how do we find it on our system? We can use environment modules to load software.

6.2 Environment Modules

Before you install your own versions of software, it’s important to realize that this problem may be solved for you.

Your first stop should be looking for environment modules on the HPC. Not all HPCs have these, but if they have them, this should be your first stop to find executables.

lmod is a system for loading and unloading software modules. It is usually installed on HPCs. The commands all start with module, and there are a number of ones that are useful for you.

  • module avail
  • module load
  • module purge

6.3 For FH Users: Modules benefit everyone

If there is a particular bit of software that you need to run on the FH cluster that’s not there, make sure to request it from SciComp. Someone else probably needs it and so making it known so they can add it as a Environment module will help other people.

For FH Users

On the FH cluster, ml is a handy command that combines module load and module avail.

You can load a module with ml <module_name>.

6.3.1 Tip: Load only as many modules as you need at a time

One of the big issues with bioinformatics software is that the toolchain (the software dependencies needed to run the software) can be different. So when possible, load only one or two modules at a time for each step of your analysis. When you’re done with that step, use module purge to clear out the software environment.

6.3.2 Tip: Always Use Versioning for Environment Modules

When you list the environment modules using module avail, you’ll see that there are versions for each of the modules. For example, SAMtools has a version called: SAMtools/0.1.20-foss-2018b. There usually is a default module selected (it will be specified by a (d)), so use that version in your script.

For example, here’s a full bash script that loads up samtools on rhino/gizmo:

#!/bin/bash
module load SAMtools/0.1.20-foss-2018b
samtools view -c $1 > counts.txt
module purge

6.3.3 Tip: module purge when you’re done

When you’re done with a software module, you should use module purge to clear out its’ software environment.

module purge

This is because there may be conflicting dependencies between modules (Python software especially). So purge your module when you’re done and clean up your environment.

6.3.4 Full Script Example

This is an example of a full script that uses environment modules.

#!/bin/bash
# samtools_count.sh
# Usage: ./samtools_count.sh my_bam_file.sh
# Outputs: my_bam_file.bam.counts.txt
module load SAMtools/1.19.2-GCC-13.2.0  #load the module
samtools view -c $1 > $1.counts.txt     #run the script 
module purge                            #purge the module

We can use this script by making it executable, and then run it with:

./samtools_count.sh my_bam_file.sh

This will output a counts file called my_bam_file.bam.counts.txt in our home directory.

6.4 Running Software that’s not available.

If there is not a module available for our software, then we have a few options, in terms of effort.

  1. Use a Docker container with apptainer that has our software in it
  2. Install the binary into our /home directory using conda
  3. Compile the executable ourselves

For more information about these different methods, please refer to this article: Why your computational environment is important.

We will get more in-depth into containers in the Interactive Shell chapter.