6 Running Executables on HPC
Ok, we’ve gotten comfortable navigating around the HPC filesystem. Now how do we run executables on files?
Let’s talk about the two problems:
- How do we find executables on a cluster, and
- how do we load them up and run them?
6.1 A empty environment
Remember, we can poke around and see whether samtools
is installed on the machine using which
:
which samtools
If samtools
is available, it will give the path. If it isn’t, you will have an empty response.
So if we don’t have samtools
immediately available, how do we find it on our system? We can use environment modules to load software.
6.2 Environment Modules
Before you install your own versions of software, it’s important to realize that this problem may be solved for you.
Your first stop should be looking for environment modules on the HPC. Not all HPCs have these, but if they have them, this should be your first stop to find executables.
lmod
is a system for loading and unloading software modules. It is usually installed on HPCs. The commands all start with module
, and there are a number of ones that are useful for you.
module avail
module load
module purge
6.3 For FH Users: Modules benefit everyone
If there is a particular bit of software that you need to run on the FH cluster that’s not there, make sure to request it from SciComp. Someone else probably needs it and so making it known so they can add it as a Environment module will help other people.
6.3.1 Tip: Load only as many modules as you need at a time
One of the big issues with bioinformatics software is that the toolchain (the software dependencies needed to run the software) can be different. So when possible, load only one or two modules at a time for each step of your analysis. When you’re done with that step, use module purge
to clear out the software environment.
6.3.2 Tip: Always Use Versioning for Environment Modules
When you list the environment modules using module avail
, you’ll see that there are versions for each of the modules. For example, SAMtools
has a version called: SAMtools/0.1.20-foss-2018b
. There usually is a default module selected (it will be specified by a (d)
), so use that version in your script.
For example, here’s a full bash script that loads up samtools
on rhino
/gizmo
:
#!/bin/bash
module load SAMtools/0.1.20-foss-2018b
samtools view -c $1 > counts.txt
module purge
6.3.3 Tip: module purge
when you’re done
When you’re done with a software module, you should use module purge
to clear out its’ software environment.
module purge
This is because there may be conflicting dependencies between modules (Python software especially). So purge
your module when you’re done and clean up your environment.
6.3.4 Full Script Example
This is an example of a full script that uses environment modules.
#!/bin/bash
# samtools_count.sh
# Usage: ./samtools_count.sh my_bam_file.sh
# Outputs: my_bam_file.bam.counts.txt
module load SAMtools/1.19.2-GCC-13.2.0 #load the module
samtools view -c $1 > $1.counts.txt #run the script
module purge #purge the module
We can use this script by making it executable, and then run it with:
./samtools_count.sh my_bam_file.sh
This will output a counts file called my_bam_file.bam.counts.txt
in our home directory.
6.4 Running Software that’s not available.
If there is not a module available for our software, then we have a few options, in terms of effort.
- Use a Docker container with
apptainer
that has our software in it - Install the binary into our
/home
directory usingconda
- Compile the executable ourselves
For more information about these different methods, please refer to this article: Why your computational environment is important.
We will get more in-depth into containers in the Interactive Shell chapter.