Bash for HPC

Author

Ted Laderas

Published

April 22, 2024

1 Introduction

Note: This book is a remix of my previous book, “Bash for Bioinformatics”. I’ve changed the focus to be more general and applicable to high performance computing.

Bash scripting is an essential skill in bioinformatics that we often expect bioinformaticians to have automatically learned. I think that this underestimates the difficulty of learning and applying Bash scripting.

This book is meant to take you (a budding bioinformatician) beyond the foundational shell scripting skills taught in a course such as the Software Carpentry Shell Course.

Specifically, this book shows you a path to get started with processing data on a High Performance Computing cluster and sets you on the road to making a reproducible workflow using WDL.

Our goal is to showcase the “glue” skills that help you do bioinformatics reproducibly on a High Performance Computing cluster.

1.1 Why Bash?

Bash is the default shell for many bioinformatics containers and applications. Writing Bash scripts can therefore help you in the many situations where you need to automate a series of steps.
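As a minimal sketch of what "automating a series of steps" can look like, the script below chains three small steps: creating an input file, de-duplicating it, and reporting a count. The file names (`samples.txt`, `unique_samples.txt`) and the sample names are hypothetical and chosen only for illustration; they do not come from the exercises in this book.

```bash
#!/usr/bin/env bash
# Hypothetical example: file names and contents are illustrative assumptions.
set -euo pipefail   # stop on errors, unset variables, and failed pipes

# Work in a throwaway directory so the sketch leaves nothing behind elsewhere.
workdir="$(mktemp -d)"

# Step 1: create a small input file (stands in for real data).
printf 'sample_b\nsample_a\nsample_b\n' > "$workdir/samples.txt"

# Step 2: sort the entries and drop duplicates.
sort -u "$workdir/samples.txt" > "$workdir/unique_samples.txt"

# Step 3: count the unique entries and report the result.
# tr strips the padding that some wc implementations add.
n_unique="$(wc -l < "$workdir/unique_samples.txt" | tr -d ' ')"
echo "Found ${n_unique} unique samples"
```

Saving the steps in a script, rather than typing them one at a time, means the same sequence can be rerun later or handed to a job scheduler unchanged.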

1.2 Learning Objectives for this Book

After reading and doing the exercises in this book, you should be able to:

Articulate basic HPC architecture concepts and why they’re useful in your work
Utilize basic SLURM commands to understand the architecture of your HPC cluster
Apply bash scripting to your own work
Leverage bash scripting to execute jobs on HPC
Execute batch processing of multiple files in a project
Manage software dependencies reproducibly using container technologies such as Docker, or using environment modules

1.3 What is not covered

This book is not meant to be a substitute for excellent books such as Data Science on the Command Line. This book focuses on the essential Bash shell skills that will help you on HPC systems.

1.4 Notes

This is a very opinionated journey through Bash shell scripting, workflow languages, and reproducibility. It is written from the perspective of a user, especially on HPC systems that utilize SLURM.

It is designed to build on each of the concepts in a gradual manner. Where possible, we link to the official HPC documentation.

At each step, you’ll be able to do useful things with your data. We will focus on skills and programming patterns that are useful.

1.5 Other Resources

We recommend reviewing a course such as the Software Carpentry course for Shell Scripting before getting started with this book. The Missing Semester of your CS Education is another great introduction/resource.

1.6 Contributors

TBD.

1.7 Want to be a Contributor?

This is the first draft of this book. It’s not going to be perfect, and we need help. Specifically, we need help with testing the setup and the exercises.

If you have a problem, you can file it as an issue using this link.

In your issue, please note the following:

Your Name
What your issue was
Which section and line you found problematic or that wouldn’t run

If you’re Quarto/GitHub savvy, you can fork and file a pull request for typos/edits.

Just be aware that this is not my primary job - I’ll try to be as responsive as I can.

1.8 License

Bash for HPC by Ted Laderas is licensed under a Creative Commons Attribution 4.0 International License.
Based on a work at https://github.com/laderast/bash_for_hpc.