1 Workshop Expectations

We want to foster a positive learning environment in this workshop. We expect everyone to adhere to the Code of Conduct. We will enforce this and ask you to leave if you are not respectful to others.
In short:

  • be respectful of each other’s learning styles
  • don’t be dismissive or mean to someone who knows less than you
  • try to help people if you see them struggling and you can help

Additionally, please work together! The people in my workshops who have a really bad time and don’t get anything out of it are the ones who try to do it alone. To quote Legend of Zelda, “It’s dangerous to go alone”. Talking through the material with someone will help you understand it.

We are also giving people post-it notes. Put one on your laptop so we can see whether you need help: green means “I’m okay, don’t bug me”, and red means “I need some help!”.

2 Prerequisites for this training

You will need the following:

  1. ACC Login
  2. Permission to join exacloud
  3. SSH terminal program (such as Terminal (Mac/Linux) or PuTTY(Windows))
  4. Optional: an FTP program (WinSCP/Cyberduck) for transferring files

You should be able to understand the following:

  1. What a shell script is
    • How to run a shell script
    • What a shebang is and how to set up your shell script (see the sketch after this list)
  2. Basic shell commands, including:
    • directory and file manipulation: ls, rm, mkdir, rmdir
    • file editing using nano, vim, or emacs.
    • how to set basic file permissions
    • what process monitoring is: ps and kill
  3. How to run your program/workflow on the command line
    • R: Rscript
    • Python: python
    • Executable: GATK
  4. How to find out whether your executable is already on exacloud
    • which and whereis
    • some possible locations
    • /opt/installed/ – various package-specific bin/ and sbin/ subdirectories
    • /opt/rh – see the scl(1) man page and/or https://accdoc.ohsu.edu/main/howto/scl-intro/
    • /usr/local – not just the bin/ and sbin/ subdirs but also, e.g., /usr/local/cuda/bin
    • Your lab’s space in Lustre or on RDS
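
If the shebang item is rusty, here is a minimal sketch of a shell script. The filename hello.sh and the R script it calls are just illustrative, not part of the workshop materials:

#!/bin/bash
# hello.sh -- the first line (the shebang) tells the system which
# interpreter should run this file
echo "Running on $(hostname)"
Rscript my_analysis.R   # example payload; assumes my_analysis.R exists

To make it executable, run it, and check where a program lives:

chmod +x hello.sh    # set the execute permission
./hello.sh           # run the script
which Rscript        # print the first Rscript found on your PATH
whereis Rscript      # list binary and man page locations, if installed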

If you are unsure what any of these are or need a refresher, here is a DataCamp course: Intro to Shell For Data Science. I recommend reviewing Chapters 1, 2, and 4.

2.1 What is Exacloud?

Exacloud is OHSU’s cluster computing environment. It’s available to anyone who has an ACC account and requests access. It exists to run many kinds of compute jobs:

  • Running genomic aligners, such as Genome Analysis Toolkit (GATK)
  • Running complicated analysis jobs in R
  • General Python data processing (numpy, pandas, text analysis)
  • Image analysis pipelines

The initial hardware was donated by Intel, but it is also funded by the OHSU Knight Cancer Institute, Howard Hughes Medical Institute (HHMI), and the OHSU Center for Spatial Systems Biomedicine (OCSSB). Exacloud is maintained by the Advanced Computing Center (ACC).

To run jobs effectively on exacloud, you must understand some basic shell scripting techniques and how exacloud is organized. That’s the goal of this training workshop.

2.2 Architecture of exacloud

[Figure: exacloud architecture. This is just a small subset of the nodes available.]

Exacloud is organized into what are called nodes. Each node can be thought of as a multi-core or multi-CPU unit, usually with 24 cores that share memory. There are 250 nodes, which means that exacloud has over 6000 CPUs of varying capabilities and memory that can be utilized by users who want to run parallel computing jobs.

There are two different types of nodes on exacloud. The first type is the head node, which is the node you initially sign into (usually exahead1.ohsu.edu). The head node handles all of the scheduling and file transfer to the other nodes in the system.

The other nodes are the compute nodes. They don’t do anything unless the head node assigns them a task. Note that different nodes may have different configurations; they may have graphical processing units (GPUs) for computation, they may have less memory, or they might have special software installed on them (such as Docker).
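
To see the nodes for yourself once you’re signed in, Slurm’s sinfo command (more on Slurm just below) lists partitions and node states; the exact output depends on the cluster’s configuration:

sinfo        # summary of partitions, node counts, and node states
sinfo -N -l  # one line per node, including CPU and memory details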

How does the head node tell the compute nodes what to do? There is a program called Slurm (Simple Linux Utility for Resource Management) installed on the head node that schedules computing jobs for all of the compute nodes on exacloud. How does it decide which jobs to run on which nodes? Part of it has to do with compute requirements: each individual job requires a certain amount of CPU power and memory. You will request these when you run the job. Your request for an allocation is balanced against the current load (i.e., other jobs currently running) on the cluster.
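
In practice, you write those requests as #SBATCH directives at the top of a job script. Here is a minimal sketch, saved as, say, my_job.sh; the resource numbers and script names are placeholders, not exacloud-specific values:

#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH --cpus-per-task=4    # number of CPUs requested
#SBATCH --mem=8G             # memory requested
#SBATCH --time=02:00:00      # wall-clock limit (hh:mm:ss)
Rscript my_analysis.R        # the actual work

Submit it to the scheduler from the head node with:

sbatch my_job.sh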

A note: Slurm is different from the previous job scheduler on exacloud, HTCondor. Slurm does less than HTCondor: it really only lets you request allocations with particular memory/CPU requirements. It doesn’t try to do more than that, unlike HTCondor, which provided a simple way to batch process files. This means that the majority of your scripts for running jobs will use bash scripting to get things done.
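
For example, where HTCondor would batch over files for you, with Slurm you write the loop yourself. A sketch, assuming a directory of FASTQ files and a hypothetical per-file job script align_one.sh that takes a filename as its argument:

for f in data/*.fastq; do
  sbatch align_one.sh "$f"   # submit one Slurm job per input file
done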

2.3 Useful reference material

Here’s a list of terminology and a list of useful Slurm commands. If you get confused by any terms, please refer to them.

2.4 How do I access exacloud?

Most of the time, you will sign in to exahead1. You will need a terminal program (Mac and Linux have a terminal built in; on Windows you will need to download something like PuTTY). There are two head nodes on exacloud:

  • exahead1.ohsu.edu (SSH in) - only accessible within OHSU Firewall
  • exahead2.ohsu.edu (SSH in) - only accessible within OHSU Firewall

File transferring via SFTP (using something like WinSCP (PC) or Cyberduck (Mac)) can be done at both exahead1.ohsu.edu and exahead2.ohsu.edu.

All of you should have external access to exacloud, which means you should be able to reach it from off campus. To do so, you will need to go through the ACC gateway first and then ssh into exahead1 or exahead2 (see the example after this list).

  • acc.ohsu.edu : SSH gateway
  • acc.ohsu.edu: SFTP gateway (limited to 10 GB, and whatever is linked from RDS)
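
If your SSH client supports the -J (jump host) option, you can do the two hops in one command; replace USERNAME with your ACC login:

ssh -J USERNAME@acc.ohsu.edu USERNAME@exahead1.ohsu.edu

Otherwise, do it in two steps:

ssh USERNAME@acc.ohsu.edu
ssh exahead1.ohsu.edu   # run this from the gateway's prompt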

2.5 Filesystems 1: Your ACC directory

For reference: The full list of file systems and storage available on exacloud.

When you sign into exahead1, you will start in your ACC home directory. This directory is accessible from all machines managed by ACC (for example, state.ohsu.edu, the DMICE server). I usually use my ACC directory for things like file libraries, my scripts, and such. Today we’ll also use it for our data.

Note there’s a limit on file storage for your ACC directory (10 GB), so if you need more room for your data, you will need to put it on Lustre or one of the other options.
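
To see how close you are to that limit, one generic check is to total up your home directory with du (this is not an ACC-specific quota tool):

du -sh ~   # human-readable total size of your home directory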

2.6 Filesystems 2: Lustre

Lustre is the shared file system on exacloud. It is a distributed file system, which means that all nodes can access it easily. It is meant to be a temporary home for data and files that need to be processed by exacloud.

We will not be using Lustre for our workshop today. If you want to put files on Lustre, you will need to request access for a particular lab and group. If you are not attached to a group, you will not have access to Lustre.

If you don’t need to process your data immediately, please don’t leave it on Lustre, as it’s a shared resource. If you need to store your data for a longer period of time, please request Research Data Storage to store it.

2.7 Filesystems 3: Research Data Storage

What if you’re accumulating data but don’t need to analyze it right away? Then you should put it in Research Data Storage (RDS). Costs are yearly, per terabyte of data; they’re pretty reasonable and are listed on the ACC website.

RDS can also be used as a data backup for large datasets. Contact acc@ohsu.edu for more information.

3 Workshop

Now that we’ve got all that out of the way, we can actually start to do stuff on exacloud!

Note: these instructions assume you’re within the OHSU firewall. If you’re off campus, you’ll need an ACC external access account and will have to ssh USERNAME@acc.ohsu.edu before you do these steps. acc.ohsu.edu is the “gateway” to all machines hosted by ACC.

4 Task 0 - Sign in to exacloud and clone this repo

  1. To connect to exacloud, use the ssh command and enter your password when prompted:
ssh USERNAME@exahead1.ohsu.edu
  2. Now that you’re signed in, clone the repository using git:
git clone https://github.com/laderast/exacloud_tutorial
  3. You should now have a directory in your home directory called exacloud_tutorial. Change into it and you should be ready to go!
cd exacloud_tutorial
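
To confirm the clone worked, list the directory contents:

ls   # should show the files from the exacloud_tutorial repository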

5 Task 1 - Let’s look at the overall structure of exacloud