8  Interactively Testing Scripts in Nodes

You might wonder how to test whether a script will work on a worker node. That’s what interactive shells are for.

With an interactive shell, we open a bash prompt directly on a worker. We can then test our script on the worker and debug it there.

8.1 Visual Table of Contents

flowchart TD
   A[Open Shell on Worker] --> B
   B[Test Scripts] --> F
   F[Exit Worker]

8.2 Learning Objectives

After reading this, you will be able to:

  • Open an interactive shell on a SLURM worker node using salloc or grabnode
  • Test your code on a SLURM worker
  • Exit the shell of a remote worker
  • Utilize screen to keep a shell open and running on a remote worker

8.3 Grabbing an interactive shell on a worker

When you’re testing code that’s going to run on a worker node, you need to be aware of what the worker node sees.

Interactive testing is also important for estimating how long our tasks will run, since we can time a task on a representative dataset.
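For example, once you have a shell on a worker, you can time one step of your workflow on a small representative input (the script and file names here are just placeholders):

# time one step of your workflow on a representative input
time bash my_script.sh test_data.fastq.gz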

On a SLURM system, the way to open an interactive shell on a node has changed between versions, so check your version first:

srun --version

If you’re on a version before 20.11, you can use srun with the --pty flag to open an interactive terminal on a worker:

srun --pty bash -i

If the version is 20.11 or later, we can open an interactive shell on a worker with salloc:

salloc
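salloc also accepts the usual SLURM resource flags if you want to size the allocation up front. For example (the values here are illustrative):

# request 4 cores, 8 GB of memory, and a 1-day time limit
salloc --cpus-per-task=4 --mem=8G --time=1-00:00:00
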
For FH Users: grabnode

On the FH system, we can use a command called grabnode, which will let us request a node. It will ask us for our requirements (number of cores, memory, etc.) for our node.

tladera2@rhino01:~$ grabnode

grabnode will then ask what kind of instance we want, in terms of CPUs, memory, and GPUs. Here, I’m grabbing a node with 8 cores and 8 GB of memory for 1 day, with no GPU.

How many CPUs/cores would you like to grab on the node? [1-36] 8
How much memory (GB) would you like to grab? [160] 8
Please enter the max number of days you would like to grab this node: [1-7] 1
Do you need a GPU ? [y/N]n

You have requested 8 CPUs on this node/server for 1 days or until you type exit.

Warning: If you exit this shell before your jobs are finished, your jobs
on this node/server will be terminated. Please use sbatch for larger jobs.

Shared PI folders can be found in: /fh/fast, /fh/scratch and /fh/secure.

Requesting Queue: campus-new cores: 8 memory: 8 gpu: NONE
srun: job 40898906 queued and waiting for resources

After a little bit, you’ll arrive at a new prompt:

(base) tladera2@gizmok164:~$

If you’re doing interactive analysis that is going to span over a few days, I recommend that you use screen or tmux.

Remember hostname

When you are doing interactive analysis, it is easy to forget which node you’re working on. As a quick check, I use hostname to remind myself whether I’m on rhino, gizmo, or within an Apptainer container.
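For example, with the prompts from earlier in this chapter, checking from a worker looks like this:

(base) tladera2@gizmok164:~$ hostname
gizmok164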

8.4 Testing Your Scripts

Part of the reason for interactive testing is to make sure that the worker node can “see” and access all of your resources, including files and executables.
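For example, a quick round of checks might look like this (the folder, tool, and script names here are placeholders for your own):

# can the worker see your input files?
ls /fh/fast/mylab/data/

# is the tool you need on the worker's PATH?
which samtools

# does the script itself run?
bash my_script.sh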

This is especially helpful when you are testing individual steps of a WDL workflow to make sure they work.

Working Interactively

Many people will request a node and do their interactive analysis completely on that node. This can work great, but if you are requesting a huge allocation (such as a 64-core machine), you may be waiting a while to even get access to that allocation.

If you do the work of breaking your scripts into WDL tasks, each task can request a smaller machine; for a task like alignment, that smaller allocation is usually granted much faster than a single large one.

Believe me, it’s worth doing this.

8.5 exit when done

When you’re done with the worker, you can leave the shell with exit. This will put you back on the login node.
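For example, with the prompts from earlier in this chapter:

(base) tladera2@gizmok164:~$ exit
tladera2@rhino01:~$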

8.6 screen for keeping your shell open

Remember the screen utility (Section 4.6.2)? You can use screen to keep your shell open on a remote worker. This is necessary if you want an interactive job to keep running after you disconnect.
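A minimal screen workflow looks something like this (the session name is just a placeholder):

# start a named screen session
screen -S interactive-work

# ... launch your long-running work here ...
# press Ctrl-a, then d, to detach; the session keeps running

# reattach to the session later
screen -r interactive-work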

8.7 What Next?

You’ve learned how to open interactive shells on a remote node. There are a number of ways to go from here:

  • Try running your script using srun (see the sketch below)
  • Try running your script in a container (Chapter 10)
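As a minimal sketch of that first option (the core count and script name are placeholders):

# run a script non-interactively on a worker with 4 cores
srun --cpus-per-task=4 bash my_script.sh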