Submitting an array job with Hera

Authors

Nicholas Ducharme-Barth

Megumi Oshima

Last updated

2024-11-06

In this example we step through submitting an array job on Hera where we want to run the same job in a number of directories. In this case the job is running a simple R script that reads in the data.csv file stored in the directory, fits a linear model, and writes the parameter estimates to a par.csv. We specify which directories we want to run jobs in as a part of the job-array using a text file to specify the directory path names on Hera.

The hera/array_lm example can be set-up either by cloning the repository git clone https://github.com/MOshima-PIFSC/NSASS-HTC-HPC-Computing.git, or stepping through the following code:

Coding alert!

Note: throughout this tutorial we are using User.Name as a stand-in for your actual username and NMFS/project_name as a stand-in for your project. In all cases, replace User.Name with your actual user name and NMFS/project_name with your specific project name.

1 Setup data inputs and directories

Define a relative path, we are starting from the root directory of this project.

proj_dir = this.path::this.proj()
hera_project = "NMFS/project_name/User.Name/"

Define directory names for each run.

dir_name = paste0("rep_0", 0:9)

Iterate across directories, create them, and then write a simple .csv file into them containing data to fit a linear model.

for(i in seq_along(dir_name)){
    
    if(!file.exists(file.path(proj_dir, "example", "hera", "array_lm", "inputs", dir_name[i], "data.csv")))
    {
        set.seed(i)
        dir.create(file.path(proj_dir, "examples", "hera", "array_lm", "inputs", dir_name[i]), recursive=TRUE)
        tmp = data.frame(x=1:1000)
        tmp$y = (i + (0.5*i)*tmp$x) + rnorm(1000,0,i)
        write.csv(tmp, file = file.path(proj_dir, "examples", "hera", "array_lm", "inputs", dir_name[i], "data.csv"))
    }
}

Write an R script to read in the data, run a linear model, and report back the estimated parameters.

if(!file.exists(file.path(proj_dir, "examples", "hera", "array_lm", "inputs", "run_lm_hera_array.r"))){
    script_lines = c("tmp=read.csv('data.csv')", 
    "fit=lm(y~x,data=tmp)", 
    "out = data.frame(par=unname(fit$coefficients))", 
    "write.csv(out,file='par.csv')"
    )
    writeLines(script_lines, con = file.path(proj_dir, "examples", "hera", "array_lm", "inputs", "run_lm_hera_array.r"))
}

Write a text file containing the full path names for where the directories will be on Hera.

if(!file.exists(file.path(proj_dir, "examples", "hera", "array_lm", "inputs", "hera_job_directories.txt"))){
    dir_lines = paste0("/scratch1/", hera_project, "array_lm/inputs/", dir_name, "/")
    writeLines(dir_lines, con = file.path(proj_dir, "examples", "hera", "array_lm", "inputs", "hera_job_directories.txt"))
}

Compress files and prepare for transfer
Compress the array_lm/inputs directory as a tar.gz file upload.array_lm.tar.gz. This simplifies the number of steps needed for file transfers.

We will also need to transfer the submission script, but before that, you will need to specify a few things in the script.

Coding alert!

In submit_array_lm.sh you will need to change the following before you upload and run the script:

Line 5: change account project_name to the name of your project. If you are using the NOAA htc4sa project, it would be htc4sa.
Line 9: /scratch1/NMFS/project_name/User.Name/logs/ to the location you created for logs/ above.
Line 10: Change to your email address.
Line 24: Make sure that this line points to the location on Hera that you uploaded hera_job_directories.txt to. hera_job_directories.txt is located in the array_lm/inputs directory that was uploaded as a part of upload.array_lm.tar.gz.
Line 37: Make sure that this line points to the location on Hera that you uploaded run_lm_hera_array.r to. run_lm_hera_array.r is located in the array_lm/inputs directory that was uploaded as a part of upload.array_lm.tar.gz.

system(paste0("powershell cd ", file.path(proj_dir, "examples", "hera", "array_lm"), ";tar -czf upload.array_lm.tar.gz inputs "))

2 Hera workflow

Connect to Hera

Open a PowerShell terminal and connect to Hera. This terminal will be your remote workstation, call it Terminal A. You will be prompted for your RSA passcode, which is your password followed by the 8-digit code from the authenticator app.

ssh -m hmac-sha2-256-etm@openssh.com User.Name@hera-rsa.boulder.rdhpcs.noaa.gov -p22

Create directories

In Terminal A navigate to the project directory on scratch1 and create some directories. If using a shared directory such as htc4sa/, make a directory to save your work within this directory (.e.g., User.Name/). Change your working directory to this directory. We will upload our SLURM submit script into submit_scripts/ and write our SLURM log files to logs/.

# navigate to project directory
cd /scratch1/NMFS/project_name/
# create new directory
mkdir User.Name/
# navigate into new directory
cd User.Name/
# create directory for SLURM scripts and logs
mkdir submit_scripts/
mkdir logs/

Transfer files

Open a second PowerShell terminal in the NSASS-HTC-HPC-Computing directory on your machine. This will be your local workstation, call it Terminal B. Use this terminal window to upload via scp the needed files (examples/hera/array_lm/upload.array_lm.tar.gz and examples/hera/array_lm/slurm_scripts/submit_array_lm.sh) to Hera. The upload.array_lm.tar.gz will be uploaded to your directory within the project directory on scratch1 and the submit script submit_array_lm.sh will be uploaded to the submit_scripts directory. Make sure your VPN is active when attempting to upload using the DTN. You will be prompted for your RSA passcode after each scp command.

# upload inputs
scp -o MACs=hmac-sha2-512 examples/hera/array_lm/upload.array_lm.tar.gz User.Name@dtn-hera.fairmont.rdhpcs.noaa.gov:/scratch1/NMFS/project_name/User.Name

# upload submission script 
scp -o MACs=hmac-sha2-512 examples/hera/array_lm/slurm_scripts/submit_array_lm.sh User.Name@dtn-hera.fairmont.rdhpcs.noaa.gov:/scratch1/NMFS/project_name/User.Name/submit_scripts

Troubleshooting Tip

If you are getting an error Corrupted MAC on input when uploading files, add -o MACs=hmac-sha2-512 between scp and the name of the file to upload.

Un-tar files

Back in Terminal A untar the inputs directory.

# untar files
tar -xzf upload.array_lm.tar.gz

Prep files to be read on Hera

Make sure files that will be read/executed have unix line endings:

dos2unix inputs/run_lm_hera_array.r inputs/hera_job_directories.txt submit_scripts/submit_array_lm.sh

Submit job

Now you are ready to submit the SLURM submission script submit_array_lm.sh.

sbatch submit_scripts/submit_array_lm.sh

As part of this job script, it cleans the directories specified in hera_job_directories.txt of the input file data.csv. The output is a compressed tar.gz containing the output produced by run_lm_hera_array.r (par.csv) and runtime.txt which logs the job start, end, and runtime. You can check your job status using squeue but this job should complete in a few seconds.

squeue -u User.Name

Download results

Before sending back the results you need to compress all of the outputs (stored in the inputs directory).

cd inputs
tar -czf download.array_lm.tar.gz ./*

Moving back to Terminal B you can download the results, but first you create a directory for it to be downloaded into. This can be done in R:

dir.create(file.path(proj_dir, "examples", "hera", "array_lm", "output"), recursive=TRUE, showWarnings = FALSE)

Now you can use scp in Terminal B to download download.array_lm.tar.gz into examples/hera/array_lm/output

scp -o MACs=hmac-sha2-512 User.Name@dtn-hera.fairmont.rdhpcs.noaa.gov:/scratch1/NMFS/project_name/User.Name/inputs/download.array_lm.tar.gz examples/hera/array_lm/output/

Move into examples/hera/array_lm/output/ and untar downloaded results.

# navigate to output directory
cd examples/hera/array_lm/output/
# untar results
tar -xzf download.array_lm.tar.gz

3 Process the output

In R, iterate through the sub-directories of the input and output data to extract the results of the linear model fits, and the model run time information.

Show code

library(data.table)
library(magrittr)

input_data.list = as.list(rep(NA,10))
output_data.list = as.list(rep(NA,10))
runtime_data.list = as.list(rep(NA,10))

##TODO: change paths here to match directory names
for(i in seq_along(dir_name)){
    # get input data
        input_data.list[[i]] = fread(file.path(proj_dir, "examples", "hera", "array_lm", "inputs", dir_name[i],"data.csv")) %>%
            .[,.(x,y)] %>%
            .[,model := factor(as.character(i),levels=as.character(1:10))] %>%
            .[,.(model,x,y)]
    
    # untar results
        system(paste0("powershell cd ", file.path(proj_dir, "examples", "hera", "array_lm", "output", dir_name[i],"/"), ";tar -xzf output.tar.gz"))

    # get output
        output_data.list[[i]] = fread(file.path(proj_dir, "examples", "hera", "array_lm", "output", dir_name[i],"par.csv")) %>%
            .[,.(par)] %>%
            .[,model := factor(as.character(i),levels=as.character(1:10))] %>%
            .[,.(model,par)] %>%
            melt(.,id.vars="model") %>%
            .[,variable:=c("intercept","slope")] %>%
            dcast(.,model ~ variable) %>%
            merge(.,input_data.list[[i]][,.(model,x)],by="model") %>%
            .[,pred_y := intercept+slope*x] %>%
            .[,.(model,x,pred_y)]
    # get time
        runtime_data.list[[i]] = readLines(file.path(proj_dir, "examples", "hera", "array_lm", "output", dir_name[i],"runtime.txt")) %>%
            gsub(".*?([0-9]+).*", "\\1", .) %>%
            as.numeric(.) %>%
            as.data.table(.) %>%
            setnames(.,".","time") %>%
            .[,model := factor(as.character(i),levels=as.character(1:10))] %>%
            melt(.,id.vars="model") %>%
            .[,variable:=c("start","end","runtime")] %>%
            dcast(.,model ~ variable) %>%
            .[,.(model,start,end,runtime)]
}

input_data = rbindlist(input_data.list)
output_data = rbindlist(output_data.list)
runtime_data = rbindlist(runtime_data.list)

The jobs started execution at 2023-05-31 00:49:56 and all finished by 2023-05-31 00:49:57 for an elapsed runtime of 1 seconds and a total computation time of 5 seconds. Use of Hera resulted in a job completing 5\(\times\) faster. Figure 1 shows the simulated data and estimated linear fits for each model run in the job-array.

Show code

library(ggplot2)
input_data %>%
ggplot() +
geom_point(aes(x=x,y=y,fill=model),alpha=0.05,size=5,shape=21) +
geom_line(data=output_data,aes(x=x,y=pred_y,color=model),linewidth=2)