Running an array of SS3 jobs on OSG

Authors

Nicholas Ducharme-Barth

Megumi Oshima

Last updated

2024-11-06

Similar to the array_lm example, this example also sets up running an array job on OSG. As before, we will use a *.txt file to indicate which directories we want to run jobs in as part of our array.

There are a few main differences from the array_lm example that serve to illustrate useful modifications to the workflow.

The OSG/ss3 example can be set up either by cloning the repository (git clone https://github.com/MOshima-PIFSC/NSASS-HTC-HPC-Computing.git) or by stepping through the following code:

Coding alert!

Note: throughout this tutorial we use User.Name as a stand-in for your actual username and project_name as a stand-in for your project. In all cases, replace User.Name with your actual user name and project_name with your specific project name.

1 Build software container

Software containers enable portable, reproducible research by allowing researchers to set up a software environment to their exact specifications and run it on any Linux system. The Apptainer container system is widely used across HPC/HTC systems and makes it easy to build a container from a definition file. Running a job within a container means that you can replicate an identical software environment in any location with Apptainer installed, regardless of the native operating system, software, and installed packages. The Apptainer container can be built from any Linux machine with Apptainer installed, including the Open Science Grid (OSG) access points. Here we walk through the steps needed to build a Linux (Ubuntu 20.04) container with Stock Synthesis (version 3.30.22.1), R (version 4.4.0), and the R packages r4ss, ss3diags, data.table, magrittr, and mvtnorm from a definition file, linux-r4ss-v4.def. In this case we show the steps needed to build the container using the OSG access point as our Linux virtual machine (VM), though this may not be needed if working from an alternative Linux VM.

Coding alert!

Note: you will have to change apXX to match your OSG access point (e.g., ap20 or ap21).

The first step is to log onto your OSG access point via ssh using a Terminal/PowerShell window and make a directory in which to build your container, in this case singularity1.

ssh User.Name@apXX.uc.osg-htc.org
mkdir -p singularity/linux_r4ss
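Apptainer is already available on the OSG access points; if you are building from a different Linux machine you can confirm it is installed with:

apptainer --version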

Using a second Terminal/PowerShell window, navigate to the directory that you cloned the NSASS-HTC-HPC-Computing repo into and upload the definition file (linux-r4ss-v4.def) to the directory you just created on OSG.

scp apptainer/linux-r4ss-v4.def User.Name@apXX.uc.osg-htc.org:/home/User.Name/singularity/linux_r4ss

Back in your first Terminal/PowerShell window, navigate into the directory and build the container2. The second line of code builds the Singularity Image File (.sif) and takes two arguments: the name of the output .sif file and the input definition file (.def).

cd singularity/linux_r4ss
apptainer build linux-r4ss-v4.sif linux-r4ss-v4.def
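Before shipping the image, it can be worth a quick smoke test from the access point. The calls below are illustrative; the exact executables available depend on the definition file (here we assume R is on the container's PATH):

# print the R version installed in the container
apptainer exec linux-r4ss-v4.sif R --version
# open an interactive shell inside the container to look around (type exit to leave)
apptainer shell linux-r4ss-v4.sif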

Move the built .sif file to the Open Science Data Federation (OSDF) so that it can be cached and made available to HTCondor jobs. Note that if you update the build of the .sif file in any way you should also rename it (e.g., -v5), as caching may not update the file if it keeps the same name.

mkdir -p /ospool/apXX/data/User.Name/singularity
cp linux-r4ss-v4.sif /ospool/apXX/data/User.Name/singularity
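You can confirm the copy landed in the OSDF location (and check for any previously cached versions) with:

ls -lh /ospool/apXX/data/User.Name/singularity/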

2 Set up data inputs and directories

Given that our example is to run a 4-year retrospective analysis for each of the SS3 test models, the next step is to download the SS3 test models from the nmfs-stock-synthesis/test-models GitHub repo. Once you have downloaded the test models, copy the models/ directory into a new example directory, ss3/inputs/, within the NSASS-HTC-HPC-Computing/examples/OSG/ directory on your machine. If you cloned the NSASS-HTC-HPC-Computing repo, the SS3 test models will already be in the correct location.
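If you are not working from a clone of the NSASS-HTC-HPC-Computing repo, one way to fetch the test models is to clone the test-models repo and copy its models/ directory into place. A sketch, run from the NSASS-HTC-HPC-Computing directory in a terminal with git available:

# clone the SS3 test models and copy them into the example inputs directory
git clone https://github.com/nmfs-stock-synthesis/test-models.git
cp -r test-models/models examples/OSG/ss3/inputs/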

For the sake of example, the job array will be set up to run each retrospective peel (e.g., -0 years, -1 year, …, -4 years of data) as an individual job. We will store the results of each retrospective peel in its own directory. The directories on OSG will be listed in a text file, and we will use this text file to launch jobs on OSG (as part of the job array) in each of the named directories.

Let us define that text file using R.

  1. Define a relative path; we are starting from the root directory of this project.
Show code used to define relative paths.
proj_dir = this.path::this.proj()
  2. Write a text file containing the path names for the directories as they will be on OSG, relative to the examples/OSG/ss3/ folder.
Show code used to define job directory structure.
test_models=list.dirs(paste0(proj_dir,"/examples/OSG/ss3/inputs/models/"),recursive=FALSE,full.names=FALSE)
retro_peels=0:4

# replace '-' with '_' in model names since we will use '-' as a delimiter
if(length(grep("-",test_models,fixed=TRUE))>0){
    test_models_new = gsub("-","_",test_models)
    rename_models_idx = grep("-",test_models,fixed=TRUE)
    for(i in seq_along(rename_models_idx)){
        # create new dir
        dir.create(paste0(proj_dir,"/examples/OSG/ss3/inputs/models/",test_models_new[rename_models_idx[i]]),recursive=TRUE)
        # copy files
        file.copy(paste0(proj_dir,"/examples/OSG/ss3/inputs/models/",test_models[rename_models_idx[i]],"/",list.files(paste0(proj_dir,"/examples/OSG/ss3/inputs/models/",test_models[rename_models_idx[i]]),full.names=FALSE,recursive=FALSE)),paste0(proj_dir,"/examples/OSG/ss3/inputs/models/",test_models_new[rename_models_idx[i]]))
        # delete old dir
        shell(paste0("powershell rm -r ",proj_dir,"/examples/OSG/ss3/inputs/models/",test_models[rename_models_idx[i]],"/"))
    }
    test_models = test_models_new
}

# define scenarios
scenario_df = expand.grid(model=test_models,peel=retro_peels)
scenario_df$run_id = 1:nrow(scenario_df)
scenario_df = scenario_df[,c(3,1,2)]
scenario_df$run_id = ifelse(scenario_df$run_id<10,paste0(0,scenario_df$run_id),as.character(scenario_df$run_id))

# write text file
# define paths relative to the examples/OSG/ss3/ folder
osg_dir_lines = paste0("output/", apply(scenario_df,1,paste0,collapse="-"), "/")
writeLines(osg_dir_lines, con=paste0(proj_dir, "/examples/OSG/ss3/scripts/osg_job_directories.txt"))
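Each line of osg_job_directories.txt follows the pattern output/<run_id>-<model>-<peel>/, one directory per job, which is why any '-' characters were stripped from the model names above. For example (with hypothetical model names), the first lines would look like:

output/01-model_a-0/
output/02-model_b-0/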

3 Prepare job scripts

In order to execute the HTC workflow, instructions are coordinated using four scripts:

  • osg-prep.sh: This script prepares files for HTCondor job execution and makes the directory structure specified by osg_job_directories.txt (a minimal sketch of this step is shown after this list).
  • osg.sub: This is the HTCondor job submission script that specifies job requirements and input/output files.
  • osg-wrapper-r.sh: This wrapper script controls file input/output to and from the R script osg-ss3-example-calcs.r, executes the R script, conducts job timing and tidies up the job working directory.
  • osg-ss3-example-calcs.r: This is the actual computation script which modifies the SS3 input files as needed, executes the appropriate SS3 model run and conducts any needed post-processing of the output within R.
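As a concrete illustration of the directory-preparation step handled by osg-prep.sh, the core operation is simply reading osg_job_directories.txt and creating each listed directory. A minimal bash sketch (not the actual osg-prep.sh):

# create one output directory per line of the job directory list
while read -r job_dir; do
    mkdir -p "${job_dir}"
done < scripts/osg_job_directories.txt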
Coding alert!

In osg.sub you will need to change the following before you upload and run the script:

  1. Line 15: change apXX and User.Name to match your access point and user name.
  2. Line 27: change project_name to match your project on OSG.
  1. From within R, compress the OSG/ss3/inputs/models/ and OSG/ss3/scripts/ directories and bundle the resulting archives into a single tar.gz file, upload.example-ss3.tar.gz. This reduces the number of steps needed for file transfers.
shell(paste0("powershell cd ", file.path(proj_dir, "examples", "OSG", "ss3","inputs"), ";tar -czf upload.example-ss3-models.tar.gz models/"))
shell(paste0("powershell cd ", file.path(proj_dir, "examples", "OSG", "ss3"), ";tar -czf upload.example-ss3-scripts.tar.gz scripts/"))
shell(paste0("powershell cd ", file.path(proj_dir, "examples", "OSG", "ss3"), ";tar -czf upload.example-ss3.tar.gz inputs/upload.example-ss3-models.tar.gz upload.example-ss3-scripts.tar.gz"))

4 OSG workflow

  1. Connect to OSG

As mentioned, access to OSG and file transfer are done using a pair of Terminal/PowerShell windows, which we will call Terminal A and Terminal B. In Terminal A, log onto your access point and create a directory for this example.

ssh User.Name@apXX.uc.osg-htc.org
mkdir -p examples/OSG/ss3/
  2. Transfer files

Open a second Terminal/PowerShell window in the NSASS-HTC-HPC-Computing directory on your local machine; this is Terminal B. Use this window to upload the needed file (examples/OSG/ss3/upload.example-ss3.tar.gz) to OSG via scp. The upload.example-ss3.tar.gz file will be uploaded to the directory you just created, examples/OSG/ss3/.

scp examples/OSG/ss3/upload.example-ss3.tar.gz User.Name@apXX.uc.osg-htc.org:/home/User.Name/examples/OSG/ss3/
  3. Prepare files and submit the job on OSG

In Terminal A, un-tar the files, change the permissions and line endings for the prep script, execute it, and launch the HTCondor job.

cd examples/OSG/ss3/
tar -xzf upload.example-ss3.tar.gz
tar -xzf upload.example-ss3-scripts.tar.gz
chmod 777 scripts/osg-prep.sh
dos2unix scripts/osg-prep.sh
./scripts/osg-prep.sh
condor_submit scripts/osg.sub
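After submitting, you can monitor progress from Terminal A. condor_q summarizes how many of your jobs are idle, running, or held, and condor_q -hold shows the reason for any held jobs:

condor_q
condor_q -hold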
  4. Download results

Once all jobs are completed (or the job has hit its time limit), use Terminal B to download the results.

scp -r User.Name@apXX.uc.osg-htc.org:/home/User.Name/examples/OSG/ss3/output examples/OSG/ss3/
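If you want a quick check on how many jobs finished successfully, you can count the End.tar.gz archives from Terminal A (each completed job writes one into its output directory):

ls examples/OSG/ss3/output/*/End.tar.gz 2> /dev/null | wc -l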

5 Process results

After the results are downloaded, they can be processed in R to extract the model run times, the time series of estimated biomass for each model run, and Mohn’s rho across retrospective peels for a given model ‘family’.
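For reference, the Mohn’s rho computed below is the average relative difference between each peeled terminal-year estimate and the corresponding estimate from the full model:

\[
\rho = \frac{1}{P}\sum_{p=1}^{P}\frac{\hat{X}_{T-p,\,p}-\hat{X}_{T-p,\,0}}{\hat{X}_{T-p,\,0}}
\]

where \(P=4\) is the number of retrospective peels, \(T\) is the terminal year of the full model, and \(\hat{X}_{y,p}\) is the estimate (here, spawning biomass depletion) for year \(y\) from the model fit with \(p\) years of data removed.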

Show output processing code
# iterate over output files and extract quantities
library(data.table)
library(magrittr)
library(r4ss)

output_dirs = list.dirs(paste0(proj_dir,"/examples/OSG/ss3/output/"),recursive=FALSE,full.names=FALSE)
ssb_dt.list = comptime_dt.list = as.list(rep(NA,length(output_dirs)))
ss_output_list =  as.list(rep(NA,length(output_dirs)))
names(ss_output_list) = output_dirs

for(i in seq_along(output_dirs)){
    tmp_model = strsplit(output_dirs[i],"-")[[1]][2]
    tmp_peel = as.numeric(strsplit(output_dirs[i],"-")[[1]][3])
    tmp_index = as.numeric(strsplit(output_dirs[i],"-")[[1]][1])

    # check if the End.tar.gz file got created
    if(file.exists(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/End.tar.gz")))
    {
        # get snapshot of original files in the directory
        tmp_orig_files = list.files(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/"))

        # un-tar if the End.tar.gz file gets made
        shell(paste0("powershell cd ", paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/"), ";tar -xzf End.tar.gz"))

        # check if runtime.txt was produced and extract output
        if(file.exists(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/runtime.txt"))){
            tmp_time = readLines(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/runtime.txt")) %>%
            gsub(".*?([0-9]+).*", "\\1", .) %>%
            as.numeric(.) %>%
            as.data.table(.) %>%
            setnames(.,".","time")
            comptime_dt.list[[i]] = data.table(id = output_dirs[i])    
            comptime_dt.list[[i]]$index = tmp_index
            comptime_dt.list[[i]]$model = tmp_model
            comptime_dt.list[[i]]$peel = tmp_peel
            comptime_dt.list[[i]]$OSG_start = as.POSIXct(tmp_time$time[1],origin="1970-01-01")
            comptime_dt.list[[i]]$OSG_end = as.POSIXct(tmp_time$time[2],origin="1970-01-01")
            comptime_dt.list[[i]]$OSG_runtime = tmp_time$time[3]/60

            # clean-up
            rm(list=c("tmp_time"))
        }

        # if "ss_report.RData" is produced put it into the storage list
        if(file.exists(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/ss_report.RData"))){
            load(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/ss_report.RData"))
            ss_output_list[[i]] = ss_report

            ssb_dt.list[[i]] = ss_report$derived_quants %>%
                as.data.table(.) %>%
                .[Label %in% paste0("SSB_", ss_report$startyr:ss_report$endyr)] %>%
                .[,id := output_dirs[i]] %>%
                .[,sbo:=Value/subset(ss_report$derived_quants,Label=="SSB_Virgin")$Value] %>%
                .[,yr:=sapply(Label,function(x)as.numeric(strsplit(x,"_")[[1]][2]))] %>%
                .[,.(id,yr,sbo)]
            # clean-up
                rm(list=c("ss_report"))
        }    

        # clean-up
        file.remove(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/",setdiff(list.files(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/")),tmp_orig_files)))
        rm(list=c("tmp_orig_files"))
    } else {
        comptime_dt.list[[i]] = data.table(id=output_dirs[i],index=tmp_index,model=tmp_model,peel=tmp_peel,OSG_start=NA,OSG_end=NA,OSG_runtime=NA)
        ssb_dt.list[[i]] = data.table(id=output_dirs[i],yr=2023,sbo=NA)
    }

    # clean-up
    rm(list=c("tmp_model","tmp_peel","tmp_index"))
}

# na.omit() does not drop NA elements from a list, so filter unfilled entries explicitly
comptime_dt = rbindlist(comptime_dt.list[!is.na(comptime_dt.list)])
ssb_dt = rbindlist(ssb_dt.list[!is.na(ssb_dt.list)]) %>% merge(comptime_dt[,.(id,index,model,peel)],.,by="id")
ss_output_list = ss_output_list[!is.na(ss_output_list)]

# save
fwrite(comptime_dt,file=paste0(proj_dir,"/examples/OSG/ss3/output/comptime_dt.csv"))
fwrite(ssb_dt,file=paste0(proj_dir,"/examples/OSG/ss3/output/ssb_dt.csv"))

# calculate Mohn's rho
unique_models = unique(comptime_dt$model)
retro_dt.list = as.list(rep(NA,length(unique_models)))

for(i in seq_along(unique_models)){
    tmp_model = unique_models[i]

    retro_dt.list[[i]] = data.table(model=tmp_model)
    retro_dt.list[[i]]$type = c("SBO")
    retro_dt.list[[i]]$rho = NA

    if(uniqueN(na.omit(ssb_dt[model==tmp_model])$peel)==5){
        tmp_dt = ssb_dt[model==tmp_model]
        base_dt = tmp_dt[peel==0]
        year_vec = max(base_dt$yr) - 1:4
        bias_vec = rep(NA,length(year_vec))
        # calc Mohn's rho for runs where all models completed
        for(j in 1:4){
            bias_vec[j] = (ssb_dt[model==tmp_model&peel==j&yr==year_vec[j]]$sbo - base_dt[yr==year_vec[j]]$sbo)/base_dt[yr==year_vec[j]]$sbo
        }
        retro_dt.list[[i]]$rho = mean(bias_vec)
        rm(list=c("tmp_dt","base_dt","year_vec","bias_vec"))
    } 
    
    rm(list=c("tmp_model"))
}

retro_dt = rbindlist(retro_dt.list)
fwrite(retro_dt,file=paste0(proj_dir,"/examples/OSG/ss3/output/retro_dt.csv"))

5.1 Job runtime

The 90 jobs run on OSG completed 4.14 hours of calculations (2.76 minutes per job on average) in an elapsed time of approximately 34 minutes, roughly a 7-fold speed-up relative to running the same jobs sequentially (Figure 1).

Show plotting code
library(ggplot2)

p = comptime_dt %>%
.[,.(id,OSG_start,OSG_end)] %>%
melt(.,id.vars="id") %>%
.[,variable:=ifelse(variable%in%c("OSG_start"),"start","end")] %>%
dcast(.,id~variable) %>%
.[order(start)] %>%
ggplot() +
xlab("Time (GMT)") +
ylab("Job") +
geom_segment(aes(x=start,xend=end,y=id,yend=id),color="#003087",alpha=0.5,linewidth=2) +
theme(panel.background = element_rect(fill = "transparent", color = "black", linetype = "solid"),
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),  
            strip.background =element_rect(fill="transparent"),
            legend.key = element_rect(fill = "transparent"),
            axis.text.y=element_blank(),
            axis.ticks.y=element_blank())
p

# save plot
ggsave(
  "OSG-ss3-elapsed.png",
  plot = p,
  device = "png",
  path = paste0(proj_dir,"/assets/static/"),
  width = 8,
  height = 4.5,
  units = c("in"),
  dpi = 300,
  bg = "transparent")
Each line indicates a job, with longer lines indicating longer run times.
Figure 1: Start and stop time for jobs run on OSG.

6 Example results

6.1 Retrospectives

Retrospective plots of static biomass depletion for the SS3 test models are shown in Figure 2.

Show plotting code
text_dt.list = as.list(rep(NA,uniqueN(ssb_dt$model)))
for(i in seq_along(text_dt.list)){
    tmp_dt = ssb_dt[model==unique(ssb_dt$model)[i]]
    tmp_min_yr = min(tmp_dt$yr)
    text_dt.list[[i]] = data.table(model=unique(ssb_dt$model)[i],yr=tmp_min_yr,sbo=0.2,rho=round(retro_dt[model==unique(ssb_dt$model)[i]]$rho,digits=2))
}
text_dt = rbindlist(text_dt.list)


p = ssb_dt %>%
            ggplot() +
            facet_wrap(~model,scales="free_x") +
            xlab("Year") +
            ylab(expression(SB/SB[0])) +
            ylim(0,NA) +
            geom_hline(yintercept=0) +
            geom_path(aes(x=yr,y=sbo,color=as.character(peel),group=id)) +
            geom_text(data=text_dt,aes(x=yr,y=sbo,label=rho),size=3,hjust = 0) +
            viridis::scale_color_viridis("Peel",begin = 0.1,end = 0.8,direction = 1,option = "H",discrete=TRUE) +
            viridis::scale_fill_viridis("Peel",begin = 0.1,end = 0.8,direction = 1,option = "H",discrete=TRUE) +
            theme(panel.background = element_rect(fill = "transparent", color = "black", linetype = "solid"),
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),
            strip.background =element_rect(fill="transparent"),
            legend.key = element_rect(fill = "transparent"))
p

# save plot
ggsave(
  "OSG-ss3-retro.png",
  plot = p,
  device = "png",
  path = paste0(proj_dir,"/assets/static/"),
  width = 8,
  height = 4.5,
  units = c("in"),
  dpi = 300,
  bg = "transparent")
Colored lines depicting the Stock Synthesis model fits to the testing model suite examples. The lines depict the depletion in spawning biomass over time. The colors correspond to the different retrospective peels that were fit in this example. The plot displays a variety of trends from these models.
Figure 2: Depletion estimates across retrospective peels from the Stock Synthesis testing model suite examples. Mohn’s rho values are printed in each panel.

Footnotes

  1. This directory can be named anything that you like; in this case, singularity is a legacy name from an earlier version of the code, written before Singularity changed its name to Apptainer.↩︎

  2. This may take ~10-15 minutes depending on how long it takes to install R packages.↩︎