Running an array of SS3 jobs on OSG
Similar to the array_lm example, this example also sets up running an array job on OSG. As before, we will use a *.txt file to indicate which directories we want to run jobs in as part of our array.
There are a few main differences that serve to illustrate useful modifications to the workflow:
- in this case we will run Stock Synthesis (SS3) on all models in the SS3 testing suite to conduct a retrospective analysis;
- we will run our jobs within a software container;
- lastly, we will define variables to be passed between job submission and job execution scripts to ensure that the correct output is produced.
The OSG/ss3 example can be set up either by cloning the repository (git clone https://github.com/MOshima-PIFSC/NSASS-HTC-HPC-Computing.git) or by stepping through the following code:
Note: throughout this tutorial we are using User.Name as a stand-in for your actual username and project_name as a stand-in for your project. In all cases, replace User.Name with your actual user name and project_name with your specific project name.
1 Build software container
Software containers enable portable, reproducible research by allowing researchers to set up a software environment to their exact specifications and run it on any Linux system. The Apptainer container system is widely used across HPC/HTC systems and makes it easy to build a container from a definition file. Running a job within a container means that you can replicate an identical software environment in any location with Apptainer installed, regardless of the native operating system, software, and installed packages. The Apptainer container can be built from any Linux machine with Apptainer installed, including the Open Science Grid (OSG) access points. Here we walk through the steps needed to build a Linux (Ubuntu 20.04) container containing Stock Synthesis (version 3.30.22.1), R (version 4.4.0), and the R packages r4ss, ss3diags, data.table, magrittr, and mvtnorm from a definition file, linux-r4ss-v4.def. In this case we show the steps needed to build the container using the OSG access point as our Linux virtual machine (VM), though this may not be needed if working from an alternative Linux VM.
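For orientation, an Apptainer definition file for an image like this generally follows the shape sketched below. This is a minimal illustration only, not the actual linux-r4ss-v4.def; the package sources, SS3 download step, and version pinning in the real file will differ.

Bootstrap: docker
From: ubuntu:20.04

%post
    # system libraries and R (the real definition file installs R 4.4.0 explicitly)
    apt-get update && apt-get install -y wget build-essential r-base
    # R packages used in this example (some may need to be installed from GitHub)
    Rscript -e 'install.packages(c("r4ss","ss3diags","data.table","magrittr","mvtnorm"), repos = "https://cloud.r-project.org")'
    # download the SS3 v3.30.22.1 Linux executable, place it at /usr/local/bin/ss3,
    # and mark it executable (download URL omitted here)

%help
    Ubuntu 20.04 + R + SS3 for the NSASS-HTC-HPC-Computing OSG example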
Note: you will have to change apXX to match your OSG access point (e.g., ap20 or ap21).
The first step is to log onto your OSG access point via ssh using a Terminal/PowerShell window and make a directory in which to build your container, in this case singularity/linux_r4ss.

ssh User.Name@apXX.uc.osg-htc.org
mkdir -p singularity/linux_r4ss
Using a second Terminal/PowerShell window, navigate to the directory that you cloned the NSASS-HTC-HPC-Computing repo into and upload the definition file (linux-r4ss-v4.def) to the directory you just created on OSG.
scp apptainer/linux-r4ss-v4.def User.Name@apXX.uc.osg-htc.org:/home/User.Name/singularity/linux_r4ss

Back in your first Terminal/PowerShell window, navigate into the directory and build the container. The second line of code below is what builds the Singularity Image File (.sif); it takes two arguments: the name of the output .sif file and the input definition file (.def).
cd singularity/linux_r4ss
apptainer build linux-r4ss-v4.sif linux-r4ss-v4.def
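Before shipping the image anywhere, it can be worth a quick spot check that the build produced what you expect, for example (assuming R and the r4ss package ended up on the image's default paths):

apptainer exec linux-r4ss-v4.sif R --version
apptainer exec linux-r4ss-v4.sif Rscript -e 'packageVersion("r4ss")'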
Move the built .sif file to the Open Science Data Federation (OSDF) so that it can be cached and made available to any HTCondor job that needs it. Note that if you update the build of the .sif file in any way you should also rename it (e.g., -v5), as caching may not update the file if it keeps the same name.

mkdir -p /ospool/apXX/data/User.Name/singularity
cp linux-r4ss-v4.sif /ospool/apXX/data/User.Name/singularity

2 Setup data inputs and directories
Given that our example is to run a 4-year retrospective analysis for each of the SS3 test models, the next step is downloading the SS3 test models from the nmfs-stock-synthesis/test-models GitHub repo. Once you've downloaded the test models, copy the models/ directory into a new example directory ss3/inputs/ within the NSASS-HTC-HPC-Computing/examples/OSG/ directory on your machine. If you cloned the NSASS-HTC-HPC-Computing repo, the SS3 test models will already be in the correct location.
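If you did not clone the tutorial repo, one way to fetch and stage the test models from a Unix-style shell is sketched below (run from the root of your NSASS-HTC-HPC-Computing working copy; the clone URL follows from the repo named above, and the commands will need minor adjustment for PowerShell):

mkdir -p examples/OSG/ss3/inputs
git clone https://github.com/nmfs-stock-synthesis/test-models.git
cp -r test-models/models examples/OSG/ss3/inputs/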
For the sake of example the job array will be set-up to run each retrospective peel (e.g., -0 years, -1 year, … , -4 years of data) as individual jobs in the job array. We will store the results of each retrospective peel in its own directory. The directories on OSG will be listed in a text file, and we will use this text file to launch jobs on OSG (as a part of the job array) in each of the named directories.
Let us define that text file using R.
- Define a relative path; we are starting from the root directory of this project.
Show code used to define relative paths.
proj_dir = this.path::this.proj()

- Write a text file containing the path names for where the directories will be on OSG, relative to the examples/OSG/ss3/ folder.
Show code used to define job directory structure.
test_models=list.dirs(paste0(proj_dir,"/examples/OSG/ss3/inputs/models/"),recursive=FALSE,full.names=FALSE)
retro_peels=0:4
# replace '-' with '_' in model names since we will use '-' as a delimiter
if(length(grep("-",test_models,fixed=TRUE))>0){
test_models_new = gsub("-","_",test_models)
rename_models_idx = grep("-",test_models,fixed=TRUE)
for(i in seq_along(rename_models_idx)){
# create new dir
dir.create(paste0(proj_dir,"/examples/OSG/ss3/inputs/models/",test_models_new[rename_models_idx[i]]),recursive=TRUE)
# copy files
file.copy(paste0(proj_dir,"/examples/OSG/ss3/inputs/models/",test_models[rename_models_idx[i]],"/",list.files(paste0(proj_dir,"/examples/OSG/ss3/inputs/models/",test_models[rename_models_idx[i]]),full.names=FALSE,recursive=FALSE)),paste0(proj_dir,"/examples/OSG/ss3/inputs/models/",test_models_new[rename_models_idx[i]]))
# delete old dir
shell(paste0("powershell rm -r ",proj_dir,"/examples/OSG/ss3/inputs/models/",test_models[rename_models_idx[i]],"/"))
}
test_models = test_models_new
}
# define scenarios
scenario_df = expand.grid(model=test_models,peel=retro_peels)
scenario_df$run_id = 1:nrow(scenario_df)
scenario_df = scenario_df[,c(3,1,2)]
scenario_df$run_id = ifelse(scenario_df$run_id<10,paste0(0,scenario_df$run_id),as.character(scenario_df$run_id))
# write text file
# define paths relative to the examples/OSG/ss3/ folder
osg_dir_lines = paste0("output/", apply(scenario_df,1,paste0,collapse="-"), "/")
writeLines(osg_dir_lines, con=paste0(proj_dir, "/examples/OSG/ss3/scripts/osg_job_directories.txt"))
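Each line of the resulting osg_job_directories.txt names one job working directory in the run_id-model-peel format built above, one line per model and peel combination (with 18 test models and 5 peels, 90 lines in total). The file looks something like the following, where the bracketed names stand in for the actual test-model directory names:

output/01-<model_1>-0/
output/02-<model_2>-0/
...
output/90-<model_18>-4/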
3 Prepare job scripts

In order to execute the HTC workflow, instructions are coordinated using four scripts:
- osg-prep.sh: This script prepares files for HTCondor job execution and makes the directory structure specified by osg_job_directories.txt.
- osg.sub: This is the HTCondor job submission script that specifies job requirements and input/output files.
- osg-wrapper-r.sh: This wrapper script controls file input/output to and from the R script osg-ss3-example-calcs.r, executes the R script, conducts job timing, and tidies up the job working directory.
- osg-ss3-example-calcs.r: This is the actual computation script, which modifies the SS3 input files as needed, executes the appropriate SS3 model run, and conducts any needed post-processing of the output within R.
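The piece that ties these scripts together is the queue statement in osg.sub: HTCondor reads osg_job_directories.txt and launches one job per line, passing each directory name through to the wrapper script as a variable. The fragment below is a simplified sketch of that mechanism only; it is not the actual osg.sub, and the attribute values, file lists, and resource requests shown are placeholders.

executable            = scripts/osg-wrapper-r.sh
arguments             = $(job_dir)
transfer_input_files  = upload.example-ss3.tar.gz
+SingularityImage     = "osdf:///ospool/apXX/data/User.Name/singularity/linux-r4ss-v4.sif"
+ProjectName          = "project_name"
request_cpus          = 1
request_memory        = 2GB
request_disk          = 2GB
queue job_dir from scripts/osg_job_directories.txt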
In osg.sub you will need to change the following before you upload and run the script:
- Line 15: change apXX and User.Name to match your access point and user name.
- Line 27: change project_name to match your project on OSG.
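For orientation, the wrapper script's responsibilities described above can be outlined roughly as follows. This is an illustrative sketch, not the actual osg-wrapper-r.sh; the real file handling is more involved, but the runtime.txt and End.tar.gz outputs are what the post-processing code in Section 5 expects to find.

#!/bin/bash
# record the job start time in epoch seconds
start=$(date +%s)
# unpack the model inputs shipped with the job (details depend on osg-prep.sh/osg.sub)
tar -xzf upload.example-ss3.tar.gz
# run the computation script; $1 is the job directory passed in from osg.sub
Rscript scripts/osg-ss3-example-calcs.r "$1"
# record end and elapsed times so they can be read back during post-processing
end=$(date +%s)
printf "%s\n%s\n%s\n" "$start" "$end" "$((end - start))" > runtime.txt
# bundle the outputs that HTCondor should transfer back, then tidy the sandbox
tar -czf End.tar.gz runtime.txt ss_report.RData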
- From within R, compress the OSG/ss3/inputs/ and OSG/ss3/scripts/ directories into a tar.gz file, upload.example-ss3.tar.gz. This reduces the number of file transfer steps needed.
shell(paste0("powershell cd ", file.path(proj_dir, "examples", "OSG", "ss3","inputs"), ";tar -czf upload.example-ss3-models.tar.gz models/"))
shell(paste0("powershell cd ", file.path(proj_dir, "examples", "OSG", "ss3"), ";tar -czf upload.example-ss3-scripts.tar.gz scripts/"))
shell(paste0("powershell cd ", file.path(proj_dir, "examples", "OSG", "ss3"), ";tar -czf upload.example-ss3.tar.gz inputs/upload.example-ss3-models.tar.gz upload.example-ss3-scripts.tar.gz"))

4 OSG workflow
- Connect to OSG
As mentioned, access to OSG and file transfer are handled using a pair of Terminal/PowerShell windows, which we will call Terminal A and Terminal B. In Terminal A, log onto your access point and create a directory for this example.
ssh User.Name@apXX.uc.osg-htc.org
mkdir -p examples/OSG/ss3/

- Transfer files
Open a second PowerShell terminal in the NSASS-HTC-HPC-Computing directory on your machine; this is your local workstation, and we will call it Terminal B. Use this terminal window to upload the needed file (examples/OSG/ss3/upload.example-ss3.tar.gz) to OSG via scp. The upload.example-ss3.tar.gz file will be uploaded to the directory you just created, examples/OSG/ss3/.
scp examples/OSG/ss3/upload.example-ss3.tar.gz User.Name@apXX.uc.osg-htc.org:/home/User.Name/examples/OSG/ss3/

- Prepare files and submit the job on OSG
In Terminal A, un-tar the files, change the permissions and line endings of the prep script, execute it, and submit the HTCondor job.
cd examples/OSG/ss3/
tar -xzf upload.example-ss3.tar.gz
tar -xzf upload.example-ss3-scripts.tar.gz
chmod 777 scripts/osg-prep.sh
dos2unix scripts/osg-prep.sh
./scripts/osg-prep.sh
condor_submit scripts/osg.sub
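While the array is running you can monitor it from Terminal A; condor_q summarizes your idle, running, and held jobs, and condor_watch_q (if available on your access point) gives a live-updating view:

condor_q
condor_watch_q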
- Download jobs

Once all jobs have completed (or the job has hit its time limit), use Terminal B to download the job output.
scp -r User.Name@apXX.uc.osg-htc.org:/home/User.Name/examples/OSG/ss3/output examples/OSG/ss3/

5 Process results
After results are downloaded they can be processed in R to extract the model run times, time series of estimated biomass for each model run, and Mohn’s rho across retrospective peels for a given model ‘family’.
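For reference, the Mohn's rho computed below for each model family is based on relative spawning biomass in the terminal year of each peel:

$$
\rho = \frac{1}{4}\sum_{p=1}^{4}\frac{\hat{D}_{T-p,\,p}-\hat{D}_{T-p,\,0}}{\hat{D}_{T-p,\,0}}
$$

where \(\hat{D}_{y,p}\) is the estimate of \(SB/SB_{0}\) in year \(y\) from the model with \(p\) years of data peeled off, and \(T\) is the terminal year of the full (peel 0) model. This is the quantity the bias_vec loop in the code below computes before averaging.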
Show output processing code
# iterate over output files and extract quantities
library(data.table)
library(magrittr)
library(r4ss)
output_dirs = list.dirs(paste0(proj_dir,"/examples/OSG/ss3/output/"),recursive=FALSE,full.names=FALSE)
ssb_dt.list = comptime_dt.list = as.list(rep(NA,length(output_dirs)))
ss_output_list = as.list(rep(NA,length(output_dirs)))
names(ss_output_list) = output_dirs
for(i in seq_along(output_dirs)){
tmp_model = strsplit(output_dirs[i],"-")[[1]][2]
tmp_peel = as.numeric(strsplit(output_dirs[i],"-")[[1]][3])
tmp_index = as.numeric(strsplit(output_dirs[i],"-")[[1]][1])
# check if the End.tar.gz file got created
if(file.exists(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/End.tar.gz")))
{
# get snapshot of original files in the directory
tmp_orig_files = list.files(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/"))
# un-tar if the End.tar.gz file gets made
shell(paste0("powershell cd ", paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/"), ";tar -xzf End.tar.gz"))
# check if runtime.txt was produced and extract output
if(file.exists(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/runtime.txt"))){
tmp_time = readLines(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/runtime.txt")) %>%
gsub(".*?([0-9]+).*", "\\1", .) %>%
as.numeric(.) %>%
as.data.table(.) %>%
setnames(.,".","time")
comptime_dt.list[[i]] = data.table(id = output_dirs[i])
comptime_dt.list[[i]]$index = tmp_index
comptime_dt.list[[i]]$model = tmp_model
comptime_dt.list[[i]]$peel = tmp_peel
comptime_dt.list[[i]]$OSG_start = as.POSIXct(tmp_time$time[1],origin="1970-01-01")
comptime_dt.list[[i]]$OSG_end = as.POSIXct(tmp_time$time[2],origin="1970-01-01")
comptime_dt.list[[i]]$OSG_runtime = tmp_time$time[3]/60
# clean-up
rm(list=c("tmp_time"))
}
# if "ss_report.RData" is produced put it into the storage list
if(file.exists(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/ss_report.RData"))){
load(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/ss_report.RData"))
ss_output_list[[i]] = ss_report
ssb_dt.list[[i]] = ss_report$derived_quants %>%
as.data.table(.) %>%
.[Label %in% paste0("SSB_", ss_report$startyr:ss_report$endyr)] %>%
.[,id := output_dirs[i]] %>%
.[,sbo:=Value/subset(ss_report$derived_quants,Label=="SSB_Virgin")$Value] %>%
.[,yr:=sapply(Label,function(x)as.numeric(strsplit(x,"_")[[1]][2]))] %>%
.[,.(id,yr,sbo)]
# clean-up
rm(list=c("ss_report"))
}
# clean-up
file.remove(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/",setdiff(list.files(paste0(proj_dir,"/examples/OSG/ss3/output/",output_dirs[i],"/")),tmp_orig_files)))
rm(list=c("tmp_orig_files"))
} else {
comptime_dt.list[[i]] = data.table(id=output_dirs[i],index=tmp_index,model=tmp_model,peel=tmp_peel,OSG_start=NA,OSG_end=NA,OSG_runtime=NA)
ssb_dt.list[[i]] = data.table(id=output_dirs[i],yr=2023,sbo=NA)
}
# clean-up
rm(list=c("tmp_model","tmp_peel","tmp_index"))
}
comptime_dt = rbindlist(na.omit(comptime_dt.list))
ssb_dt = rbindlist(ssb_dt.list) %>% merge(comptime_dt[,.(id,index,model,peel)],.,by="id")
ss_output_list = na.omit(ss_output_list)
# save
fwrite(comptime_dt,file=paste0(proj_dir,"/examples/OSG/ss3/output/comptime_dt.csv"))
fwrite(ssb_dt,file=paste0(proj_dir,"/examples/OSG/ss3/output/ssb_dt.csv"))
# calculate Mohn's rho
unique_models = unique(comptime_dt$model)
retro_dt.list = as.list(rep(NA,length(unique_models)))
for(i in seq_along(unique_models)){
tmp_model = unique_models[i]
retro_dt.list[[i]] = data.table(model=tmp_model)
retro_dt.list[[i]]$type = c("SBO")
retro_dt.list[[i]]$rho = NA
if(uniqueN(na.omit(ssb_dt[model==tmp_model])$peel)==5){
tmp_dt = ssb_dt[model==tmp_model]
base_dt = tmp_dt[peel==0]
year_vec = max(base_dt$yr) - 1:4
bias_vec = rep(NA,length(year_vec))
# calc Mohn's rho for runs where all models completed
for(j in 1:4){
bias_vec[j] = (ssb_dt[model==tmp_model&peel==j&yr==year_vec[j]]$sbo - base_dt[yr==year_vec[j]]$sbo)/base_dt[yr==year_vec[j]]$sbo
}
retro_dt.list[[i]]$rho = mean(bias_vec)
rm(list=c("tmp_dt","base_dt","year_vec","bias_vec"))
}
rm(list=c("tmp_model"))
}
retro_dt = rbindlist(retro_dt.list)
fwrite(retro_dt,file=paste0(proj_dir,"/examples/OSG/ss3/output/retro_dt.csv"))

5.1 Job runtime
The 90 jobs run on OSG completed 4.14 hours of calculations (2.76 minutes per job) in an elapsed time of ~34 minutes (Figure 1).
Show plotting code
library(ggplot2)
p = comptime_dt %>%
.[,.(id,OSG_start,OSG_end)] %>%
melt(.,id.vars="id") %>%
.[,variable:=ifelse(variable%in%c("OSG_start"),"start","end")] %>%
dcast(.,id~variable) %>%
.[order(start)] %>%
ggplot() +
xlab("Time (GMT)") +
ylab("Job") +
geom_segment(aes(x=start,xend=end,y=id,yend=id),color="#003087",alpha=0.5,linewidth=2) +
theme(panel.background = element_rect(fill = "transparent", color = "black", linetype = "solid"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
strip.background =element_rect(fill="transparent"),
legend.key = element_rect(fill = "transparent"),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
p
# save plot
ggsave(
"OSG-ss3-elapsed.png",
plot = p,
device = "png",
path = paste0(proj_dir,"/assets/static/"),
width = 8,
height = 4.5,
units = c("in"),
dpi = 300,
bg = "transparent")6 Example results
6.1 Retrospectives
Retrospective plots of static biomass depletion for the SS3 test models are shown in Figure 2.
Show plotting code
text_dt.list = as.list(rep(NA,uniqueN(ssb_dt$model)))
for(i in seq_along(text_dt.list)){
tmp_dt = ssb_dt[model==unique(ssb_dt$model)[i]]
tmp_min_yr = min(tmp_dt$yr)
text_dt.list[[i]] = data.table(model=unique(ssb_dt$model)[i],yr=tmp_min_yr,sbo=0.2,rho=round(retro_dt[model==unique(ssb_dt$model)[i]]$rho,digits=2))
}
text_dt = rbindlist(text_dt.list)
p = ssb_dt %>%
ggplot() +
facet_wrap(~model,scales="free_x") +
xlab("Year") +
ylab(expression(SB/SB[0])) +
ylim(0,NA) +
geom_hline(yintercept=0) +
geom_path(aes(x=yr,y=sbo,color=as.character(peel),group=id)) +
geom_text(data=text_dt,aes(x=yr,y=sbo,label=rho),size=3,hjust = 0) +
viridis::scale_color_viridis("Peel",begin = 0.1,end = 0.8,direction = 1,option = "H",discrete=TRUE) +
viridis::scale_fill_viridis("Peel",begin = 0.1,end = 0.8,direction = 1,option = "H",discrete=TRUE) +
theme(panel.background = element_rect(fill = "transparent", color = "black", linetype = "solid"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
strip.background =element_rect(fill="transparent"),
legend.key = element_rect(fill = "transparent"))
p
# save plot
ggsave(
"OSG-ss3-retro.png",
plot = p,
device = "png",
path = paste0(proj_dir,"/assets/static/"),
width = 8,
height = 4.5,
units = c("in"),
dpi = 300,
bg = "transparent")