Submitting an array job with OSG
In this example we step through submitting an array job on OSG, where we want to run the same job in a number of directories. Here the job runs a simple R script that reads in the data.csv file stored in the directory, fits a linear model, and writes the parameter estimates to a par.csv file. We specify which directories to run jobs in as part of the job array, using a text file that lists the directory path names on OSG.
The osg/array_lm example can be set up either by cloning the repository (git clone https://github.com/MOshima-PIFSC/NSASS-HTC-HPC-Computing.git) or by stepping through the following code:
Note: throughout this tutorial we use User.Name as a stand-in for your actual user name and osg.project_name as a stand-in for your project. In all cases, replace User.Name with your actual user name and osg.project_name with your specific project name.
1 Set up data inputs and directories
- Define a relative path; we are starting from the root directory of this project.
proj_dir = this.path::this.proj()
- Define directory names for each run.
dir_name = paste0("rep_0", 0:9)
- Iterate across directories, create them, and then write a simple .csv file into them containing data to fit a linear model.
for(i in seq_along(dir_name)){
  if(!file.exists(file.path(proj_dir, "examples", "OSG", "array_lm", "inputs", dir_name[i], "data.csv"))){
    set.seed(i)
    dir.create(file.path(proj_dir, "examples", "OSG", "array_lm", "inputs", dir_name[i]), recursive=TRUE)
    tmp = data.frame(x=1:1000)
    tmp$y = (i + (0.5*i)*tmp$x) + rnorm(1000, 0, i)
    write.csv(tmp, file = file.path(proj_dir, "examples", "OSG", "array_lm", "inputs", dir_name[i], "data.csv"))
  }
}
- Write an R script to read in the data, run a linear model, and report back the estimated parameters.
if(!file.exists(file.path(proj_dir, "examples", "OSG", "array_lm", "inputs", "run_lm_osg_array.r"))){
  script_lines = c("tmp=read.csv('data.csv')",
                   "fit=lm(y~x,data=tmp)",
                   "out = data.frame(par=unname(fit$coefficients))",
                   "write.csv(out,file='par.csv')")
  writeLines(script_lines, con = file.path(proj_dir, "examples", "OSG", "array_lm", "inputs", "run_lm_osg_array.r"))
}
- Write a text file containing the relative path names for where the directories will be on OSG.
if(!file.exists(file.path(proj_dir, "examples", "OSG", "array_lm", "inputs", "osg_job_directories.txt"))){
  dir_lines = paste0("./../../inputs/", dir_name, "/")
  writeLines(dir_lines, con = file.path(proj_dir, "examples", "OSG", "array_lm", "inputs", "osg_job_directories.txt"))
}
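The resulting osg_job_directories.txt contains one line per replicate directory:
./../../inputs/rep_00/
./../../inputs/rep_01/
...
./../../inputs/rep_09/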
- In addition to the input files, you will need two additional scripts: a wrapper script (wrapper.sh) and a submission script (submission.sub); examples of both can be found in examples/OSG/array_lm/scripts. To easily upload all the necessary files at once, compress the entire array_lm/ directory as a tar.gz file, upload.array_lm.tar.gz.
system(paste0("powershell cd ", file.path(proj_dir, "examples", "OSG", "array_lm"), ";tar -czf upload.array_lm.tar.gz * "))
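The command above shells out to PowerShell from R and so assumes a Windows machine. On macOS or Linux you can build the same archive directly in a terminal (a sketch, assuming you start from the project root):
# build the upload archive from the example directory
cd examples/OSG/array_lm
tar -czf upload.array_lm.tar.gz *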
2 OSG workflow
- Connect to OSG
As mentioned, access to OSG and file transfer are done using a pair of Terminal/PowerShell windows, which we will call Terminal A and Terminal B. In Terminal A, log onto your access point and create a directory for this example.
ssh User.Name@ap21.uc.osg-htc.org
mkdir array_lm
- Transfer files
We will upload the compressed file upload.array_lm.tar.gz into the OSG directory that you just created. The following files should be included:
- all replicate data files to run the linear models on;
- the R script run_lm_osg_array.r;
- the text file osg_job_directories.txt with the directory names;
- the wrapper script wrapper.sh, which unpacks files, sets up job timing, executes the R script, and packages results (a sketch follows this list);
- the submission script submission.sub;
- and a bash script prep.sh that prepares the files to be run on HTCondor, including changing file permissions, making directory structures, and converting line endings with dos2unix.
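For orientation, here is a minimal sketch of what a wrapper like wrapper.sh does. The real script is in examples/OSG/array_lm/scripts; the output file names (runtime.txt, End.tar.gz) are taken from the outputs processed later in this example, but the exact commands below are assumptions.
#!/bin/bash
# record the start time (epoch seconds)
date +%s > runtime.txt
# run the model in the current replicate directory
Rscript run_lm_osg_array.r
# record the end time and the elapsed runtime
date +%s >> runtime.txt
start=$(sed -n '1p' runtime.txt)
end=$(sed -n '2p' runtime.txt)
echo $((end - start)) >> runtime.txt
# package the results for transfer back to the access point
tar -czf End.tar.gz par.csv runtime.txt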
For this example, we are using the container that was built in the Hera SS3 example. If you are unsure of how to build a container or access it in OSG, please refer back to that example.
In the submission script submission.sub you will need to change the following before you upload and run the script:
- Line 7: Change User.Name to your user name.
- Line 15: Change User.Name to your user name and linux-r4ss-v4.sif to the name of your container file. For more information on building containers, see Running an array of SS3 jobs on Hera.
- Line 26: Change the project name from osg.project_name to the name of your OSG project.
- Lines 30 and 37: Change User.Name in the file path to your user name.
NOTE: There cannot be an empty last line in submission.sub. Make sure the script is 34 lines. If in doubt, backspace up to the last letter on the last line.
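If you prefer to make these substitutions from the command line rather than in a text editor, sed can handle them in one pass (a sketch, run in the directory containing submission.sub; jdoe and my.project are placeholders for your own user and project names, and -i uses GNU sed syntax):
# replace the User.Name placeholder everywhere it appears
sed -i 's/User.Name/jdoe/g' submission.sub
# set the project name
sed -i 's/osg.project_name/my.project/g' submission.sub
# check the final lines; the file should end at line 34 with no blank line after it
cat -n submission.sub | tail -n 3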
In Terminal B, navigate to the directory where the compressed file is on your local computer and run:
scp upload.array_lm.tar.gz User.Name@ap21.uc.osg-htc.org:/home/User.Name/array_lm
You will be prompted for your passphrase and RSA code before the file transfers. Once the file transfer is complete, go back to Terminal A, navigate to the array_lm directory, and untar the files by running:
tar -xvf upload.array_lm.tar.gz
- Prepare scripts
Still in Terminal A, change the permissions and line endings for prep.sh. Navigate to the scripts/bash directory, change the line endings for the prep.sh script, and then execute it to prepare the other scripts as necessary.
# navigate to directory
cd scripts/bash
# change permissions to make the file executable
chmod 777 prep.sh
# change line endings
dos2unix prep.sh
# run script
./prep.sh
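As a rough idea of what prep.sh is doing on your behalf (a sketch only; the actual script ships in the repository, and the directory layout below is an assumption):
#!/bin/bash
# convert Windows line endings on the scripts HTCondor will use
dos2unix ../condor_submit/submission.sub
dos2unix ../wrapper.sh   # hypothetical location of the wrapper script
# make the wrapper executable so the execute node can run it
chmod 755 ../wrapper.sh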
- Submit job
Once you are ready, you can submit the job by running the command condor_submit.
# navigate out of bash directory and into condor_submit directory
cd ../condor_submit
# submit job
condor_submit submission.sub
While your job is running, you can check on it using the following commands:
- condor_q shows the status of all of your jobs: running, idle, or held
- condor_q -run shows running jobs only
- condor_q -hold shows jobs that are held
Using these commands you can get the job ID number and peek directly at what is happening on the compute node using condor_ssh_to_job <job_id>. This is most useful for looking at longer jobs or for checking whether intermediate files are being produced correctly.
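For example (the cluster and process numbers below are placeholders; use the ID shown by condor_q):
# find the job ID in the first column of condor_q output
condor_q
# open an interactive session on the node running job 12345678.0
condor_ssh_to_job 12345678.0
# inspect intermediate files, then exit to return to the access point
ls -lh
exit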
For more information on tracking and restarting jobs, see here.
- Download results
Once the jobs have completed, you can retrieve the results for further analysis on your local computer. The easiest way to do this is to compress all of the replicate directories in array_lm/inputs. In Terminal A, logged into OSG, navigate to array_lm/inputs and run:
tar -czf download.array_lm.tar.gz ./rep_*
We use the wildcard character * to indicate that we want to include everything whose name starts with rep_, which captures all of the replicate directories. Then, on your local computer, create a new directory outputs to hold the downloaded results. In Terminal B, navigate to the outputs folder and run:
# navigate to outputs folder, assuming you are in array_lm/
cd outputs
# download the compressed results from array_lm/inputs on OSG into the current directory
scp -r User.Name@ap21.uc.osg-htc.org:/home/User.Name/array_lm/inputs/download.array_lm.tar.gz ./
Again you will be prompted for your passphrase and RSA code before any files can transfer.
You can then unpack the files in Terminal B by running:
# untar results
tar -xzf download.array_lm.tar.gz
3 Process the output
In R, iterate through the sub-directories of the input and output data to extract the results of the linear model fits and the model run time information.
library(data.table)
library(magrittr)

input_data.list = as.list(rep(NA,10))
output_data.list = as.list(rep(NA,10))
runtime_data.list = as.list(rep(NA,10))

for(i in seq_along(dir_name)){
  # get input data
  input_data.list[[i]] = fread(file.path(proj_dir, "examples", "OSG", "array_lm", "inputs", dir_name[i], "data.csv")) %>%
    .[,.(x,y)] %>%
    .[,model := factor(as.character(i), levels=as.character(1:10))] %>%
    .[,.(model,x,y)]

  # untar results
  system(paste0("powershell cd ", file.path(proj_dir, "examples", "OSG", "array_lm", "outputs", dir_name[i], "/"), ";tar -xzf End.tar.gz"))

  # get output
  output_data.list[[i]] = fread(file.path(proj_dir, "examples", "OSG", "array_lm", "outputs", dir_name[i], "par.csv")) %>%
    .[,.(par)] %>%
    .[,model := factor(as.character(i), levels=as.character(1:10))] %>%
    .[,.(model,par)] %>%
    melt(., id.vars="model") %>%
    .[,variable := c("intercept","slope")] %>%
    dcast(., model ~ variable) %>%
    merge(., input_data.list[[i]][,.(model,x)], by="model") %>%
    .[,pred_y := intercept + slope*x] %>%
    .[,.(model,x,pred_y)]

  # get time
  runtime_data.list[[i]] = readLines(file.path(proj_dir, "examples", "OSG", "array_lm", "outputs", dir_name[i], "runtime.txt")) %>%
    gsub(".*?([0-9]+).*", "\\1", .) %>%
    as.numeric(.) %>%
    as.data.table(.) %>%
    setnames(., ".", "time") %>%
    .[,model := factor(as.character(i), levels=as.character(1:10))] %>%
    melt(., id.vars="model") %>%
    .[,variable := c("start","end","runtime")] %>%
    dcast(., model ~ variable) %>%
    .[,.(model,start,end,runtime)]
}

input_data = rbindlist(input_data.list)
output_data = rbindlist(output_data.list)
runtime_data = rbindlist(runtime_data.list)
The jobs started execution at 2024-11-02 00:50:03 and all finished by 2024-11-02 00:50:31, for an elapsed runtime of 28 seconds and a total computation time of 6 seconds. Use of Hera resulted in a job completing 0.21× faster. Figure 1 shows the simulated data and estimated linear fits for each model run in the job-array.
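These figures come from runtime_data; a sketch of the calculation (assuming, as in the wrapper sketch above, that start and end are recorded as Unix epoch seconds):
# earliest start and latest finish across the array
job_start = as.POSIXct(min(runtime_data$start), origin="1970-01-01", tz="UTC")
job_end = as.POSIXct(max(runtime_data$end), origin="1970-01-01", tz="UTC")
# elapsed wall-clock time for the whole array
elapsed_sec = max(runtime_data$end) - min(runtime_data$start)
# total computation time summed across jobs
total_sec = sum(runtime_data$runtime)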
library(ggplot2)

input_data %>%
  ggplot() +
  geom_point(aes(x=x, y=y, fill=model), alpha=0.05, size=5, shape=21) +
  geom_line(data=output_data, aes(x=x, y=pred_y, color=model), linewidth=2)