Educational aids or Curricula

Hovakim Grabski | 2024

GitHub


Abstract

The GitHub repository “CIP_Nextflow_on_HPC” is a guide to using Nextflow to run tasks on high-performance computing (HPC) systems, with a focus on SLURM-based environments. Nextflow is a workflow management system designed to simplify the creation, orchestration, and execution of complex data analysis pipelines. The repository provides a step-by-step tutorial covering Nextflow setup, conda environments, and Singularity containers. It is aimed at users with a working knowledge of Linux and HPC environments.

The guide is tailored to the San Diego Supercomputer Center’s Expanse system, but it also applies to other SLURM-based HPC systems. The two tools work at different levels. SLURM is a job scheduler that handles individual jobs and resource allocation. Nextflow adds a higher-level abstraction on top of it, managing the interconnected tasks of a pipeline as a single workflow.

This approach helps make workflows reproducible, scalable, and portable across computational environments, which suits researchers who need to manage complex data analysis pipelines.