Jupyter Notebooks in Containers for HPCs
How to containerize project code, automate container builds with CI/CD, and deploy the code on HPC systems.
Use Case:
This tutorial is for running complex Jupyter simulations that require specific input files (e.g., lattice files) and a highly customized Conda environment. By packaging the entire setup into a container, you ensure that the simulation works immediately for new users, eliminating the need for them to configure dependencies. This workflow saves time, avoids setup errors, and guarantees a consistent, working environment for all users.
This tutorial details a process used at SLAC National Accelerator Laboratory to run Jupyter-notebook-based physics simulations in a reliable and reproducible way, with all Python dependencies and simulation files pre-configured in a shareable container. Anyone in the lab can now log in to the HPC via Open OnDemand and, with just a few clicks, run the container, which opens a configured JupyterLab session.
If you work at SLAC and want to adapt the container for Jupyter-based simulations to run on S3DF, then see this page and this page.
If you are just getting started with containers then see this intro to Docker page.
All code referenced can be found here:
Apptainer
(Docker for HPCs)
Apptainer (formerly Singularity) is a container platform tailored for High-Performance Computing (HPC) environments, allowing users to run containers securely without needing root access. It enables the packaging of entire applications and dependencies into a single file, ensuring portability and reproducibility across different systems. Apptainer supports key HPC tools like MPI and GPU libraries, making it ideal for scientific workflows and large-scale computing tasks in shared, multi-user environments.
Apptainer docs: https://apptainer.org/ https://apptainer.org/docs/user/main/
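As a quick illustration, a typical Apptainer workflow on an HPC login node looks like the following (the image name is just an example; Apptainer names the resulting file after the image and tag):

```shell
# Convert a Docker Hub image into Apptainer's SIF format (no root required)
apptainer pull docker://python:3.12-slim

# Run a command inside the resulting container
apptainer exec python_3.12-slim.sif python --version

# Open an interactive shell inside the container
apptainer shell python_3.12-slim.sif
```

The single resulting .sif file is what makes the container easy to share: it can be copied to any machine with Apptainer installed and run as-is.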
Container Workflow Overview
The overall idea is to have program code, e.g., a Python-based simulation with a specific Conda environment, backed by a GitHub repo. Any change to the GitHub repo triggers a GitHub Action that builds the Docker container and stores it on Docker Hub. Another GitHub Action pulls the new container to the HPC each time it changes; during this step, the Docker container is converted to an Apptainer container on the HPC. A script on Open OnDemand lets users start the locally stored container. Inside the container, the setup code files get copied over to the user's directory, so the user can modify the code without having it overwritten each time the container restarts.
Github Repo Structure
The repo is broken down into:
notebooks directory: This directory has the Jupyter notebooks that contain the desired simulation and will be stored inside the container. Any other files that the simulation needs are stored here.
Dockerfile: This file tells Docker how to build the container and what packages are needed. In our case, the Dockerfile sets up a custom Conda environment and JupyterLab.
.github/workflows: This directory stores the YAML files that define GitHub Actions.
Notebooks Directory
Dockerfile
Dockerfiles automate the process of creating Docker containers by defining the environment, dependencies, and commands needed to build and run an application.
Base Image and Working Directory: The container starts with a base image (e.g., Ubuntu), and a working directory is set for subsequent operations.
Install Necessary Packages: Essential development tools (e.g., compilers, Python, OpenMPI) are installed using the package manager. This ensures the environment is ready for building and running applications that require parallel computation.
Switch to Conda Environment: A Conda-based image is used to easily manage Python environments and dependencies. The Conda environment is configured by adding it to the system’s PATH for easy access.
Install Key Packages: Scientific libraries, Jupyter, MPI support, and domain-specific tools are installed using Conda or pip. This sets up the environment for scientific computing, machine learning, or other workloads.
Copy Project Files: Jupyter notebooks or other necessary project files are copied into the container for easy access during execution.
Expose Ports: Specific ports are exposed to allow access to services like JupyterLab or other applications running in the container.
Define Default Command: The container is set to automatically start a JupyterLab instance or other desired service when run.
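The steps above can be sketched as a minimal Dockerfile. This is an illustrative reconstruction, not the repo's exact Dockerfile.s3df: the base image, package list, and port are assumptions, while the /opt/notebooks location matches the path described later in this tutorial.

```dockerfile
# Base image with Conda (via Miniforge) preinstalled
FROM condaforge/miniforge3:latest

WORKDIR /app

# System packages for building and running MPI-based codes
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential openmpi-bin libopenmpi-dev \
    && rm -rf /var/lib/apt/lists/*

# Put the Conda environment on PATH so it is the default Python
ENV PATH=/opt/conda/bin:$PATH

# Scientific libraries, Jupyter, and MPI support
RUN conda install -y numpy scipy matplotlib jupyterlab mpi4py && \
    conda clean -afy

# Bake the simulation notebooks and input files into the image
COPY notebooks/ /opt/notebooks/

# Default JupyterLab port
EXPOSE 8888

# Start JupyterLab when the container runs
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser", "--allow-root"]
```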
GitHub Actions for CI/CD
GitHub Actions is a CI/CD platform that automates workflows, such as testing, building, and deploying code, directly within a GitHub repository.
GitHub Actions works by using YAML configuration files to define workflows, which consist of triggers (e.g., push, pull request), jobs, and steps. These jobs run in virtual environments (runners) provided by GitHub or self-hosted, and execute specified tasks such as testing, building, or deploying applications. Workflows are automatically triggered based on events in the repository, streamlining development processes.
This particular example uses two GitHub Actions to build a Docker image, store a copy on Docker Hub, and deploy a copy to the HPC.
Here's an explanation of the GitHub Actions YAML file for building and pushing a Docker image:
Build Container GitHub Action:
1. Workflow Name and Trigger Conditions:
Name: This workflow is called "S3DF - Build and Push Docker Image."
Triggers: The workflow is triggered by two events:
When there is a push to the main branch, unless the push only involves changes to certain files (like documentation or NERSC-related files).
When a pull request targets the main branch, ignoring the same set of files.
This ensures the workflow only runs when relevant code or configurations are updated.
2. Job Definition:
A job called build is defined, which runs on the latest version of Ubuntu in GitHub’s virtual environment. This sets the platform on which the subsequent steps will execute.
3. Steps in the Job:
a. Checkout Repository:
This step checks out the repository’s code so it can be used within the job. The submodules: true option ensures any Git submodules are also pulled.
b. Install Git LFS:
This step installs Git LFS (Large File Storage), which is required if the repository contains large files tracked by Git LFS. It updates the system and installs the necessary packages. This step is only needed when files used by the simulation are stored in LFS.
c. Fetch Git LFS Files:
This step fetches the large files tracked by Git LFS to ensure the correct files are available for the build process. This step is only needed when files used by the simulation are stored in LFS.
d. Set up Docker Buildx:
Docker Buildx is set up to enable advanced Docker build features, such as building multi-platform images. This is necessary if the image needs to support different architectures.
e. Log in to Docker Hub:
This step logs into Docker Hub using credentials stored in GitHub Secrets. The username and password are securely pulled from the DOCKER_HUB_USERNAME and DOCKER_HUB_ACCESS_TOKEN secrets. This is required for pushing images to Docker Hub.
f. Build and Push Docker Image:
This step builds the Docker image using Dockerfile.s3df in the current context (.), and then pushes the built image to Docker Hub. The tags field sets the tag for the image (in this case, latest), using the username from the secret. The image will be named ${DOCKER_HUB_USERNAME}/impact-bmad:latest.
Summary:
This GitHub Actions workflow automates the process of building and pushing a Docker image to Docker Hub. It is triggered by updates to the main branch, installs Git LFS for large file support, sets up Docker Buildx, logs into Docker Hub, and finally builds and pushes the image using the specified Dockerfile. This allows for continuous integration and delivery of the Docker image directly from code changes.
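Putting the steps together, the workflow file might look roughly like this. The action versions, the paths-ignore list, and step details are assumptions; the secret names, Dockerfile name, and image name come from the description above:

```yaml
name: S3DF - Build and Push Docker Image

on:
  push:
    branches: [main]
    paths-ignore:
      - 'docs/**'
      - '**.md'
  pull_request:
    branches: [main]
    paths-ignore:
      - 'docs/**'
      - '**.md'

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          submodules: true

      - name: Install Git LFS
        run: sudo apt-get update && sudo apt-get install -y git-lfs

      - name: Fetch Git LFS files
        run: git lfs pull

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_HUB_USERNAME }}
          password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          context: .
          file: Dockerfile.s3df
          push: true
          tags: ${{ secrets.DOCKER_HUB_USERNAME }}/impact-bmad:latest
```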
Pull to HPC GitHub Action:
1. Workflow Name and Trigger Conditions:
Name: This workflow is called "Update Singularity Container."
Triggers:
It is automatically triggered when the S3DF - Build and Push Docker Image workflow completes, meaning the Docker image has been successfully built and pushed.
It can also be triggered manually via the workflow_dispatch event (a button in the GitHub UI).
2. Job Definition:
The job is named update-container and will run on the ubuntu-latest virtual machine provided by GitHub.
3. Steps in the Job:
a. Checkout Repository:
This step checks out the repository so the workflow has access to the necessary files or scripts in the repository, though in this case, it's more about preparing the environment.
b. Install SSHPass:
sshpass is installed here to facilitate automated SSH login using a password. This tool is needed to connect to the remote system (in this case, an HPC environment at SLAC) without manual password input.
c. Remove Old Apptainer Container and Pull New One:
This step performs the core function of the workflow by connecting to the remote server (s3dflogin.slac.stanford.edu) using sshpass and the SSH password stored securely in GitHub Secrets. The script does the following:
Logs into the server.
Connects to another server (iana), which may be part of S3DF's interactive compute pool.
Cleans up old container files, including cache and temporary files.
Pulls the updated Apptainer container (impact-bmad_latest.sif) from Docker Hub using the singularity pull command, converting the Docker image to a Singularity Image Format (SIF) file, the same format Apptainer uses.
Lists the contents of the container directory to confirm the update.
Summary:
This GitHub Actions workflow is designed to update a Singularity container after a Docker image build is completed. It automates the SSH connection to a remote HPC environment, removes outdated container files, and pulls the latest container from Docker Hub using the Singularity tool. The workflow can be triggered either by a completed Docker build or manually.
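A sketch of this workflow is below. The secret names SSH_USER and SSH_PASSWORD, the containers/ directory on the remote side, and the action versions are hypothetical; the host names, image name, and singularity pull conversion come from the description above:

```yaml
name: Update Singularity Container

on:
  workflow_run:
    workflows: ["S3DF - Build and Push Docker Image"]
    types: [completed]
  workflow_dispatch:

jobs:
  update-container:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Install SSHPass
        run: sudo apt-get update && sudo apt-get install -y sshpass

      - name: Remove old Apptainer container and pull new one
        run: |
          sshpass -p "${{ secrets.SSH_PASSWORD }}" \
            ssh -o StrictHostKeyChecking=no \
            "${{ secrets.SSH_USER }}@s3dflogin.slac.stanford.edu" \
            "ssh iana 'rm -rf ~/.apptainer/cache containers/impact-bmad_latest.sif && \
              singularity pull containers/impact-bmad_latest.sif \
                docker://${{ secrets.DOCKER_HUB_USERNAME }}/impact-bmad:latest && \
              ls -l containers'"
```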
Open OnDemand script
Open OnDemand is a web-based platform that provides users with easy access to High-Performance Computing (HPC) resources, allowing them to manage files, submit jobs, and run applications through a browser interface.
In SLAC's S3DF OnDemand setup, one can define Apptainer containers that serve JupyterLab sessions through a custom script.
For the SLAC container, this script works with the previously mentioned Dockerfile:
This script sets up and runs Jupyter using Apptainer (formerly Singularity) to execute a Jupyter environment from within a container. Here's a breakdown of how it works:
1. Set Environment Variables:
APPTAINER_IMAGE_PATH: This specifies the path to the Apptainer image (impact-bmad_latest.sif), which contains the pre-built environment.
NOTEBOOK_ROOT: Sets the path where the user's Jupyter notebooks will be stored locally (impact_bmad_container_notebooks).
The mkdir -p command ensures that the notebook directory exists, creating it if it doesn't.
2. Define the jupyter Function:
This function wraps the apptainer exec command to run Jupyter within the container.
Bind Mounts: The -B /usr,/sdf,/fs,/sdf/scratch,/lscratch option mounts critical directories from the host into the container, ensuring that the container has access to these paths.
The first apptainer exec creates the notebook directory and copies default notebooks from /opt/notebooks/ in the container to the user's directory if they don't already exist (cp -rn ensures no overwriting).
The second apptainer exec command runs Jupyter inside the container, passing along any arguments ($@) provided to the function (e.g., Jupyter options like notebook or lab).
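Based on this breakdown, the launcher script is roughly of the following shape. The image location under /sdf is an assumption; the bind paths, directory names, and cp -rn behavior come from the description above:

```shell
#!/bin/bash
# Path to the locally stored Apptainer image (exact location is illustrative)
export APPTAINER_IMAGE_PATH="/sdf/group/ad/containers/impact-bmad_latest.sif"

# Writable directory in the user's home for their copy of the notebooks
export NOTEBOOK_ROOT="$HOME/impact_bmad_container_notebooks"
mkdir -p "$NOTEBOOK_ROOT"

# Wrap "jupyter" so Open OnDemand transparently runs it inside the container
function jupyter() {
    # Seed the user's directory with the default notebooks; cp -rn never
    # overwrites files that already exist, so user edits survive restarts.
    apptainer exec -B /usr,/sdf,/fs,/sdf/scratch,/lscratch \
        "$APPTAINER_IMAGE_PATH" \
        cp -rn /opt/notebooks/. "$NOTEBOOK_ROOT/"

    # Launch Jupyter inside the container, forwarding all arguments
    apptainer exec -B /usr,/sdf,/fs,/sdf/scratch,/lscratch \
        "$APPTAINER_IMAGE_PATH" \
        jupyter "$@"
}
```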
Purpose:
This script ensures that Jupyter runs inside the containerized environment, while also synchronizing notebooks between the container and the user’s local file system, providing a seamless workflow for running Jupyter from within an HPC environment.
Copying the notebook files out of the container is important because the container's file system is read-only, meaning users cannot directly modify the notebooks or other files inside it. By copying the notebooks from the container into a local, writable directory, users gain the ability to edit and run their experiments without restrictions.
If users want to modify or reset their notebooks to the original state provided by the container, they can simply delete the local notebook files. When they restart the container, the notebook files will be copied again from the read-only container, giving them a fresh, unmodified set of files to work with. This approach ensures both flexibility for customization and easy recovery of the original files.
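The copy-without-overwrite behavior that makes this reset possible can be demonstrated with plain cp -rn, using throwaway directories to stand in for the container's read-only notebooks and the user's writable copy:

```shell
# Throwaway directories standing in for the container's read-only notebooks
# directory and the user's writable copy (paths are illustrative).
workdir=$(mktemp -d)
mkdir -p "$workdir/container_notebooks" "$workdir/user_notebooks"
echo "original" > "$workdir/container_notebooks/demo.ipynb"
echo "edited"   > "$workdir/user_notebooks/demo.ipynb"

# On container start, cp -rn copies defaults but never overwrites user edits
# (|| true guards against newer coreutils, where cp -n reports skips as failure)
cp -rn "$workdir/container_notebooks/." "$workdir/user_notebooks/" || true
cat "$workdir/user_notebooks/demo.ipynb"    # prints "edited": user work is kept

# Deleting the local file and re-running the copy restores the original
rm "$workdir/user_notebooks/demo.ipynb"
cp -rn "$workdir/container_notebooks/." "$workdir/user_notebooks/"
cat "$workdir/user_notebooks/demo.ipynb"    # prints "original": a fresh copy
```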
Conclusion
This tutorial demonstrates how to create, manage, and deploy containerized Jupyter simulations for High-Performance Computing (HPC) environments, specifically using SLAC's S3DF infrastructure. By utilizing Apptainer (formerly Singularity) containers, users can package complex simulations with all necessary dependencies, input files, and configurations, ensuring reproducibility and ease of use for new users. The automated workflows, powered by GitHub Actions, handle building and updating the containers, while Open OnDemand provides an accessible interface for running Jupyter notebooks directly from the HPC environment. This approach eliminates setup errors, saves time, and ensures consistent simulation environments, enabling researchers to focus on their work instead of system configuration.