
Jupyter Notebooks in Containers for HPCs

How to containerize project code, automate container builds with CI/CD and deploy code on HPCs.

Use Case:

This tutorial is for running complex Jupyter simulations that require specific input files (e.g., lattice files) and a highly customized Conda environment. By packaging the entire setup into a container, you ensure that the simulation works immediately for new users, eliminating the need for them to configure dependencies. This workflow saves time, avoids setup errors, and guarantees a consistent, working environment for all users.

This tutorial details a process used at SLAC National Accelerator Laboratory to run Jupyter notebook-based physics simulations in a reliable and reproducible way by packaging all Python dependencies and simulation files into a shareable container. Anyone in the lab can now log in to the HPC via Open OnDemand and, with a few clicks, launch the container, which opens a pre-configured JupyterLab session.

If you work at SLAC and want to adapt the container for Jupyter-based simulations to run on S3DF, then see this page and this page.

If you are just getting started with containers, see this intro to Docker page.

All code referenced can be found here:

Apptainer

(Docker for HPCs)

Apptainer (formerly Singularity) is a container platform tailored for High-Performance Computing (HPC) environments, allowing users to run containers securely without needing root access. It enables the packaging of entire applications and dependencies into a single file, ensuring portability and reproducibility across different systems. Apptainer supports key HPC tools like MPI and GPU libraries, making it ideal for scientific workflows and large-scale computing tasks in shared, multi-user environments.

Apptainer docs: https://apptainer.org/ https://apptainer.org/docs/user/main/
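
As a quick orientation, here is a minimal, hedged sketch of the Apptainer commands this workflow relies on (image names and paths are placeholders):

# Pull an image from Docker Hub and convert it to a single .sif file
# (no root privileges needed)
apptainer pull docker://ubuntu:latest          # produces ubuntu_latest.sif

# Run a command inside the container; -B bind-mounts host paths into it
apptainer exec -B /scratch ubuntu_latest.sif cat /etc/os-release

# Start an interactive shell inside the container
apptainer shell ubuntu_latest.sif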

Container Workflow Overview

The overall idea is to keep the project code (e.g., a Python-based simulation with a specific Conda environment) in a GitHub repository. Any change to the repository triggers a GitHub Action that builds the Docker container and stores it on Docker Hub. A second GitHub Action pulls the new container to the HPC each time it changes; during this step the Docker container is converted to an Apptainer container on the HPC. A script on Open OnDemand lets users start the locally stored container. On startup, the container copies its code files into the user's directory so the user can modify the code without it being overwritten each time the container restarts.

Github Repo Structure

The repo is broken down into:

  • notebooks directory: This directory holds the Jupyter notebooks containing the desired simulation; it is copied into the container, along with any other files the simulation needs.

  • Dockerfile: This file tells Docker how to build the container and which packages are needed. In our case the Dockerfile sets up a custom Conda environment and JupyterLab.

  • .github/workflows: This directory stores the YAML files that define the GitHub Actions.

Notebooks Directory

Dockerfile

Dockerfiles automate the process of creating Docker containers by defining the environment, dependencies, and commands needed to build and run an application.

FROM ubuntu:latest
WORKDIR /opt


# Install necessary packages and OpenMPI
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        gfortran \
        python3-dev \
        python3-pip \
        wget \
        openmpi-bin \
        libopenmpi-dev \
        libssl-dev \
        htop \
        rsync && \
    apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*



# NOTE: this second FROM starts a new build stage; the final image is based on
# this Miniconda stage, and nothing from the Ubuntu stage above carries over
# unless it is copied explicitly with COPY --from
FROM docker.io/continuumio/miniconda3:latest

ENV PATH=/opt/conda/bin:$PATH

# Open up permissions on the conda install so non-root users can use it
RUN chmod -R 777 /opt/conda

# Install lume-impact, then patch lume's base.py so it calls tools.full_path
# (the unqualified full_path reference is broken in this release)
RUN /opt/conda/bin/conda install conda-forge::lume-impact=0.8.6
RUN sed -i "s|workdir = full_path(workdir)|workdir = tools.full_path(workdir) |g" /opt/conda/lib/python3.12/site-packages/lume/base.py

# Install Impact-T, pinning the OpenMPI build of the solver
RUN /opt/conda/bin/conda install -c conda-forge impact-t
RUN /opt/conda/bin/conda install -c conda-forge impact-t=2.3.2=mpi_openmpi_hd97dee8_0
# Use the classic conda solver for the remaining installs
ENV CONDA_SOLVER=classic
RUN /opt/conda/bin/conda install -c conda-forge bmad=20240402.1


RUN /opt/conda/bin/conda install -y \
    jupyter=1.0.0 \
    jupyterlab=4.2.3 \
    scipy=1.13.1 \
    numpy=1.26.4 \
    matplotlib=3.8.4 \
    pillow=10.4.0 \
    pandas=2.2.2 \
    conda-forge::xopt=2.2.2 \
    conda-forge::distgen=2.0.2 \
    h5py=3.11.0 \
    pytao=0.3 \
    conda-forge::openpmd-beamphysics=0.9.4 && \
    /opt/conda/bin/conda clean -afy




# Install mpi4py with conda
RUN conda install -c conda-forge mpi4py=3.1.6

# Copy Jupyter notebooks into the image
COPY notebooks /opt/notebooks
#copy facet2 lattice over
COPY facet2-lattice /opt/notebooks/facet2-lattice

ENV FACET2_LATTICE=/opt/notebooks/facet2-lattice

# Create a mount point for S3DF's /sdf filesystem (bind-mounted at runtime)
RUN mkdir /sdf

# Expose port for JupyterLab
EXPOSE 8888
EXPOSE 8889

EXPOSE 5555
EXPOSE 5556


# Default command to run JupyterLab
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser", "--allow-root", "--notebook-dir=/opt/notebooks","--port=5555"]
  1. Base Image and Working Directory: The container starts with a base image (e.g., Ubuntu), and a working directory is set for subsequent operations.

  2. Install Necessary Packages: Essential development tools (e.g., compilers, Python, OpenMPI) are installed using the package manager. This ensures the environment is ready for building and running applications that require parallel computation.

  3. Switch to Conda Environment: A Conda-based image is used to easily manage Python environments and dependencies. The Conda environment is configured by adding it to the system’s PATH for easy access.

  4. Install Key Packages: Scientific libraries, Jupyter, MPI support, and domain-specific tools are installed using Conda or pip. This sets up the environment for scientific computing, machine learning, or other workloads.

  5. Copy Project Files: Jupyter notebooks or other necessary project files are copied into the container for easy access during execution.

  6. Expose Ports: Specific ports are exposed to allow access to services like JupyterLab or other applications running in the container.

  7. Define Default Command: The container is set to automatically start a JupyterLab instance or other desired service when run.
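
Before handing this Dockerfile to CI, it can be useful to build and smoke-test the image on a workstation; a rough sketch (the image name is a placeholder):

# Build the image from the Dockerfile in the repository root
docker build -t impact-bmad:local .

# Run it and map the JupyterLab port chosen in the CMD line (5555) to the host,
# then open http://localhost:5555 in a browser
docker run --rm -p 5555:5555 impact-bmad:local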

GitHub Actions for CI/CD

GitHub Actions is a CI/CD platform that automates workflows, such as testing, building, and deploying code, directly within a GitHub repository.

GitHub Actions works by using YAML configuration files to define workflows, which consist of triggers (e.g., push, pull request), jobs, and steps. These jobs run in virtual environments (runners) provided by GitHub or self-hosted, and execute specified tasks such as testing, building, or deploying applications. Workflows are automatically triggered based on events in the repository, streamlining development processes.

This particular example uses two GitHub Actions workflows to build a Docker image, store a copy on Docker Hub, and deploy a copy to the HPC.

Here's an explanation of the GitHub Actions YAML file for building and pushing a Docker image:

Build Container GitHub Action:

1. Workflow Name and Trigger Conditions:

name: S3DF - Build and Push Docker Image
on:
  push:
    branches:
      - main
    paths-ignore:
      - '.github/workflows/**'
      - 'README.md'
      - 'ondemand.sh'
      - 'Dockerfile.NERSC'
      - './NERSC_notebooks/**'
      - 'impact_bmad_NERSC.sh'
  pull_request:
    branches:
      - main
    paths-ignore:
      - '.github/workflows/**'
  • Name: This workflow is called "S3DF - Build and Push Docker Image."

  • Triggers: The workflow is triggered by two events:

    • When there’s a push to the main branch, unless it only involves changes to certain files (like documentation or NERSC-related files).

    • When a pull request targets the main branch, ignoring the same set of files.

This ensures the workflow only runs when relevant code or configurations are updated.

2. Job Definition:

jobs:
  build:
    runs-on: ubuntu-latest
  • A job called build is defined, which runs on the latest version of Ubuntu in GitHub’s virtual environment. This sets the platform on which the subsequent steps will execute.

3. Steps in the Job:

a. Checkout Repository:

steps:
  - name: Checkout repository
    uses: actions/checkout@v2
    with:
      submodules: true

This step checks out the repository’s code so it can be used within the job. The submodules: true option ensures any Git submodules are also pulled.
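
Assuming the lattice files live in a Git submodule (as the COPY facet2-lattice line in the Dockerfile suggests), the manual equivalent when cloning by hand would be roughly:

# Clone the repository together with any submodules it references
git clone --recurse-submodules https://github.com/<org>/<repo>.git

# Or, inside an existing working copy
git submodule update --init --recursive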

b. Install Git LFS:

- name: Install Git LFS
  run: |
    sudo apt-get update
    sudo apt-get install -y git-lfs
    git lfs install

This step installs Git LFS (Large File Storage), which is required if the repository contains large files tracked by LFS. It updates the package index and installs the necessary packages. It is only needed when files required by the simulation are stored in LFS.
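
For context, large inputs get into LFS in the first place with something like the following (the pattern and file name are only examples):

# One-time setup in the repository: track large binary inputs with Git LFS
git lfs install
git lfs track "*.h5"

# The tracking rule lives in .gitattributes and must be committed as well
git add .gitattributes notebooks/my_large_input.h5
git commit -m "Track large simulation inputs with Git LFS"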

c. Fetch Git LFS Files:

- name: Fetch Git LFS files
  run: git lfs pull

This step fetches the large files tracked by Git LFS so that the correct files are available for the build. It is only needed when files required by the simulation are stored in LFS.

d. Set up Docker Buildx:

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v1

Docker Buildx is set up to enable advanced Docker build features, such as building multi-platform images. This is necessary if the image needs to support different architectures.
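
This workflow builds only for the default architecture, but as a hedged illustration of what Buildx adds, a multi-platform build would look roughly like this:

# Create and select a Buildx builder instance
docker buildx create --name multiarch --use

# Build for two architectures and push the multi-arch manifest in one step
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -f Dockerfile.s3df \
  -t <dockerhub-user>/impact-bmad:latest \
  --push .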

e. Log in to Docker Hub:

- name: Log in to Docker Hub
  uses: docker/login-action@v2
  with:
    username: ${{ secrets.DOCKER_HUB_USERNAME }}
    password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}

This step logs into Docker Hub using the credentials stored in GitHub Secrets. The username and access token are pulled securely from the DOCKER_HUB_USERNAME and DOCKER_HUB_ACCESS_TOKEN repository secrets. This is required for pushing images to Docker Hub.

f. Build and Push Docker Image:

- name: Build and push Docker image
  uses: docker/build-push-action@v2
  with:
    context: .
    file: ./Dockerfile.s3df
    push: true
    tags: ${{ secrets.DOCKER_HUB_USERNAME }}/impact-bmad:latest

This step builds the Docker image using the Dockerfile.s3df in the current context (.), and then pushes the built image to Docker Hub. The tags field sets the tag for the image (in this case, latest), using the username from the secret. The image will be named ${DOCKER_HUB_USERNAME}/impact-bmad:latest.
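
The manual equivalent of the login, build, and push steps, handy for debugging outside of CI (credentials are assumed to be set in your environment):

# Log in non-interactively with an access token
echo "$DOCKER_HUB_ACCESS_TOKEN" | docker login -u "$DOCKER_HUB_USERNAME" --password-stdin

# Build from the S3DF-specific Dockerfile and push the result to Docker Hub
docker build -f Dockerfile.s3df -t "$DOCKER_HUB_USERNAME/impact-bmad:latest" .
docker push "$DOCKER_HUB_USERNAME/impact-bmad:latest"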

Summary:

This GitHub Actions workflow automates the process of building and pushing a Docker image to Docker Hub. It is triggered by updates to the main branch, installs Git LFS for large file support, sets up Docker Buildx, logs into Docker Hub, and finally builds and pushes the image using the specified Dockerfile. This allows for continuous integration and delivery of the Docker image directly from code changes.


Pull to HPC GitHub Action:

1. Workflow Name and Trigger Conditions:

name: Update Singularity Container
on:
    workflow_run:
      workflows: ["S3DF - Build and Push Docker Image"]
      types:
        - completed
    workflow_dispatch:
  • Name: This workflow is called "Update Singularity Container."

  • Triggers:

    • It is automatically triggered when the S3DF - Build and Push Docker Image workflow completes, meaning the Docker image has been successfully built and pushed.

    • It can also be triggered manually via the workflow_dispatch event (the "Run workflow" button in the Actions tab).

2. Job Definition:

jobs:
  update-container:
    runs-on: ubuntu-latest
  • The job is named update-container and will run on the ubuntu-latest virtual machine provided by GitHub.

3. Steps in the Job:

a. Checkout Repository:

    - name: Checkout repository
      uses: actions/checkout@v2
  • This step checks out the repository so the workflow has access to the necessary files or scripts in the repository, though in this case, it's more about preparing the environment.

b. Install SSHPass:

    - name: Install sshpass
      run: sudo apt-get update && sudo apt-get install -y sshpass
  • sshpass is installed here to facilitate automated SSH login using a password. This tool is needed to connect to the remote system (in this case, an HPC environment at SLAC) without manual password input.

c. Remove Old Apptainer Container and Pull New One:

    - name: Remove old Apptainer container and pull new one
      env:
        SSH_PASSWORD: ${{ secrets.SSH_PASSWORD }}
      run: |
        sshpass -p "$SSH_PASSWORD" ssh -o StrictHostKeyChecking=no sanjeev@s3dflogin.slac.stanford.edu << 'EOF'
        sleep 3
        sshpass -p "$SSH_PASSWORD" ssh iana
        sleep 15
        export APPTAINER_CACHEDIR=$SCRATCH/.apptainer
        rm -rf $SCRATCH/.apptainer
        rm -rf $SCRATCH/tmp*
        mkdir $SCRATCH/.apptainer
        cd /sdf/group/facet/sanjeev/containers
        rm -f impact-bmad_latest.sif
        singularity pull docker://slacact/impact-bmad
        ls /sdf/group/facet/sanjeev/containers
        EOF
  • This step performs the core function of the workflow by connecting to the remote server (s3dflogin.slac.stanford.edu) using sshpass and the provided SSH password (stored securely in GitHub Secrets).

  • The script does the following:

    1. Logs into the server.

    2. Connects to another server (iana), which may be part of S3DF's interactive compute pool.

    3. Cleans up old container files, including cache and temporary files.

    4. Pulls the updated Apptainer container (impact-bmad_latest.sif) from Docker Hub using the singularity pull command, converting the Docker image to a Singularity Image Format (SIF) file, Apptainer's native format (see the sketch after this list).

    5. Lists the contents of the container directory to confirm the update.
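
The same update can be performed by hand from an interactive S3DF node, which is useful when debugging this workflow (paths are taken from the script above):

# Keep Apptainer's cache off the home directory quota
export APPTAINER_CACHEDIR=$SCRATCH/.apptainer

# Replace the old image and pull/convert the latest one from Docker Hub
cd /sdf/group/facet/sanjeev/containers
rm -f impact-bmad_latest.sif
singularity pull docker://slacact/impact-bmad   # writes impact-bmad_latest.sif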

Summary:

This GitHub Actions workflow is designed to update a Singularity container after a Docker image build is completed. It automates the SSH connection to a remote HPC environment, removes outdated container files, and pulls the latest container from Docker Hub using the Singularity tool. The workflow can be triggered either by a completed Docker build or manually.


Open OnDemand script

Open OnDemand is a web-based platform that provides users with easy access to High-Performance Computing (HPC) resources, allowing them to manage files, submit jobs, and run applications through a browser interface.

In SLAC's S3DF OnDemand setup, a custom script can point the JupyterLab app at an Apptainer container.

For the SLAC container, this script works with the previously shown Dockerfile:


# Set the environment variable for the Apptainer image path
export APPTAINER_IMAGE_PATH=/sdf/group/facet/sanjeev/containers/impact-bmad_latest.sif
export NOTEBOOK_ROOT=$HOME/impact_bmad_container_notebooks
mkdir -p $HOME/impact_bmad_container_notebooks

# Define a jupyter wrapper that copies the default notebooks into the user's
# directory, then runs Jupyter through Apptainer with the necessary bind mounts
function jupyter() {
    apptainer exec -B /usr,/sdf,/fs,/sdf/scratch,/lscratch ${APPTAINER_IMAGE_PATH} bash -c "
        mkdir -p ${NOTEBOOK_ROOT} &&
        cp -rn /opt/notebooks/* ${NOTEBOOK_ROOT}/";
    apptainer exec -B /usr,/sdf,/fs,/sdf/scratch,/lscratch ${APPTAINER_IMAGE_PATH} jupyter $@;
}

This script sets up and runs Jupyter using Apptainer (formerly Singularity) to execute a Jupyter environment from within a container. Here's a breakdown of how it works:

1. Set Environment Variables:

export APPTAINER_IMAGE_PATH=/sdf/group/facet/sanjeev/containers/impact-bmad_latest.sif
export NOTEBOOK_ROOT=$HOME/impact_bmad_container_notebooks
mkdir -p $HOME/impact_bmad_container_notebooks
  • APPTAINER_IMAGE_PATH: This specifies the path to the Apptainer image (impact-bmad_latest.sif), which contains the pre-built environment.

  • NOTEBOOK_ROOT: Sets the path where the user's Jupyter notebooks will be stored locally (impact_bmad_container_notebooks).

  • The mkdir -p command ensures that the notebook directory exists, creating it if it doesn't.

2. Define the jupyter Function:

function jupyter() {
    apptainer exec -B /usr,/sdf,/fs,/sdf/scratch,/lscratch ${APPTAINER_IMAGE_PATH} bash -c "
        mkdir -p ${NOTEBOOK_ROOT} &&
        cp -rn /opt/notebooks/* ${NOTEBOOK_ROOT}/";
    apptainer exec -B /usr,/sdf,/fs,/sdf/scratch,/lscratch ${APPTAINER_IMAGE_PATH}  jupyter $@;
}
  • This function wraps the Apptainer exec command to run Jupyter within the container.

  • Bind Mounts: The -B /usr,/sdf,/fs,/sdf/scratch,/lscratch option mounts critical directories from the host into the container, ensuring that the container has access to these paths.

  • The first apptainer exec creates the notebook directory and copies default notebooks from /opt/notebooks/ in the container to the user's directory if they don't already exist (cp -rn ensures no overwriting).

  • The second apptainer exec command runs Jupyter inside the container, passing along any arguments ($@) provided to the function (e.g., Jupyter options like notebook or lab).
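
For example, once this script is sourced, Open OnDemand (or a user) can start the session exactly as if Jupyter were installed on the host; the port below is just an illustration:

# Seeds $NOTEBOOK_ROOT on first use, then launches JupyterLab inside the container
jupyter lab --ip=0.0.0.0 --port=8888 --notebook-dir="$NOTEBOOK_ROOT"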

Purpose:

This script ensures that Jupyter runs inside the containerized environment, while also synchronizing notebooks between the container and the user’s local file system, providing a seamless workflow for running Jupyter from within an HPC environment.

Copying the notebook files out of the container is important because the container's file system is read-only, meaning users cannot directly modify the notebooks or other files inside it. By copying the notebooks from the container into a local, writable directory, users gain the ability to edit and run their experiments without restrictions.

If users want to modify or reset their notebooks to the original state provided by the container, they can simply delete the local notebook files. When they restart the container, the notebook files will be copied again from the read-only container, giving them a fresh, unmodified set of files to work with. This approach ensures both flexibility for customization and easy recovery of the original files.
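
Concretely, resetting to the pristine notebooks shipped in the container comes down to (path taken from the script above):

# Delete the locally modified copies; the wrapper's cp -rn step will
# repopulate the directory from /opt/notebooks on the next launch
rm -rf "$HOME/impact_bmad_container_notebooks"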

Conclusion

This tutorial demonstrates how to create, manage, and deploy containerized Jupyter simulations for High-Performance Computing (HPC) environments, specifically using SLAC's S3DF infrastructure. By utilizing Apptainer (formerly Singularity) containers, users can package complex simulations with all necessary dependencies, input files, and configurations, ensuring reproducibility and ease of use for new users. The automated workflows, powered by GitHub Actions, handle building and updating the containers, while Open OnDemand provides an accessible interface for running Jupyter notebooks directly from the HPC environment. This approach eliminates setup errors, saves time, and ensures consistent simulation environments, enabling researchers to focus on their work instead of system configuration.
