This guide will walk you through setting up and using the GreatLakes HPC cluster for your ROB572 class project. Whether you're running simulations, training models for underwater perception, or processing sonar/sensor data, this guide covers everything you need to get started.
- Getting Access
- VS Code Remote Access
- Setting Up a Conda Environment
- Running GPU Jobs in Interactive Mode
- Running GPU Jobs in Batch Mode
- Monitoring Jobs
- Tips and Tricks
- Command Cheat Sheet
To use the GreatLakes cluster for your ROB572 class project, you need a user login. If you don't already have one, request a login here:
- Great Lakes User Login: https://arc.umich.edu/login-request/
Once you have a login, you can submit jobs to the class account. Use the following line in all of your batch scripts:
#SBATCH --account=rob572w26_class
If you're new to HPC, check out the Great Lakes User Guide and consider attending an ARC Training Event. For any account issues, email arc-support@umich.edu.
- Always use `--account=rob572w26_class` when submitting jobs.
- Be mindful of resource usage: this is a shared class account, so avoid requesting more resources than you need and cancel jobs you're no longer using.
- Do not leave idle interactive sessions running. Other students need access too.
- VPN: If you're off campus, connect to the U-M VPN first. Download the client from https://its.umich.edu/enterprise/wifi-networks/vpn/getting-started.
- Install Remote - SSH: In VS Code, open the Extensions view (Ctrl+Shift+X), search for "Remote - SSH", and install it.
- Connect: Open the Command Palette (Ctrl+Shift+P), type `Remote-SSH: Connect to Host...`, and enter `ssh [uniqname]@greatlakes.arc-ts.umich.edu`. Enter your password and complete Duo two-factor authentication. When prompted for the OS type, select Linux.
- Open a workspace: Once connected, open a terminal in VS Code. We recommend creating a dedicated project directory with `mkdir -p ~/rob572_project`. Then go to File > Open Folder and select `rob572_project`. This keeps your project files organized and ensures VS Code extensions (like Python IntelliSense) work correctly.
Install these VS Code extensions for a smooth development experience:
- Python — Official Python extension with IntelliSense, linting, debugging, and formatting.
- Jupyter — Create, edit, and run Jupyter notebooks directly in VS Code.
Search for them in the Extensions view and click Install.
GitHub Copilot is an AI coding assistant available as a VS Code extension. As a student, you can get it free through the GitHub Student Developer Pack.
After getting access, sign in with your GitHub account by clicking the user icon in the bottom-left of VS Code.
Your home directory has limited space (~80 GB). For class projects, this is usually sufficient, but if you need more space (large datasets, multiple environments), consider using scratch storage:
/scratch/rob572w26_class_root/rob572w26_class/[uniqname]
You can create a symlink from your home directory for convenience:
mkdir -p /scratch/rob572w26_class_root/rob572w26_class/[uniqname]/conda
ln -s /scratch/rob572w26_class_root/rob572w26_class/[uniqname]/conda ~/conda
Note: Scratch storage is temporary — files are deleted after 90 days. You'll receive an email before deletion. Always back up important work (e.g., push to GitHub).
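If you want to see how the symlink pattern above behaves before running it on the cluster, here is a self-contained sketch in which `mktemp` directories stand in for the real scratch and home paths (which only exist on Great Lakes):

```shell
SCRATCH=$(mktemp -d)    # stands in for /scratch/rob572w26_class_root/rob572w26_class/[uniqname]
HOME_DIR=$(mktemp -d)   # stands in for your home directory
mkdir -p "$SCRATCH/conda"
ln -s "$SCRATCH/conda" "$HOME_DIR/conda"
# Files written through the link land on scratch, not in home:
touch "$HOME_DIR/conda/envs_live_here"
ls "$SCRATCH/conda"     # prints: envs_live_here
```

The point of the pattern: tools keep using the familiar `~/conda` path, while the actual disk usage is charged to scratch rather than your 80 GB home quota.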
Common conda commands you'll use throughout your project:
- Create a new environment: `conda create -n rob572_env python=3.10 -y`
- Activate the environment: `conda activate rob572_env`
- Deactivate the environment: `conda deactivate`
- List all environments: `conda env list`
- Export an environment (useful for sharing with teammates): `conda env export --name rob572_env > environment.yml`
- Recreate from an exported file: `conda env create -f environment.yml`
- Remove an environment: `conda env remove --name rob572_env`
In VS Code, you can select your conda environment by opening a .py file and clicking the Python version in the bottom-right corner. Choose your rob572_env environment, and VS Code will automatically use it for terminals and code execution.
Below is an example setup script for a marine robotics project that uses deep learning. Adjust package versions to match your project's requirements.
#!/bin/bash
CONDA_ENV_NAME=rob572_env
UNIQNAME=[YOUR_UNIQNAME]
# Download and install miniconda (skip if already installed)
mkdir -p ~/Downloads && cd ~/Downloads
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/conda/miniconda
# Initialize conda
source ~/conda/miniconda/bin/activate
conda init
source ~/.bashrc
# Clean up installer
rm -f ~/Downloads/Miniconda3-latest-Linux-x86_64.sh
# Create environment and install packages
conda create -n ${CONDA_ENV_NAME} python=3.10 -y
conda activate ${CONDA_ENV_NAME}
# GPU support (adjust CUDA version as needed)
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit -y
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia -y
# Common scientific/robotics packages
conda install matplotlib scikit-learn pandas scipy -y
pip install opencv-python tensorboard

Tip: Adjust the CUDA and PyTorch versions to match the requirements of the libraries you plan to use. If your project uses ROS, you may need a separate environment or Docker container; consult the instructor for guidance.
Interactive mode is ideal for debugging and short tests. For long-running experiments, use batch mode (next section).
The GreatLakes Portal offers interactive apps (Jupyter, VS Code, Basic Desktop) under the "Interactive Apps" tab. For GPU work, we recommend the Basic Desktop option, which provides a full desktop environment.
When submitting, use:
- Account: `rob572w26_class`
- Partition: `gpu` (check with the instructor for the correct partition)
- Time: Keep it short (a few hours) for debugging
Resource guidelines: A single node typically has 8 GPUs, 32 CPU cores, and 372 GB memory. To be a good neighbor, limit CPUs to ~4 per GPU and memory to ~48 GB per GPU.
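The per-GPU guideline above scales linearly, so you can derive a polite request size from the GPU count. A small sketch (the `GPUS=2` value is just an example):

```shell
# ~4 CPUs and ~48 GB of memory per requested GPU, per the guideline above
GPUS=2
CPUS=$((GPUS * 4))
MEM=$((GPUS * 48))
echo "--gres=gpu:${GPUS} --cpus-per-task=${CPUS} --mem=${MEM}G"
# prints: --gres=gpu:2 --cpus-per-task=8 --mem=96G
```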
Once your interactive session is running, find the hostname in the session details (e.g., gl1709.arc-ts.umich.edu). From a VS Code terminal connected to Great Lakes:
ssh gl1709.arc-ts.umich.edu
Then cd to your project directory and activate your conda environment manually.
Request an interactive GPU session directly from the terminal:
salloc --job-name=debug --cpus-per-task=4 --nodes=1 --mem=16G --time=4:00:00 --account=rob572w26_class --partition=gpu --gres=gpu:1
Check your job status:
squeue -u [UNIQNAME]
Connect to the allocated node:
srun --jobid=[JOBID] --pty bash
Warning: If you close the `salloc` terminal, your job will be terminated. Use `tmux` or `screen` to keep your session alive: `tmux new -s rob572`. You can detach with `Ctrl+B` then `D`, and reattach later with `tmux attach -t rob572`.
To run a Jupyter notebook on a GPU node and connect from VS Code:
- SSH into the allocated node, `cd` to your project directory, and activate your conda environment.
- Start the notebook server: `jupyter notebook --no-browser --port=8888 --ip=0.0.0.0`
- Copy the URL with the token from the terminal output.
- In VS Code, open your `.ipynb` file, click "Select Kernel" in the top-right, choose "Existing Jupyter Server", and paste the URL.
Batch mode is the recommended way to run long experiments. It queues your job and runs it when resources are available — no need to keep a terminal open.
Here's a template batch script for a ROB572 project:
#!/bin/bash
#SBATCH --job-name=rob572_train
#SBATCH --mail-user=[UNIQNAME]@umich.edu
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --cpus-per-task=4
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=16G
#SBATCH --time=8:00:00
#SBATCH --account=rob572w26_class
#SBATCH --partition=gpu
#SBATCH --output=/home/[UNIQNAME]/rob572_project/logs/%x-%j.log
#SBATCH --gres=gpu:1
# Set up environment
source /home/[UNIQNAME]/.bashrc
conda activate rob572_env
cd ~/rob572_project
# Optional: copy dataset to local SSD for faster I/O
mkdir -p /tmpssd/[UNIQNAME]
cp -r ~/rob572_project/data /tmpssd/[UNIQNAME]/
# Run your training script
python train.py \
--data_dir /tmpssd/[UNIQNAME]/data \
--output_dir ~/rob572_project/results \
--epochs 50 \
--batch_size 32 \
--lr 0.001

Submit the job:
sbatch train.sh
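When `sbatch` accepts a job, it prints a line of the form `Submitted batch job <id>`. It can be handy to capture that id for later `scancel`/`scontrol` calls. In this sketch the `sbatch` call is simulated with a fixed string, since it only works on the cluster:

```shell
# On the cluster you would use: submit_output=$(sbatch train.sh)
submit_output="Submitted batch job 12345678"   # simulated sbatch output
jobid=${submit_output##* }                     # keep the last whitespace-separated word
echo "$jobid"                                  # prints: 12345678
```

Alternatively, `sbatch --parsable train.sh` prints just the job id, with no surrounding text to strip.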
Note: Copying data to `/tmpssd` (local SSD) avoids slow network transfers and can significantly speed up data loading. This is especially helpful for projects with large datasets (e.g., sonar imagery, point clouds).
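A habit worth pairing with the `/tmpssd` copy is removing it when the job ends, so the node-local disk doesn't fill up for the next user. Below is a runnable sketch where a subshell with an EXIT trap stands in for the job; on the cluster you would put the `trap` line near the top of your batch script, pointed at your `/tmpssd/[UNIQNAME]` directory:

```shell
TMP=$(mktemp -d)                # stands in for /tmpssd/[UNIQNAME]
(
  trap 'rm -rf "$TMP"' EXIT     # runs on normal exit and on failure
  touch "$TMP/staged_data"      # the job stages and uses its data here
)
[ -d "$TMP" ] || echo "local copy cleaned up"   # prints: local copy cleaned up
```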
Make sure the logs/ directory exists before submitting:
mkdir -p ~/rob572_project/logs
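For reference, in the `--output` pattern above SLURM expands `%x` to the job name and `%j` to the job id, so each run gets its own log file. A tiny illustration with made-up values:

```shell
jobname=rob572_train   # from #SBATCH --job-name
jobid=12345678         # assigned by SLURM at submission time
logfile="${jobname}-${jobid}.log"   # what %x-%j produces
echo "$logfile"                     # prints: rob572_train-12345678.log
```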
Structure your training scripts to accept command-line arguments, making it easy to run different experiments without editing code:
import argparse

def main(args):
    # Your training/simulation logic here
    print(f"Training with lr={args.lr}, epochs={args.epochs}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="ROB572 Project Training")
    parser.add_argument("--data_dir", type=str, required=True, help="Path to dataset")
    parser.add_argument("--output_dir", type=str, required=True, help="Path to save results")
    parser.add_argument("--batch_size", type=int, default=32, help="Batch size")
    parser.add_argument("--lr", type=float, default=0.001, help="Learning rate")
    parser.add_argument("--epochs", type=int, default=10, help="Number of epochs")
    args = parser.parse_args()
    main(args)

To sweep over hyperparameters, use a loop that submits multiple jobs:
#!/bin/bash
ACCOUNT=rob572w26_class
LR_LIST=(0.01 0.001 0.0001)
BATCH_SIZES=(16 32 64)
for LR in "${LR_LIST[@]}"; do
for BS in "${BATCH_SIZES[@]}"; do
sbatch <<EOT
#!/bin/bash
#SBATCH --job-name=rob572_lr${LR}_bs${BS}
#SBATCH --mail-user=[UNIQNAME]@umich.edu
#SBATCH --mail-type=END,FAIL
#SBATCH --cpus-per-task=4
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=16G
#SBATCH --time=8:00:00
#SBATCH --account=${ACCOUNT}
#SBATCH --partition=gpu
#SBATCH --output=/home/[UNIQNAME]/rob572_project/logs/%x-%j.log
#SBATCH --gres=gpu:1
source /home/[UNIQNAME]/.bashrc
conda activate rob572_env
cd ~/rob572_project
python train.py \
--data_dir ~/rob572_project/data \
--output_dir ~/rob572_project/results/lr${LR}_bs${BS} \
--lr ${LR} \
--batch_size ${BS} \
--epochs 50
EOT
done
done

Check your jobs:
squeue -u [UNIQNAME]
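Before pointing the sweep loop above at `sbatch`, it can be worth a dry run that replaces the submission with `echo`, so you can verify the job names and the total job count against the class account's capacity:

```shell
# Dry run of the sweep: print each (lr, batch size) combination
# instead of submitting it.
LR_LIST="0.01 0.001 0.0001"
BATCH_SIZES="16 32 64"
COUNT=0
for LR in $LR_LIST; do
  for BS in $BATCH_SIZES; do
    echo "would submit: rob572_lr${LR}_bs${BS}"
    COUNT=$((COUNT + 1))
  done
done
echo "total jobs: $COUNT"   # prints: total jobs: 9
```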
Check all jobs under the class account:
squeue -A rob572w26_class
Detailed job info:
scontrol show job [JOBID]
View resource usage for the class account:
squeue -A rob572w26_class -O "JobID,UserName,tres-per-job,tres-per-node,TimeUsed,TimeLeft"
Cancel a job:
scancel [JOBID]
Cancel all your jobs:
scancel -u [UNIQNAME]
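When you have many jobs queued, counting states is often more useful than reading the whole `squeue` table. This sketch simulates the output of `squeue -u [UNIQNAME] -h -o "%T"` (no header, one state per job) with `printf`; on the cluster, pipe the real command instead:

```shell
# Simulated `squeue -u [UNIQNAME] -h -o "%T"` output, one job state per line
states=$(printf 'RUNNING\nPENDING\nRUNNING\nPENDING\nPENDING\n')
# Tally jobs by state
echo "$states" | sort | uniq -c
```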
- Keep a clear directory structure for your project (e.g., `data/`, `src/`, `logs/`, `results/`, `notebooks/`).
- Use `argparse` to parameterize your scripts so you can easily run different experiments.
- Log your results programmatically (to files or tools like TensorBoard/Weights & Biases) instead of copying from terminal output.
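The suggested layout can be created in one command. In this sketch `mktemp` stands in for your home directory so it runs anywhere; on the cluster, use `~/rob572_project` directly:

```shell
BASE=$(mktemp -d)/rob572_project   # use ~/rob572_project on the cluster
# The directory names are suggestions, not requirements
mkdir -p "$BASE/data" "$BASE/src" "$BASE/logs" "$BASE/results" "$BASE/notebooks"
ls "$BASE"
```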
- Don't hog resources. Only request what you need — if you need 1 GPU, don't request 4.
- Cancel idle jobs. If you're done debugging, cancel your interactive session with `scancel`.
- Use batch mode for long runs. Interactive sessions are for debugging, not overnight training.
- Check the queue before submitting large jobs — if the class account is busy, wait or use fewer resources.
- Always push your code to GitHub or another Git host. The cluster is not a backup service.
- Use `.gitignore` to exclude large data files, model checkpoints, and log files from your repo.
- Some packages cache data in your home directory. Redirect caches to scratch to save space:
  # Add to your ~/.bashrc
  export HF_DATASETS_CACHE="/scratch/rob572w26_class_root/rob572w26_class/[UNIQNAME]/cache/huggingface"
  export HF_HOME="/scratch/rob572w26_class_root/rob572w26_class/[UNIQNAME]/cache/hf_home"
- Use `/tmpssd` on compute nodes for fast local I/O during training jobs.
- Use interactive mode to test your code with a small dataset before submitting a batch job.
- Check job logs in your `logs/` directory if a batch job fails.
- Use `scontrol show job [JOBID]` to see why a job is pending or failed.
| Command | Description |
|---|---|
| `ssh [uniqname]@greatlakes.arc-ts.umich.edu` | Connect to Great Lakes |
| `conda create -n rob572_env python=3.10 -y` | Create conda environment |
| `conda activate rob572_env` | Activate environment |
| `salloc --account=rob572w26_class --partition=gpu --gres=gpu:1 --mem=16G --time=4:00:00` | Request interactive GPU session |
| `sbatch train.sh` | Submit a batch job |
| `squeue -u [UNIQNAME]` | Check your jobs |
| `squeue -A rob572w26_class` | Check all class jobs |
| `scancel [JOBID]` | Cancel a job |
| `scontrol show job [JOBID]` | Detailed job info |
| `tmux new -s rob572` | Start a tmux session |
| `tmux attach -t rob572` | Reattach to tmux session |
If you have questions about the cluster, email arc-support@umich.edu. For project-specific questions, reach out to the instructor or GSI.