This guide provides a comprehensive overview of managing and transferring research data at the University of Southern California's Center for Advanced Research Computing (CARC).
The CARC environment is designed for high-performance workloads, separating access, computation, and high-speed data movement.
- Head/Login Nodes: Primary entry points (`discovery.usc.edu`, `endeavour.usc.edu`) used for code editing, compiling, and job submission.
- Compute Nodes: Approximately 500 nodes running the Rocky Linux operating system, where the actual heavy-duty processing occurs.
- Data Transfer Nodes (DTNs): Dedicated nodes (`hpc-transfer1`, `hpc-transfer2`) with 100 Gbps connections, optimized for large-scale data movement to reduce load on the login nodes.
- Scheduler: The SLURM resource manager handles all job scheduling and resource allocation.
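In practice, the split works like this: interactive work goes through the login nodes, while bulk data movement should target the DTNs. A minimal sketch, assuming standard SSH access with your USC credentials:

```bash
# Interactive work (editing, compiling, submitting jobs) happens on a login node
ssh <username>@discovery.usc.edu

# Point large transfers (rsync, scp, sftp) at a data transfer node instead,
# so the login nodes stay responsive for other users
ssh <username>@hpc-transfer1.usc.edu
```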
Storage is partitioned based on performance needs, backup requirements, and collaborative access.
| Directory Type | Path Schema | Quota (Default) | Backup Policy | Best For... |
|---|---|---|---|---|
| Home | `/home1/<user>` | 100 GB / 255k files | Snapshots (2 weeks) | Scripts, configuration files, small tools. |
| Project | `/project2/<pi_id>` | 15 TB free per PI | Snapshots (2 weeks) | Shared group data and active research files. |
| Scratch | `/scratch1/<user>` | 10 TB (temporary) | NO BACKUP | Large temp files and high-speed job I/O. |
- Free Tier: 15 TB of project storage per PI is provided at no cost.
- Expansion: Additional storage can be purchased in 5 TB increments at $60/TB/year.
- Cold Storage: For long-term archiving of inactive data, use the `arcput` command.
- Prohibited Data: CARC systems do not currently support sensitive or regulated data, including HIPAA (Protected Health Information), FERPA (Student Records), or PII (Personally Identifiable Information).
- Consultation: If your research requires the use of restricted data, you must contact carc-support@usc.edu for a consultation before uploading any files.
Choose the method that best fits your file size and technical comfort level.
- URL: ondemand.carc.usc.edu
- Features: Provides a visual file explorer in your browser. Best for moving small individual files or managing directory structures visually.
`rsync` is the most robust tool for syncing data between your local machine and CARC. It can resume interrupted transfers and copies only the files that have changed.
```bash
# SCRIPT: Syncing local data to CARC
# Note: A trailing '/' after the source folder copies only the contents.
rsync -rltvh ~/Documents/my_data/ <username>@discovery.usc.edu:/project2/<pi_id>/my_data
# Flags:
# -r: recursive (include subdirectories)
# -l: preserve symlinks
# -t: preserve modification times
# -v: verbose (show progress)
# -h: human-readable file sizes
```
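The same flags work in reverse to pull results back to your local machine; a hypothetical example (adjust the paths to your own project):

```bash
# Pulling results from CARC project storage down to a local folder
rsync -rltvh <username>@discovery.usc.edu:/project2/<pi_id>/my_data/results/ ~/Documents/results
```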
- Use standalone applications like CyberDuck or FileZilla.
- Connect using the hostnames `hpc-transfer1.usc.edu` or `discovery.usc.edu`.
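If you prefer the command line, the same hostnames work with the standard `sftp` client; a minimal sketch (the file and directory names shown are illustrative):

```bash
sftp <username>@hpc-transfer1.usc.edu
# Once connected:
#   put local_results.tar.gz /project2/<pi_id>/     # upload
#   get /project2/<pi_id>/results.tar.gz .          # download
```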
- Globus: Ideal for fast transfers between different HPC centers or sharing with external collaborators.
- Rclone: Used to link cloud storage (Google Drive, OneDrive) to CARC; involves a more technical initial setup.
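For Rclone, the general workflow is a one-time interactive configuration followed by copy commands; a minimal sketch, where `gdrive` is a hypothetical remote name chosen during setup:

```bash
# One-time interactive setup of a cloud remote (run on a login or transfer node)
rclone config

# Copy a Google Drive folder into project storage
rclone copy gdrive:my_data /project2/<pi_id>/my_data --progress
```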
Managing access is critical for collaboration and quota management.
Permissions are calculated by summing numeric values:
- 4: Read (`r`)
- 2: Write (`w`)
- 1: Execute (`x`)

Sum the values for each of user, group, and others; for example, 6 (4 + 2) grants read and write.
- Change Permissions (`chmod`):

```bash
chmod 640 file.txt        # User: rw, Group: r, Others: none
chmod +x script.sh        # Add execute permission
chmod -R g-w directory    # Recursively remove group write access
```
- Change Group Ownership (`chgrp`): Quota is tracked via group ownership. Use this to ensure collaborators can access project files.

```bash
chgrp ttroj_412 data.txt
chgrp -R ttroj_412 /project/ttroj_412/group_data
```
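To confirm that permissions and group ownership ended up as intended, a quick check with `ls -l` (the output line below is illustrative):

```bash
ls -l data.txt
# -rw-r----- 1 ttroj ttroj_412 2048 Jan 15 10:30 data.txt
# user: read/write, group (ttroj_412): read-only, others: no access
```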
This section outlines how to ensure file integrity after a transfer using SHA-256 checksums. Verifying your files ensures that no data was corrupted, lost, or altered during the move.
The command used to generate hashes varies depending on your environment:
- Linux: `sha256sum`
- macOS: `shasum -a 256`
- Windows (PowerShell): `Get-FileHash`
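As a quick illustration of the difference on the Unix-like platforms, hashing a single (hypothetical) file:

```bash
sha256sum data.csv        # Linux
shasum -a 256 data.csv    # macOS: same algorithm, different command name
```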
Using Linux/Unix as the standard example, follow these steps to verify your transfer:
Navigate to the directory containing your original files and run:
```bash
find . -type f -exec sha256sum '{}' \; > sha256sum.txt
```
This generates the file `sha256sum.txt`. Copy this file to the destination directory where the files were transferred, and then run the following from that directory:
```bash
sha256sum -c sha256sum.txt
```
This compares the checksums recorded at the source against the files in the destination and prints the results. The transfer was successful if all of the checksums match, as indicated by an OK status. Note that the entry for `sha256sum.txt` itself will typically be reported as FAILED, because the file was still being written when its own hash was recorded; this failure can be ignored.
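If that spurious failure is distracting, one simple way to skip the checksum file's own entry before checking (a minimal sketch):

```bash
# Filter out the line for sha256sum.txt, then verify the rest from stdin
grep -v 'sha256sum.txt' sha256sum.txt | sha256sum -c -
```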