
Conversation

@digitsisyph

Add JSC eval setup

@gemini-code-assist

Summary of Changes

Hello @digitsisyph, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes the foundational setup for conducting SWE-bench evaluations using the Harbor framework on JSC supercomputers. It provides all necessary documentation, configuration files, and an automated SLURM script to streamline the process from environment preparation to running benchmarks and optionally uploading results, ensuring a robust and reproducible evaluation pipeline.

Highlights

  • JSC Evaluation Setup Documentation: A comprehensive README.md has been added, detailing the complete process for setting up the environment and running SWE-bench evaluations on JSC supercomputers. This includes prerequisites, one-time setup steps, execution commands, monitoring, troubleshooting, and file structure.
  • Harbor Evaluation Configuration: A new jsc_eval_config.yaml file defines the configuration for Harbor evaluations, specifying parameters such as job directory, number of attempts, timeout settings, orchestrator type (local with concurrent trials and retry logic), and environment type (Daytona).
  • Automated SLURM Evaluation Script: A jsc_eval_harbor.sbatch SLURM script has been introduced to automate the entire evaluation workflow. This script handles environment activation, cache configuration for various tools (vLLM, Triton, UV, Hugging Face), secret loading, vLLM server management, execution of Harbor evaluation commands, metadata saving, and optional result uploading to HuggingFace. A minimal submission sketch follows this list.
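
For orientation, here is a minimal sketch of how these pieces are likely used together on a login node. The variables and file name are taken from the PR description; the exact arguments and any extra flags live in the added README.

```bash
# Hypothetical end-to-end usage, assuming the one-time setup from the README is done
source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}   # activate the eval Conda environment
sbatch jsc_eval_harbor.sbatch                      # submit the Harbor evaluation job
squeue -u "$USER"                                  # monitor the SLURM job
```
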
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@digitsisyph (Author)

Hi @neginraoof, this is the setup and code for the JSC eval.

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a setup for running evaluations on JSC supercomputers. The changes include a detailed README guide, a SLURM batch script, and a Harbor configuration file. My review focuses on ensuring the setup instructions are correct and the scripts are robust. I've found a critical inconsistency between the Python version specified in the setup guide (3.11) and the one required by the evaluation script (3.13), which will cause issues for users. I've provided suggestions to align the documentation with the script's requirements. Additionally, I've identified a critical bug in the SLURM script's logic for counting errors, which would prevent the auto-upload skipping feature from working correctly. I've also included suggestions to improve the script's robustness and clarity by refining how it locates dependencies and manages environment modules. Overall, the contribution is valuable, and addressing these points will significantly improve its reliability and user experience.

Comment on lines +44 to +49
### 4. Create Python 3.11 Environment
```bash
eval "$(${MAMBA_INSTALL}/bin/conda shell.bash hook)"
export EVAL_ENV="/p/project1/ccstdl/${USER}/eval_py311"
${MAMBA_INSTALL}/bin/mamba create --prefix ${EVAL_ENV} python=3.11 -c conda-forge --override-channels -y
source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}
```


critical

The setup instructions guide the user to create a Python 3.11 environment, but the jsc_eval_harbor.sbatch script is configured for and recommends Python 3.13. This inconsistency will cause errors and confusion. To fix this, the instructions should be updated to create a Python 3.13 environment.

Suggested change (create the environment with Python 3.13):

### 4. Create Python 3.13 Environment
```bash
eval "$(${MAMBA_INSTALL}/bin/conda shell.bash hook)"
export EVAL_ENV="/p/project1/ccstdl/${USER}/eval_py313"
${MAMBA_INSTALL}/bin/mamba create --prefix ${EVAL_ENV} python=3.13 -c conda-forge --override-channels -y
source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}
```
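
A quick sanity check after the environment is created (a small sketch; it assumes only the variables defined above):

```bash
# Confirm the interpreter that the sbatch script will pick up
source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}
python --version   # expected: Python 3.13.x
```
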

Comment on lines +71 to +73
export MAMBA_INSTALL="/p/project1/ccstdl/${USER}/mamba"
export EVAL_ENV="/p/project1/ccstdl/${USER}/eval_py311"
alias activate_eval='source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}'


critical

To maintain consistency with the required Python 3.13 environment, the EVAL_ENV variable configured in .bashrc should point to the eval_py313 directory.

Suggested change (point EVAL_ENV at the 3.13 environment):

export MAMBA_INSTALL="/p/project1/ccstdl/${USER}/mamba"
export EVAL_ENV="/p/project1/ccstdl/${USER}/eval_py313"
alias activate_eval='source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}'
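
Once ~/.bashrc has been updated, the alias gives one-command activation in interactive shells (a usage sketch using only the lines above):

```bash
source ~/.bashrc   # pick up the new exports and alias in the current shell
activate_eval      # activates the eval_py313 environment
```
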

Comment on lines +447 to +475
DAYTONA_COUNT=$(python3 -c "
import json
import sys

try:
    with open('$RESULT_FILE', 'r') as f:
        data = json.load(f)

    total_errors = 0
    error_ids = []

    if 'stats' in data and 'evals' in data['stats']:
        for eval_key, eval_data in data['stats']['evals'].items():
            if 'exception_stats' in eval_data and 'DaytonaError' in eval_data['exception_stats']:
                daytona_errors = eval_data['exception_stats']['DaytonaError']
                if isinstance(daytona_errors, list):
                    total_errors += len(daytona_errors)
                    error_ids.extend(daytona_errors)

    print(total_errors)

    # Also write detailed info to stderr for logging
    if total_errors > 0:
        print(f'Found {total_errors} DaytonaError(s): {error_ids[:5]}...', file=sys.stderr)

except Exception as e:
    print(f'Error parsing result.json: {e}', file=sys.stderr)
    print('0')
" 2>&1 | tail -n 1)


critical

The command used to capture DAYTONA_COUNT is buggy. By redirecting stderr to stdout (2>&1) and taking the last line (tail -n 1), you are capturing a log message from stderr instead of the actual count printed to stdout. This will cause the subsequent integer comparison to fail.

The Python script correctly prints only the count to stdout. You should capture stdout directly by removing the pipe.

            DAYTONA_COUNT=$(python3 -c "
import json
import sys

try:
    with open('$RESULT_FILE', 'r') as f:
        data = json.load(f)

    total_errors = 0
    error_ids = []

    if 'stats' in data and 'evals' in data['stats']:
        for eval_key, eval_data in data['stats']['evals'].items():
            if 'exception_stats' in eval_data and 'DaytonaError' in eval_data['exception_stats']:
                daytona_errors = eval_data['exception_stats']['DaytonaError']
                if isinstance(daytona_errors, list):
                    total_errors += len(daytona_errors)
                    error_ids.extend(daytona_errors)

    print(total_errors)

    # Also write detailed info to stderr for logging
    if total_errors > 0:
        print(f'Found {total_errors} DaytonaError(s): {error_ids[:5]}...', file=sys.stderr)

except Exception as e:
    print(f'Error parsing result.json: {e}', file=sys.stderr)
    print('0')
")

```
/p/project1/ccstdl/<username>/
├── mamba/ # Miniforge installation
├── eval_py311/ # Python 3.11 environment
```


medium

To reflect the correct Python 3.13 environment, this line in the file structure diagram should be updated.

Suggested change:

├── eval_py313/ # Python 3.13 environment

# Option 1: Use latest 2024 stack
ml Stages/2024
ml StdEnv/2024
ml Python/3.11.3


medium

The script loads the Python/3.11.3 module but then activates a Conda environment with Python 3.13. This is confusing and potentially problematic if there are conflicting libraries. Since the Conda environment provides the Python interpreter and its dependencies, loading the system Python module is likely unnecessary and could be removed to avoid confusion and potential conflicts.
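
One possible simplification, assuming nothing else in the job depends on the module-provided interpreter (module names taken from the snippet above):

```bash
# Keep the base software stack, but let the Conda environment supply Python
ml Stages/2024
ml StdEnv/2024
source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}
python --version   # expect 3.13.x from ${EVAL_ENV}, not the 3.11.3 module
```
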

Comment on lines +268 to +272
SNAPSHOT_SCRIPT="${SLURM_SUBMIT_DIR:-$(pwd)}/snapshot_download.py"
if [ ! -f "$SNAPSHOT_SCRIPT" ]; then
# Try parent directory (tacc/)
SNAPSHOT_SCRIPT="${SLURM_SUBMIT_DIR:-$(pwd)}/../tacc/snapshot_download.py"
fi


medium

Locating snapshot_download.py relative to SLURM_SUBMIT_DIR is fragile because it depends on the directory where sbatch is executed. A more robust approach is to define the path relative to this script's own location. This makes the dependency on the tacc directory's script explicit and reliable, regardless of the submission directory.

Suggested change:

# Use the snapshot_download.py script from the sibling tacc directory
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
SNAPSHOT_SCRIPT="${SCRIPT_DIR}/../tacc/snapshot_download.py"

@penfever (Collaborator)

@digitsisyph Please resolve the outstanding comments and request a review to keep this moving!
