Add JSC eval setup #11
base: main
Conversation
Summary of Changes

Hello @digitsisyph, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request establishes the foundational setup for conducting SWE-bench evaluations using the Harbor framework on JSC supercomputers. It provides all necessary documentation, configuration files, and an automated SLURM script to streamline the process from environment preparation to running benchmarks and optionally uploading results, ensuring a robust and reproducible evaluation pipeline.
Hi @neginraoof, this is the setup and code for the JSC eval.
Code Review
This pull request introduces a setup for running evaluations on JSC supercomputers. The changes include a detailed README guide, a SLURM batch script, and a Harbor configuration file. My review focuses on ensuring the setup instructions are correct and the scripts are robust. I've found a critical inconsistency between the Python version specified in the setup guide (3.11) and the one required by the evaluation script (3.13), which will cause issues for users. I've provided suggestions to align the documentation with the script's requirements. Additionally, I've identified a critical bug in the SLURM script's logic for counting errors, which would prevent the auto-upload skipping feature from working correctly. I've also included suggestions to improve the script's robustness and clarity by refining how it locates dependencies and manages environment modules. Overall, the contribution is valuable, and addressing these points will significantly improve its reliability and user experience.
### 4. Create Python 3.11 Environment
```bash
eval "$(${MAMBA_INSTALL}/bin/conda shell.bash hook)"
export EVAL_ENV="/p/project1/ccstdl/${USER}/eval_py311"
${MAMBA_INSTALL}/bin/mamba create --prefix ${EVAL_ENV} python=3.11 -c conda-forge --override-channels -y
source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}
```
The setup instructions guide the user to create a Python 3.11 environment, but the jsc_eval_harbor.sbatch script is configured for and recommends Python 3.13. This inconsistency will cause errors and confusion. To fix this, the instructions should be updated to create a Python 3.13 environment.
Suggested change:

### 4. Create Python 3.13 Environment
```bash
eval "$(${MAMBA_INSTALL}/bin/conda shell.bash hook)"
export EVAL_ENV="/p/project1/ccstdl/${USER}/eval_py313"
${MAMBA_INSTALL}/bin/mamba create --prefix ${EVAL_ENV} python=3.13 -c conda-forge --override-channels -y
source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}
```
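As a quick sanity check after creating the environment (an illustrative follow-up, not part of the PR), the activated interpreter should come from the new prefix and report 3.13:

```bash
# After activation, python should live under ${EVAL_ENV} and report 3.13.x
source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}
python --version
python -c "import sys; print(sys.executable)"
```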
```bash
export MAMBA_INSTALL="/p/project1/ccstdl/${USER}/mamba"
export EVAL_ENV="/p/project1/ccstdl/${USER}/eval_py311"
alias activate_eval='source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}'
```
To maintain consistency with the required Python 3.13 environment, the EVAL_ENV variable configured in .bashrc should point to the eval_py313 directory.
Suggested change:

```bash
export MAMBA_INSTALL="/p/project1/ccstdl/${USER}/mamba"
export EVAL_ENV="/p/project1/ccstdl/${USER}/eval_py313"
alias activate_eval='source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}'
```
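A short usage check for the alias (illustrative; it assumes the exports above have been added to `~/.bashrc`):

```bash
# In a fresh login shell, or after re-sourcing ~/.bashrc
source ~/.bashrc
activate_eval
command -v python   # should resolve to ${EVAL_ENV}/bin/python
```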
```bash
DAYTONA_COUNT=$(python3 -c "
import json
import sys

try:
    with open('$RESULT_FILE', 'r') as f:
        data = json.load(f)

    total_errors = 0
    error_ids = []

    if 'stats' in data and 'evals' in data['stats']:
        for eval_key, eval_data in data['stats']['evals'].items():
            if 'exception_stats' in eval_data and 'DaytonaError' in eval_data['exception_stats']:
                daytona_errors = eval_data['exception_stats']['DaytonaError']
                if isinstance(daytona_errors, list):
                    total_errors += len(daytona_errors)
                    error_ids.extend(daytona_errors)

    print(total_errors)

    # Also write detailed info to stderr for logging
    if total_errors > 0:
        print(f'Found {total_errors} DaytonaError(s): {error_ids[:5]}...', file=sys.stderr)

except Exception as e:
    print(f'Error parsing result.json: {e}', file=sys.stderr)
    print('0')
" 2>&1 | tail -n 1)
```
The command used to capture DAYTONA_COUNT is buggy. Because stderr is redirected to stdout (`2>&1`) and only the last line is kept (`tail -n 1`), whenever errors are found the captured value is the stderr log message (printed after the count) rather than the count itself, which makes the subsequent integer comparison fail.
The Python script already prints only the count to stdout, so capture stdout directly and drop the redirection and pipe.
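For context, this is roughly how a count captured this way is consumed downstream; the surrounding logic and messages here are illustrative rather than taken from the sbatch script, but they show why a non-numeric capture breaks the check:

```bash
# If DAYTONA_COUNT contains a log line instead of a number, the -gt test
# errors out ("integer expression expected") and the skip logic never fires.
if [ "${DAYTONA_COUNT:-0}" -gt 0 ]; then
    echo "Found ${DAYTONA_COUNT} DaytonaError(s) - skipping auto-upload" >&2
else
    echo "No DaytonaError found - proceeding with auto-upload"
fi
```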
```bash
DAYTONA_COUNT=$(python3 -c "
import json
import sys
try:
    with open('$RESULT_FILE', 'r') as f:
        data = json.load(f)
    total_errors = 0
    error_ids = []
    if 'stats' in data and 'evals' in data['stats']:
        for eval_key, eval_data in data['stats']['evals'].items():
            if 'exception_stats' in eval_data and 'DaytonaError' in eval_data['exception_stats']:
                daytona_errors = eval_data['exception_stats']['DaytonaError']
                if isinstance(daytona_errors, list):
                    total_errors += len(daytona_errors)
                    error_ids.extend(daytona_errors)
    print(total_errors)
    # Also write detailed info to stderr for logging
    if total_errors > 0:
        print(f'Found {total_errors} DaytonaError(s): {error_ids[:5]}...', file=sys.stderr)
except Exception as e:
    print(f'Error parsing result.json: {e}', file=sys.stderr)
    print('0')
")
```

```
/p/project1/ccstdl/<username>/
├── mamba/          # Miniforge installation
├── eval_py311/     # Python 3.11 environment
```
For consistency with the Python 3.13 environment required by the evaluation script, this directory layout should list eval_py313 (Python 3.13 environment) instead of eval_py311.
```bash
# Option 1: Use latest 2024 stack
ml Stages/2024
ml StdEnv/2024
ml Python/3.11.3
```
The script loads the Python/3.11.3 module but then activates a Conda environment with Python 3.13. This is confusing and potentially problematic if there are conflicting libraries. Since the Conda environment provides the Python interpreter and its dependencies, loading the system Python module is likely unnecessary and could be removed to avoid confusion and potential conflicts.
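A sketch of what the module section might look like with the Python module dropped (assuming the Conda environment from the setup guide supplies the interpreter; whether the Stages/StdEnv modules are still needed depends on the script's other dependencies):

```bash
# Option 1: Use latest 2024 stack (no system Python; the Conda env provides Python 3.13)
ml Stages/2024
ml StdEnv/2024

# Activate the Conda environment that supplies Python 3.13 and the eval dependencies
source ${MAMBA_INSTALL}/bin/activate ${EVAL_ENV}
```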
```bash
SNAPSHOT_SCRIPT="${SLURM_SUBMIT_DIR:-$(pwd)}/snapshot_download.py"
if [ ! -f "$SNAPSHOT_SCRIPT" ]; then
    # Try parent directory (tacc/)
    SNAPSHOT_SCRIPT="${SLURM_SUBMIT_DIR:-$(pwd)}/../tacc/snapshot_download.py"
fi
```
Locating snapshot_download.py relative to SLURM_SUBMIT_DIR is fragile because it depends on the directory where sbatch is executed. A more robust approach is to define the path relative to this script's own location. This makes the dependency on the tacc directory's script explicit and reliable, regardless of the submission directory.
Suggested change:

```bash
# Use the snapshot_download.py script from the sibling tacc directory
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
SNAPSHOT_SCRIPT="${SCRIPT_DIR}/../tacc/snapshot_download.py"
```
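Whichever lookup is used, a fail-fast existence check (a small addition not present in the PR) would surface a missing dependency before the job burns allocation time:

```bash
# Abort early with a clear error if the helper script is not where we expect it
if [ ! -f "$SNAPSHOT_SCRIPT" ]; then
    echo "ERROR: snapshot_download.py not found at ${SNAPSHOT_SCRIPT}" >&2
    exit 1
fi
```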
@digitsisyph Please resolve the outstanding comments and request a review to keep this moving!