Add Docker support for annotation pipeline scripts #57
Co-authored-by: neliebi <51783034+neliebi@users.noreply.github.com>
Pull request overview
Adds containerization support for running the annotation pipeline scripts in a consistent environment, including a Docker image, docker-specific config template, and Docker usage documentation.
Changes:
- Added a `Dockerfile` that installs Python dependencies and BLAST+ into a runnable container image.
- Added `.dockerignore` rules to keep the Docker build context small.
- Added `config.ini.docker-example` plus README documentation describing Docker build/run and volume-mount patterns.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| Dockerfile | Builds a Python 3.8-slim image with system deps (BLAST+, libxml2/libxslt) and installs Python requirements. |
| .dockerignore | Excludes git metadata, data/log/test artifacts, and most ini files from the image build context. |
| config.ini.docker-example | Example config showing container paths (e.g., /data/...) intended for volume mounts. |
| README.md | Adds Docker installation/usage guidance and updates config examples. |
docker run --rm \
  -v /path/on/host/config.ini:/config/config.ini:ro \
  -v /path/on/host/data:/data:ro \
  -v /path/on/host/output:/output \
  added-annotations python <script_name.py> <arguments>
The Docker run examples mount the config to /config/config.ini, but the scripts currently look for config.ini next to the Python files (e.g., /app/config.ini in the container) and don’t accept a config path argument. Either adjust the mount target in the docs to /app/config.ini (or -w /app), or update the container/scripts to honor CONFIG_PATH so the documented mount path works.
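The second option, having the scripts honor `CONFIG_PATH`, might look roughly like the minimal sketch below. It assumes each script currently builds its config path from its own directory; the helper names `resolve_config_path` and `load_config` are hypothetical, and falling back to the script directory when `CONFIG_PATH` points at a missing file is a design choice, not existing behaviour.

```python
import configparser
import os
from pathlib import Path


def resolve_config_path(default_dir: Path) -> Path:
    """Pick the config file: $CONFIG_PATH if it names an existing file,
    otherwise config.ini in default_dir (the scripts' current lookup)."""
    env_path = os.environ.get("CONFIG_PATH")
    if env_path and Path(env_path).is_file():
        return Path(env_path)
    return default_dir / "config.ini"


def load_config(default_dir: Path) -> configparser.ConfigParser:
    """Load the resolved config, failing loudly if no file exists."""
    path = resolve_config_path(default_dir)
    # ConfigParser.read() silently ignores missing files, so check first.
    if not path.is_file():
        raise FileNotFoundError(f"config file not found: {path}")
    config = configparser.ConfigParser()
    config.read(path)
    return config
```

Each script would then call `load_config(Path(__file__).parent)` instead of its current lookup. Because the Dockerfile always sets `CONFIG_PATH`, the fallback keeps a config placed at `/app/config.ini` working when nothing is mounted at `/config`.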
##### Generate Europe PMC Links
```bash
docker run --rm \
  -v /path/on/host/config.ini:/config/config.ini:ro \
  -v /path/on/host/output:/output \
  added-annotations python generate_eupmc_links.py
```
The README suggests running generate_eupmc_links.py in Docker without any volume mounts for its input/output files, but the script currently uses hard-coded host paths for input_tsv and output_dir (not container paths). This command is likely to fail in the container unless those paths exist. Consider updating the docs to describe the required mounts/paths (or updating the script to accept CLI args/config for input/output locations).
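One way to remove the hard-coded host paths is to take them on the command line. The sketch below is hypothetical: the flag names `--input-tsv` and `--output-dir` are invented for illustration (loosely following the script's `input_tsv` and `output_dir` variables), and the `/output` default simply matches the mount used in the README examples.

```python
import argparse
import sys
from pathlib import Path
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Generate Europe PMC links (sketch with hypothetical flags)."
    )
    # Hypothetical flags standing in for the hard-coded input_tsv / output_dir paths.
    parser.add_argument("--input-tsv", type=Path, required=True,
                        help="input TSV, given as a path inside the container")
    parser.add_argument("--output-dir", type=Path, default=Path("/output"),
                        help="directory for generated files (default: /output)")
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = parse_args(argv)
    if not args.input_tsv.is_file():
        # Inside Docker this usually means the host directory was not mounted.
        print(f"input TSV not found: {args.input_tsv} (is its directory mounted?)",
              file=sys.stderr)
        return 1
    args.output_dir.mkdir(parents=True, exist_ok=True)
    # ... the script's existing link-generation logic would run here ...
    return 0
```

With flags like these (and `sys.exit(main())` at the bottom of the script), the Docker example would also mount the input directory, e.g. `-v /path/on/host/data:/data:ro`, and pass `--input-tsv /data/<input file>.tsv`, where the file name is a placeholder.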
##### Compare Release
```bash
docker run --rm \
  -v /path/on/host/config.ini:/config/config.ini:ro \
  -v /path/on/host/latest:/latest:ro \
  -v /path/on/host/previous:/previous:ro \
  added-annotations python compare_release.py /latest /previous
```
The compare_release.py Docker example doesn’t account for runtime dependencies the container currently lacks (e.g., a mail binary) and required config sections (the script reads DB credentials from a [db] section in config.ini). As written, this documented container invocation is likely to fail unless the image is extended and the config template includes the DB fields.
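Until the image and config template are extended, a fail-fast check at the start of compare_release.py could at least turn these failures into a clear message. This is a sketch under assumptions: `preflight` is a hypothetical helper, it only checks that some `mail` executable is on `PATH` (via `shutil.which`), and the `[db]` key names are supplied by the caller because the script's actual keys are not listed here.

```python
import configparser
import shutil
from pathlib import Path
from typing import List, Sequence


def preflight(config_path: Path, required_db_keys: Sequence[str] = ()) -> List[str]:
    """Return human-readable problems that would make compare_release.py fail."""
    problems = []
    # compare_release.py needs a mail binary, which the current image does not install.
    if shutil.which("mail") is None:
        problems.append("'mail' executable not found on PATH")
    config = configparser.ConfigParser()
    # read() returns the list of files it parsed; empty means missing/unreadable.
    if not config.read(config_path):
        problems.append(f"could not read config file: {config_path}")
    elif not config.has_section("db"):
        problems.append(f"[db] section missing from {config_path}")
    else:
        for key in required_db_keys:
            if not config.has_option("db", key):
                problems.append(f"[db] is missing '{key}'")
    return problems
```

For example, `problems = preflight(Path("/app/config.ini"), ("host", "user"))` (hypothetical key names), then print each entry and exit non-zero if the list is not empty.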
| File | Descritption | Download Link |
| File | Description | Download Link |
|-------------|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| uniprot.tsv | UniProt annpotations | https://rest.uniprot.org/uniprotkb/stream?fields=accession,xref_pdb,protein_name&query=((database:pdb))&format=tsv&compressed=false |
Typo in the table: "UniProt annpotations" should be "UniProt annotations".
| uniprot.tsv | UniProt annpotations | https://rest.uniprot.org/uniprotkb/stream?fields=accession,xref_pdb,protein_name&query=((database:pdb))&format=tsv&compressed=false |
| uniprot.tsv | UniProt annotations | https://rest.uniprot.org/uniprotkb/stream?fields=accession,xref_pdb,protein_name&query=((database:pdb))&format=tsv&compressed=false |
ENV CONFIG_PATH=/config/config.ini

# Default command shows help
CMD ["python", "AddedAnnotations.py", "--help"]
The container sets CONFIG_PATH=/config/config.ini, but the scripts currently load config.ini from the application directory (e.g., Path(__file__).parent / "config.ini") and do not read CONFIG_PATH. As written, mounting a config at /config/config.ini won’t be picked up. Consider either (a) copying/symlinking /config/config.ini to /app/config.ini in the image/entrypoint, or (b) updating the scripts to read the config path from CONFIG_PATH (and documenting that).
CMD ["python", "AddedAnnotations.py", "--help"]
CMD ["sh", "-c", "if [ -f \"${CONFIG_PATH}\" ] && [ ! -e \"/app/config.ini\" ]; then ln -s \"${CONFIG_PATH}\" \"/app/config.ini\"; fi; exec python AddedAnnotations.py --help"]
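One caveat with the suggested CMD: a command given after the image name in `docker run` replaces CMD entirely, so invocations like `docker run ... added-annotations python compare_release.py ...` skip the symlink step. Moving the logic into an ENTRYPOINT makes it run for every invocation. A minimal sketch, assuming a hypothetical `/app/entrypoint.py` and the same `/app/config.ini` target as the suggestion:

```python
import os
import sys
from pathlib import Path
from typing import Optional, Sequence

APP_CONFIG = Path("/app/config.ini")


def link_config(config_path: str, app_config: Path = APP_CONFIG) -> bool:
    """Symlink config_path to app_config if the source is a file and the target
    is absent (the suggested CMD's test; a dangling link is also left alone).
    Returns True if a link was created."""
    if not config_path or not Path(config_path).is_file():
        return False
    if app_config.exists() or app_config.is_symlink():
        return False
    app_config.symlink_to(Path(config_path))
    return True


def main(argv: Optional[Sequence[str]] = None) -> None:
    link_config(os.environ.get("CONFIG_PATH", ""))
    cmd = list(argv if argv is not None else sys.argv[1:])
    if not cmd:
        cmd = ["python", "AddedAnnotations.py", "--help"]
    # Replace this process so signals and exit codes go straight to the command.
    os.execvp(cmd[0], cmd)
```

The file would end with the usual `if __name__ == "__main__": main()` guard, and the Dockerfile would register it with `ENTRYPOINT ["python", "/app/entrypoint.py"]`, keeping `CMD ["python", "AddedAnnotations.py", "--help"]` as the default arguments. Creating the link requires `/app` to be writable by the container user.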
Containerizes the annotation pipeline (AddedAnnotations.py, fetch_empiar.py, fetch_pubmed.py, fetch_afdb.py, generate_eupmc_links.py, compare_release.py) to simplify dependency management and ensure consistent execution environments.
Changes
/data/...) for volume mounts
Usage Pattern
docker build -t added-annotations .
docker run --rm \
  -v /host/config.ini:/config/config.ini:ro \
  -v /host/data:/data:ro \
  -v /host/output:/output \
  added-annotations python AddedAnnotations.py -w /output -f /emdb_metadata --all -t 4
Config file uses container paths that map to volume mounts: