Add Docker support for annotation pipeline scripts #57

Merged
neliebi merged 5 commits into main from copilot/create-dockerfile-for-scripts
Feb 18, 2026

Copilot AI commented Feb 18, 2026

Containerizes the annotation pipeline (AddedAnnotations.py, fetch_empiar.py, fetch_pubmed.py, fetch_afdb.py, generate_eupmc_links.py, compare_release.py) to simplify dependency management and ensure consistent execution environments.

Changes

  • Dockerfile: Python 3.8-slim base with BLAST+ (ncbi-blast+) and required system libraries (libxml2, libxslt)
  • .dockerignore: Excludes git metadata, test files, logs, and data files to reduce image size
  • config.ini.docker-example: Template showing container path structure (/data/...) for volume mounts
  • README.md: Added Docker sections with volume mounting patterns using read-only flags for data/config protection

Usage Pattern

docker build -t added-annotations .

docker run --rm \
  -v /host/config.ini:/config/config.ini:ro \
  -v /host/data:/data:ro \
  -v /host/output:/output \
  added-annotations python AddedAnnotations.py -w /output -f /emdb_metadata --all -t 4

Config file uses container paths that map to volume mounts:

[file_paths]
BLAST_DB = /data/uniprotkb_swissprot
components_cif = /data/components.cif
uniprot_tab = /data/uniprot.tsv
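
Since every path in the config must resolve inside a mounted directory, a quick check can catch host paths that were left in by mistake. The following is a minimal sketch, not part of this PR: the function name `unmounted_paths` and the list of mount targets are assumptions taken from the `docker run` example above.

```python
import configparser
from pathlib import PurePosixPath

# Container-side mount targets from the docker run example (assumed to be complete).
MOUNT_POINTS = [PurePosixPath("/config"), PurePosixPath("/data"), PurePosixPath("/output")]


def _is_inside(path, mount):
    """True when path equals mount or lies below it."""
    try:
        path.relative_to(mount)  # raises ValueError when path is outside mount
        return True
    except ValueError:
        return False


def unmounted_paths(config_text, section="file_paths", mounts=MOUNT_POINTS):
    """Return {option: value} for entries that fall outside every mount point."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as BLAST_DB case-sensitive
    parser.read_string(config_text)
    outside = {}
    for option, value in parser[section].items():
        path = PurePosixPath(value)
        if not any(_is_inside(path, mount) for mount in mounts):
            outside[option] = value
    return outside


EXAMPLE = """
[file_paths]
BLAST_DB = /data/uniprotkb_swissprot
components_cif = /data/components.cif
uniprot_tab = /data/uniprot.tsv
"""

if __name__ == "__main__":
    print(unmounted_paths(EXAMPLE))  # {} because every path sits under /data
```

The check avoids `Path.is_relative_to`, which is only available from Python 3.9, so it also runs on the 3.8 base image.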

Copilot AI and others added 4 commits February 18, 2026 12:25
Co-authored-by: neliebi <51783034+neliebi@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add Dockerfile to run multiple Python scripts" to "Add Docker support for annotation pipeline scripts" Feb 18, 2026
Copilot AI requested a review from neliebi February 18, 2026 12:30
@neliebi neliebi marked this pull request as ready for review February 18, 2026 15:29
Copilot AI review requested due to automatic review settings February 18, 2026 15:29
@neliebi neliebi merged commit 2ad1552 into main Feb 18, 2026
1 check passed
Copilot AI left a comment

Pull request overview

Adds containerization support for running the annotation pipeline scripts in a consistent environment, including a Docker image, docker-specific config template, and Docker usage documentation.

Changes:

  • Added a Dockerfile that installs Python dependencies and BLAST+ into a runnable container image.
  • Added .dockerignore rules to keep the Docker build context small.
  • Added config.ini.docker-example plus README documentation describing Docker build/run and volume-mount patterns.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
|------|-------------|
| Dockerfile | Builds a Python 3.8-slim image with system deps (BLAST+, libxml2/libxslt) and installs Python requirements. |
| .dockerignore | Excludes git metadata, data/log/test artifacts, and most ini files from the image build context. |
| config.ini.docker-example | Example config showing container paths (e.g., /data/...) intended for volume mounts. |
| README.md | Adds Docker installation/usage guidance and updates config examples. |

Comment on lines +145 to +149
docker run --rm \
-v /path/on/host/config.ini:/config/config.ini:ro \
-v /path/on/host/data:/data:ro \
-v /path/on/host/output:/output \
added-annotations python <script_name.py> <arguments>
Copilot AI Feb 18, 2026

The Docker run examples mount the config to /config/config.ini, but the scripts currently look for config.ini next to the Python files (e.g., /app/config.ini in the container) and don’t accept a config path argument. Either adjust the mount target in the docs to /app/config.ini (or -w /app), or update the container/scripts to honor CONFIG_PATH so the documented mount path works.
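
The second option (honoring CONFIG_PATH in the scripts) could look roughly like the sketch below. It is an assumption about how the lookup might be written, not the scripts' current code: `resolve_config_path` and `load_config` are hypothetical helper names, and the fallback keeps the existing behaviour of reading config.ini next to the script.

```python
import configparser
import os
from pathlib import Path


def resolve_config_path(script_dir, environ=None):
    """Prefer CONFIG_PATH when set and non-empty, else config.ini beside the script."""
    environ = os.environ if environ is None else environ
    override = environ.get("CONFIG_PATH")
    if override:
        return Path(override)
    return Path(script_dir) / "config.ini"


def load_config(script_dir, environ=None):
    """Read the resolved config file, failing loudly when it is absent."""
    path = resolve_config_path(script_dir, environ)
    if not path.is_file():
        raise FileNotFoundError(f"config file not found: {path}")
    parser = configparser.ConfigParser()
    parser.read(path)
    return parser
```

A script would call `load_config(Path(__file__).parent)`, so runs outside Docker keep working unchanged while the documented `/config/config.ini` mount is picked up inside the container.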

Comment on lines +204 to +210
##### Generate Europe PMC Links
```bash
docker run --rm \
-v /path/on/host/config.ini:/config/config.ini:ro \
-v /path/on/host/output:/output \
added-annotations python generate_eupmc_links.py
```
Copilot AI Feb 18, 2026

The README suggests running generate_eupmc_links.py in Docker without any volume mounts for its input/output files, but the script currently uses hard-coded host paths for input_tsv and output_dir (not container paths). This command is likely to fail in the container unless those paths exist. Consider updating the docs to describe the required mounts/paths (or updating the script to accept CLI args/config for input/output locations).
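
If the script were changed to take its locations from the command line, the argument handling might look like this sketch. The flag names `--input-tsv` and `--output-dir` and the `/output` default are illustrative assumptions; generate_eupmc_links.py does not define them today.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse hypothetical input/output options for generate_eupmc_links.py."""
    parser = argparse.ArgumentParser(description="Generate Europe PMC links (sketch)")
    parser.add_argument(
        "--input-tsv",
        type=Path,
        required=True,
        help="TSV file to read, given as a container path such as one under /data",
    )
    parser.add_argument(
        "--output-dir",
        type=Path,
        default=Path("/output"),
        help="directory for generated files (defaults to the mounted /output)",
    )
    return parser.parse_args(argv)
```

With options like these, the README example could mount the input directory read-only and pass the container paths explicitly instead of relying on paths that only exist on the host.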

Comment on lines +212 to +219
##### Compare Release
```bash
docker run --rm \
-v /path/on/host/config.ini:/config/config.ini:ro \
-v /path/on/host/latest:/latest:ro \
-v /path/on/host/previous:/previous:ro \
added-annotations python compare_release.py /latest /previous
```
Copilot AI Feb 18, 2026

The compare_release.py Docker example doesn’t account for runtime dependencies the container currently lacks (e.g., a mail binary) and required config sections (the script reads DB credentials from a [db] section in config.ini). As written, this documented container invocation is likely to fail unless the image is extended and the config template includes the DB fields.
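
A small preflight check would make both gaps visible before the comparison starts. This is a sketch under assumptions: `preflight_problems` is a hypothetical helper, and it only tests that a `[db]` section exists because the field names the script expects are not shown in this PR.

```python
import configparser
import shutil


def preflight_problems(config_text, required_sections=("db",), required_binaries=("mail",)):
    """Return human-readable problems; an empty list means the checks passed."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    problems = []
    for section in required_sections:
        if not parser.has_section(section):
            problems.append(f"config.ini is missing the [{section}] section")
    for binary in required_binaries:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(binary) is None:
            problems.append(f"'{binary}' was not found on PATH")
    return problems
```

Running it at the top of compare_release.py and exiting with the collected messages would turn a late failure into an immediate, readable one.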

- | File | Descritption | Download Link |
+ | File | Description | Download Link |
|-------------|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| uniprot.tsv | UniProt annpotations | https://rest.uniprot.org/uniprotkb/stream?fields=accession,xref_pdb,protein_name&query=((database:pdb))&format=tsv&compressed=false |
Copilot AI Feb 18, 2026

Typo in the table: "UniProt annpotations" should be "UniProt annotations".

Suggested change
- | uniprot.tsv | UniProt annpotations | https://rest.uniprot.org/uniprotkb/stream?fields=accession,xref_pdb,protein_name&query=((database:pdb))&format=tsv&compressed=false |
+ | uniprot.tsv | UniProt annotations | https://rest.uniprot.org/uniprotkb/stream?fields=accession,xref_pdb,protein_name&query=((database:pdb))&format=tsv&compressed=false |

ENV CONFIG_PATH=/config/config.ini

# Default command shows help
CMD ["python", "AddedAnnotations.py", "--help"]
Copilot AI Feb 18, 2026

The container sets CONFIG_PATH=/config/config.ini, but the scripts currently load config.ini from the application directory (e.g., Path(__file__).parent / "config.ini") and do not read CONFIG_PATH. As written, mounting a config at /config/config.ini won’t be picked up. Consider either (a) copying/symlinking /config/config.ini to /app/config.ini in the image/entrypoint, or (b) updating the scripts to read the config path from CONFIG_PATH (and documenting that).

Suggested change
- CMD ["python", "AddedAnnotations.py", "--help"]
+ CMD ["sh", "-c", "if [ -f \"${CONFIG_PATH}\" ] && [ ! -e \"/app/config.ini\" ]; then ln -s \"${CONFIG_PATH}\" \"/app/config.ini\"; fi; exec python AddedAnnotations.py --help"]
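
One caveat with a CMD-based fix: arguments passed to `docker run` replace CMD, so the symlink step would only run when the container starts without a command. Option (a) could instead live in an entrypoint helper, sketched below in Python. The helper and its function names are hypothetical and not part of this PR; the `/app/config.ini` target follows the path named in the comment above.

```python
import os
from pathlib import Path

DEFAULT_COMMAND = ["python", "AddedAnnotations.py", "--help"]


def link_config(source, target):
    """Symlink target -> source when source exists and target is absent.

    Returns True when a link was created, False otherwise.
    """
    source, target = Path(source), Path(target)
    if source.is_file() and not target.exists() and not target.is_symlink():
        target.symlink_to(source)
        return True
    return False


def entrypoint(argv):
    """Link the mounted config, then replace this process with the command."""
    link_config(os.environ.get("CONFIG_PATH", "/config/config.ini"), "/app/config.ini")
    command = argv[1:] or DEFAULT_COMMAND
    os.execvp(command[0], command)  # does not return on success


# A real entrypoint script would import sys and end with: entrypoint(sys.argv)
```

Registered through ENTRYPOINT rather than CMD, a helper like this would run for every `docker run ... python <script>` invocation, not only for the default help command.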
