Dockerfile
@@ -0,0 +1,18 @@
FROM ubuntu:latest
Any reason to use an ubuntu image rather than python:3-alpine so you don't have to install python and the image can be smaller?
The function pthread_attr_setaffinity_np is not available on Alpine Linux, and PyTorch needs it. I am not sure whether there is a slimmer distro than Ubuntu that would work. I'm open to trying something else if you have any suggestions.
I'd prefer we use ubuntu and maybe even package the model weights in the dockerfile (you're going to have to download it anyway, and we pin the revision in docquery).
When I develop locally, I mount my machine's Hugging Face cache. But for a general, easy-to-use Docker container, packaging the weights would simplify things and be respectful of Hugging Face's resources.
Do we want to build from source, then? Or should I just pip install docquery?
I think it depends on how we want to publish it. We should probably get a better release cadence going and pin the Dockerfile to whatever the latest release is. I think the way to do that would be to somehow provide the version as a parameter to the Dockerfile and populate the parameter by looking at https://github.com/impira/docquery/blob/main/src/docquery/version.py.
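A rough sketch of how that lookup could work, not a settled design. It assumes version.py contains a single assignment such as `VERSION = "0.0.2"` (the real file may be laid out differently) and that the Dockerfile declares a build argument, here given the hypothetical name `DOCQUERY_VERSION`. The image name `docquery` is also just a placeholder.

```python
from __future__ import annotations

import re

# Assumption: version.py holds a line like VERSION = "0.0.2" or __version__ = "0.0.2".
_VERSION_RE = re.compile(
    r"""^\s*(?:VERSION|__version__)\s*=\s*["']([^"']+)["']""", re.MULTILINE
)


def read_version(source: str) -> str:
    """Extract the version string from the text of a version.py file."""
    match = _VERSION_RE.search(source)
    if match is None:
        raise ValueError("no version assignment found")
    return match.group(1)


def docker_build_command(version: str, image: str = "docquery") -> list[str]:
    """Build the argv for `docker build`, passing the version as a build arg.

    DOCQUERY_VERSION is a hypothetical ARG name; the Dockerfile would need a
    matching `ARG DOCQUERY_VERSION` line for it to have any effect.
    """
    return [
        "docker", "build",
        "--build-arg", f"DOCQUERY_VERSION={version}",
        "-t", f"{image}:{version}",
        ".",
    ]
```

The resulting list can be handed to `subprocess.run` from a release script or a make target, so the tag and the installed package version come from the same place.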
docker run IMAGE_NAME docquery scan "What is the invoice number?" https://templates.invoicehome.com/invoice-template-us-neat-750px.png
Seems like a really nice way to use the tool.
Dockerfile
FROM ubuntu:latest

RUN apt-get update \
    && apt-get install -y python3-pip python3-dev \
Does this need to be specific about which python3 version to use? I think some of the dependencies here require > 3.6.
RUN apt-get update && apt-get install -y python3.9 python3.9-dev python3-pip
I switched to python:3.10-slim-bullseye. So we will be using 3.10.
Ideally 3.10
@amazingvince how does the scan command work from the docker container? Specifically, if you run something like
Yeah but what happens if you point to a file on your local filesystem?
@ankrgyl It would be something like
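For a file on the local filesystem, one common approach is a read-only bind mount of the file's directory. The sketch below illustrates that general technique and is not necessarily the command proposed in this thread. `IMAGE_NAME` is the same placeholder used earlier, and the `/data` mount point is an arbitrary, hypothetical choice.

```python
from __future__ import annotations

from pathlib import Path


def docker_scan_command(
    image: str, question: str, local_file: str, mount_point: str = "/data"
) -> list[str]:
    """Build a `docker run` argv that scans a file from the host.

    The file's parent directory is bind-mounted read-only at mount_point
    (hypothetical default: /data), and docquery is pointed at the path
    the file has inside the container.
    """
    path = Path(local_file).resolve()
    return [
        "docker", "run", "--rm",
        "-v", f"{path.parent}:{mount_point}:ro",
        image,
        "docquery", "scan", question,
        f"{mount_point}/{path.name}",
    ]
```

The key point for the README would be that the container only sees paths under the mount, so the argument passed to scan has to be the in-container path, not the host path.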
ankrgyl left a comment
Oh cool, that looks good. I think as long as we document it that works. This is very exciting, and I think we are in the homestretch.
To land this we need to sort out a few more things:
- Add a command to make, like make docker, that builds the Dockerfile (with the appropriate container/tag names)
- Add documentation to the README that shows how to run the scan command and how to add local files
- Add something to the tests (maybe a sanity test that the container can be built, or a mode that builds the container and then runs the tests inside of it)
I've also filed some follow-ups, #27 and #28, which we can address separately.
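A minimal sketch of what the build sanity check could look like, assuming Docker is on the PATH of the test machine. The tag `docquery:test` and the function name are hypothetical. The `run` and `which` parameters exist only so the check can be exercised without a Docker daemon.

```python
from __future__ import annotations

import shutil
import subprocess
from typing import Callable, Optional


def container_builds(
    tag: str = "docquery:test",
    context: str = ".",
    run: Callable = subprocess.run,
    which: Callable = shutil.which,
) -> Optional[bool]:
    """Return True if `docker build` succeeds, False if it fails.

    Returns None when the docker executable is not found, so a test
    runner can skip instead of failing on machines without Docker.
    """
    if which("docker") is None:
        return None
    completed = run(["docker", "build", "-t", tag, context], check=False)
    return completed.returncode == 0
```

A test could call `container_builds()` from the repository root and skip when the result is `None`. Running the full suite inside the built image would be a separate, heavier mode.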
COPY ["src/", "./src"]

RUN pip install .
CMD ["python3"]
Can we make the default entrypoint python3 -m docquery.cmd (or just docquery)?
COPY ["README.md", "pyproject.toml", "setup.py", "./"]
COPY ["src/", "./src"]

RUN pip install .
I think we should either pip install .[all] or a select list of extensions, e.g. [donut] (I'm currently working on adding [web] which will contain extras for web scraping).
We should also manually install transformers (the same version that's suggested in the README).