Add integration tests for PyTorch, TGI and TEI DLCs #79

Open · alvarobartt wants to merge 86 commits into main from
Conversation
As it will be reused within the TGI and TEI tests
Pass args via `text_generation_launcher_kwargs` and include the VertexAI environment mimic via the `AIP_` environment variables.
157ab15 to 9446a3e
Which is odd, since `jinja2` is a core dependency of `transformers`, see https://github.com/huggingface/transformers/blob/174890280b340b89c5bfa092f6b4fb0e2dc2d7fc/setup.py#L127
philschmid (Contributor) reviewed on Sep 2, 2024 and left a comment:
Great work. Added some minor comments.
Comment on lines +37 to +39
training-dlc: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.transformers.4-42.ubuntu2204.py310
inference-dlc: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cu121.2-2.transformers.4-44.ubuntu2204.py311
tgi-dlc: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310
Contributor
Mhm, is there a better way to specify those? Feels like we could easily forget to update them.
- Capture `container_uri` from an environment variable before running the test, and remove the default value to prevent issues when testing
- Remove `max_train_epochs=-1` as it is not required since `max_steps` is already specified
- Rename `test_transformers` to `test_huggingface_inference_toolkit`
- Remove the `transformers` and `jinja2` dependencies as they are not required, as well as the `AutoTokenizer` usage for prompt formatting

Co-authored-by: Philipp Schmid <philschmid@users.noreply.github.com>
3af2bcf to 7ce5aeb
6eb06b5 to 349df29
…ia-smi` Those dependencies were not needed, were not actively maintained, and added extra complexity; instead, they have been replaced with `subprocess` running `nvidia-smi`.
- The TEI condition on the container port was reversed
- `gpu_available` raises an exception, instead of returning a non-zero `returncode`, if the command doesn't exist
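As a rough sketch of the helper described above (the function name matches the text, but the exact implementation in the PR may differ): `subprocess.run` raises `FileNotFoundError` when the `nvidia-smi` binary is not installed, rather than surfacing a `returncode`, so a missing command fails loudly instead of looking like "no GPU".

```python
import subprocess


def gpu_available() -> bool:
    """Check for a usable NVIDIA GPU by shelling out to `nvidia-smi`.

    Note: if `nvidia-smi` is not installed, `subprocess.run` raises
    FileNotFoundError (it never reaches `returncode`), so the exception
    propagates to the caller rather than being swallowed.
    """
    result = subprocess.run(["nvidia-smi"], capture_output=True)
    return result.returncode == 0
```

This keeps the test dependencies to the standard library only, at the cost of requiring the NVIDIA driver tooling to be present on GPU runners.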
In most cases, splitting these tests is for the best and reduces execution time: since we tend to update the DLCs one at a time, it's unlikely that all the containers change at once.

Pros: easier to manage, more granular, no extra `docker pull`s; only what's modified gets run.

Cons: when modifying a bunch of tests it will be slower, as a `docker pull` needs to be done per test, since the runner instances are ephemeral.
`type: choice` with `options` is only supported for `workflow_dispatch`, i.e. when triggering the GitHub Action manually; not for `workflow_call`, i.e. when the workflow is reused from another workflow.
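For reference, a minimal sketch of the distinction (the input name and option values here are hypothetical, not taken from the PR):

```yaml
on:
  workflow_dispatch:
    inputs:
      dlc:
        description: "Which DLC to test"
        type: choice          # supported for manual triggers
        options:
          - training
          - inference
          - tgi
          - tei
  workflow_call:
    inputs:
      dlc:
        type: string          # `choice` is not supported for reusable workflows
```

In the reusable-workflow case, the caller has to pass a plain string and any validation of the allowed values has to happen inside the workflow itself.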
Description
This PR adds some integration tests for the following Hugging Face DLCs on Google Cloud:
The inference-related tests try different alternatives, and also emulate the Vertex AI environment via the `AIP_`-prefixed environment variables exposed by Vertex AI and handled within the Hugging Face DLCs on Google Cloud for seamless integration.
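As a rough illustration of that mimic (the exact variables and values the tests pass are assumptions here, based on the Vertex AI custom-container serving contract), the tests can reproduce the environment Vertex AI would set by forwarding the `AIP_` variables to the container:

```python
# Hypothetical sketch: environment variables Vertex AI sets on a custom
# serving container, reproduced locally so the DLC behaves as it would
# when deployed on Vertex AI.
vertex_ai_env = {
    "AIP_MODE": "PREDICTION",
    "AIP_HTTP_PORT": "8080",
    "AIP_HEALTH_ROUTE": "/health",
    "AIP_PREDICT_ROUTE": "/predict",
}

# Turn them into `docker run` flags for the test harness.
docker_env_flags = [f"--env={key}={value}" for key, value in vertex_ai_env.items()]
print(docker_env_flags)
```

The container then serves on `AIP_HTTP_PORT` and exposes the health and predict routes at the configured paths, which is what the tests assert against.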