Conversation
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Member:
Not 100% sure but I think we do actually want […]

tdoublep reviewed May 8, 2024
```python
        raise NotImplementedError(
            f"Flash attention currently only supported by the following model types: {NONTP_FLASH_TYPES}"
        )
    elif PAGED_ATTENTION:
```
tdoublep (Member):
I think right now we require both PAGED_ATTENTION and FLASH_ATTENTION to be set, so I'm not sure this should be an elif.
Joe Runde (Collaborator, Author):
@tdoublep ah, I was assuming that they were mutually exclusive; if they both need to be set, let me know if you find out why!
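For readers without the full diff: a minimal sketch of the branching under discussion. The flag names and NONTP_FLASH_TYPES come from the excerpt above; everything else (the env-parsing helper, the set contents, the paged-attention error message) is an assumption for illustration, not the project's actual code.

```python
import os

# Assumed helper: the flags are read from the environment as booleans.
def _env_flag(name: str) -> bool:
    return os.getenv(name, "").lower() in ("1", "true", "yes")

FLASH_ATTENTION = _env_flag("FLASH_ATTENTION")
PAGED_ATTENTION = _env_flag("PAGED_ATTENTION")

# Hypothetical contents; the real sets live in the server code.
NONTP_FLASH_TYPES = {"llama", "gpt_bigcode"}
NONTP_PAGED_TYPES = {"llama", "gpt_bigcode"}

def check_attention_support(model_type: str) -> None:
    # An if/elif chain consults PAGED_ATTENTION only when
    # FLASH_ATTENTION is unset, i.e. it treats the two flags as
    # mutually exclusive -- the assumption the review questions.
    if FLASH_ATTENTION:
        if model_type not in NONTP_FLASH_TYPES:
            raise NotImplementedError(
                f"Flash attention currently only supported by the "
                f"following model types: {NONTP_FLASH_TYPES}"
            )
    elif PAGED_ATTENTION:
        if model_type not in NONTP_PAGED_TYPES:
            raise NotImplementedError(
                f"Paged attention currently only supported by the "
                f"following model types: {NONTP_PAGED_TYPES}"
            )
```

If the two flags are in fact required together, the elif branch is dead code whenever FLASH_ATTENTION is set, and the two checks would need to be independent if statements instead.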
Xaenalt pushed a commit to Xaenalt/text-generation-inference that referenced this pull request on Sep 16, 2024: "Sync release to main branches for 2.11"
This is a small change to allow llama and bigcode models to work with paged attention on a single shard. Currently, if FLASH_ATTENTION is not also set, it will raise a NotImplementedError.
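To make the described behavior concrete, here is a rough before/after sketch of the flag handling the fix targets. This is a guess at the semantics from the description and review thread above, not the project's actual dispatch code; the function name, the after_this_pr switch, and the return values are invented for illustration.

```python
def select_attention_impl(
    flash_attention: bool, paged_attention: bool, after_this_pr: bool
) -> str:
    # Before this PR (per the description): the paged path was only
    # reachable when FLASH_ATTENTION was also set, so setting
    # PAGED_ATTENTION alone raised the NotImplementedError quoted in
    # the diff excerpt.
    if not after_this_pr and paged_attention and not flash_attention:
        raise NotImplementedError(
            "Flash attention currently only supported by the following "
            "model types: ..."
        )
    if paged_attention:
        return "paged"   # after this PR, PAGED_ATTENTION alone suffices
    if flash_attention:
        return "flash"
    return "default"
```

Under this reading, a single-shard llama or bigcode deployment with only PAGED_ATTENTION set would take the "paged" path instead of raising.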