
The performance is very inefficient #3

@ynicle

Description

Running the sample code with 4096 context + 768 steps takes about 8 minutes per question on an H20 GPU and occupies about 50 GB of GPU memory.

  • 4096 context + 768 steps: 8 min, 50 GB GPU memory
  • 2048 context + 768 steps: 4 min, 31 GB GPU memory
  • 768 context + 768 steps: 2 min, 20 GB GPU memory
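
To make the scaling easier to read, here is a rough back-of-the-envelope sketch that uses only the three measurements above. It derives the time per step and the extra GPU memory per additional unit of context between runs. The helper names (`seconds_per_step`, `memory_slope_mb`) are made up for illustration, and the conversion 1 GB = 1024 MB is an assumption, since the reported figures are approximate.

```python
# Measurements reported above: (context length, minutes per question, GPU memory in GB).
# All three runs use 768 steps. Figures are approximate.
runs = [(768, 2, 20), (2048, 4, 31), (4096, 8, 50)]
STEPS = 768


def seconds_per_step(minutes, steps=STEPS):
    """Average wall-clock seconds spent on one step."""
    return minutes * 60 / steps


def memory_slope_mb(a, b):
    """Extra GPU memory (MB) per additional unit of context between two runs.

    Assumes 1 GB = 1024 MB (an assumption, not stated in the report).
    """
    (ctx_a, _, mem_a), (ctx_b, _, mem_b) = a, b
    return (mem_b - mem_a) * 1024 / (ctx_b - ctx_a)


for ctx, minutes, mem in runs:
    print(f"context={ctx}: {seconds_per_step(minutes):.2f} s/step, {mem} GB")

for a, b in zip(runs, runs[1:]):
    print(f"{a[0]} -> {b[0]}: {memory_slope_mb(a, b):.1f} MB per context unit")
```

Under these assumptions, the time per step doubles each time (about 0.16 s, 0.31 s, 0.63 s), and memory grows by roughly 9 MB per additional unit of context in both intervals.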
