Running the sample code with 4096 context + 768 steps takes 8 min per question on an H20 GPU and occupies about 50 GB of GPU memory.
- 4096 context + 768 steps: 8 min, 50 GB GPU memory
- 2048 context + 768 steps: 4 min, 31 GB GPU memory
- 768 context + 768 steps: 2 min, 20 GB GPU memory
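
To get a rough idea of what another context length might cost, the three measurements above can be fitted with a straight line. This is only a sketch: the linear relationship is an assumption drawn from three data points, it applies to 768 steps on the same H20 GPU only, and `fit_line` and `estimate` are hypothetical helper names, not part of the sample code.

```python
# Measurements copied from the list above:
# (context length, minutes per question, GB of GPU memory), all at 768 steps.
MEASUREMENTS = [
    (4096, 8.0, 50.0),
    (2048, 4.0, 31.0),
    (768, 2.0, 20.0),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate(context_len):
    """Return (minutes, gigabytes) predicted by the fitted lines.

    Assumption: time and memory grow linearly with context length.
    Values far outside 768..4096 are extrapolation and may be badly off.
    """
    contexts = [m[0] for m in MEASUREMENTS]
    t_slope, t_intercept = fit_line(contexts, [m[1] for m in MEASUREMENTS])
    m_slope, m_intercept = fit_line(contexts, [m[2] for m in MEASUREMENTS])
    minutes = t_slope * context_len + t_intercept
    gigabytes = m_slope * context_len + m_intercept
    return minutes, gigabytes


if __name__ == "__main__":
    minutes, gigabytes = estimate(3072)
    print(f"3072 context: ~{minutes:.1f} min, ~{gigabytes:.0f} GB (estimate)")
```

Treat the result as a starting point for picking a configuration that fits your GPU, then confirm it with an actual run.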