llama_decode_eagle latency issue solved by mkjsym · Pull Request #5 · SKKU-ESLAB/specinfer.cpp

mkjsym · 2025-07-02T12:48:43Z

Make sure to read the contributing guidelines before submitting a PR

Resolved the latency issue in the llama_decode_eagle function.

Copilot

Pull Request Overview

This PR adds support for an EAGLE input layer to address latency issues in the llama_decode_eagle function and updates related tensor loading, normalization, and RoPE settings. Key changes include:

Introduce LLM_TENSOR_LAYER_INPUT_EAGLE and map it in tensor loading and architecture definitions.
Comment out the redundant RMS normalization step in llm_build_eagle to reduce overhead.
Adjust RoPE type mapping for the EAGLE architecture and update example integration under examples/speculative-eagle.
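The first bullet can be pictured with a small, self-contained sketch. It assumes a simplified model of the idea: each tensor is tagged with a layer category, and a lookup table maps the tensor to that category. Only the names `LLM_TENSOR_LAYER_INPUT_EAGLE` and `LLM_TENSOR_EMBD_FC` come from this PR; every type, enumerator, and function below is a hypothetical stand-in, not the actual llama.cpp definition.

```cpp
#include <map>
#include <stdexcept>

// Hypothetical stand-in for a tensor layer category enum.
enum tensor_layer {
    TENSOR_LAYER_INPUT,
    TENSOR_LAYER_REPEATING,
    TENSOR_LAYER_OUTPUT,
    TENSOR_LAYER_INPUT_EAGLE, // stand-in for LLM_TENSOR_LAYER_INPUT_EAGLE
};

// Hypothetical stand-in for tensor identifiers.
enum tensor_id {
    TENSOR_TOKEN_EMBD,
    TENSOR_ATTN_Q,
    TENSOR_OUTPUT,
    TENSOR_EMBD_FC, // stand-in for LLM_TENSOR_EMBD_FC
};

// Table that assigns each tensor to a layer category.
static const std::map<tensor_id, tensor_layer> TENSOR_LAYERS = {
    {TENSOR_TOKEN_EMBD, TENSOR_LAYER_INPUT},
    {TENSOR_ATTN_Q,     TENSOR_LAYER_REPEATING},
    {TENSOR_OUTPUT,     TENSOR_LAYER_OUTPUT},
    {TENSOR_EMBD_FC,    TENSOR_LAYER_INPUT_EAGLE},
};

// Look up the category of a tensor; throws if the tensor is not registered.
tensor_layer layer_of(tensor_id id) {
    const auto it = TENSOR_LAYERS.find(id);
    if (it == TENSOR_LAYERS.end()) {
        throw std::out_of_range("unknown tensor id");
    }
    return it->second;
}

// Human-readable name for a category, e.g. for logging during loading.
const char * layer_name(tensor_layer layer) {
    switch (layer) {
        case TENSOR_LAYER_INPUT:       return "input";
        case TENSOR_LAYER_REPEATING:   return "repeating";
        case TENSOR_LAYER_OUTPUT:      return "output";
        case TENSOR_LAYER_INPUT_EAGLE: return "input_eagle";
    }
    return "unknown";
}
```

In this sketch, a loader would call `layer_of(TENSOR_EMBD_FC)` and branch on the result, so a dedicated category lets EAGLE-specific input tensors be handled separately from the ordinary input tensors. How the real loader treats the new category is not stated in this conversation.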

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/llama-model.cpp | Added handling for `LLM_TENSOR_LAYER_INPUT_EAGLE` and skipped norm. |
| src/llama-arch.h | Defined new enum value `LLM_TENSOR_LAYER_INPUT_EAGLE`. |
| src/llama-arch.cpp | Updated tensor info for `LLM_TENSOR_EMBD_FC` to use the new layer. |
| examples/speculative-eagle/* | Added complete example and build integration for EAGLE decoding. |
| examples/CMakeLists.txt | Registered the new `speculative-eagle` example in the build. |
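For context on the norm step that the summary says was skipped, the following is a minimal sketch of what an RMS normalization computes. It is an unweighted version written for illustration, not the ggml implementation and not the code removed in this PR; the function name `rms_norm` and the default `eps` are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Unweighted RMS normalization: y[i] = x[i] / sqrt(mean(x^2) + eps).
// eps is an illustrative default, not a value taken from the PR.
std::vector<float> rms_norm(const std::vector<float> & x, float eps = 1e-6f) {
    if (x.empty()) {
        return {};
    }
    // Accumulate the sum of squares in double to limit rounding error.
    double sum_sq = 0.0;
    for (float v : x) {
        sum_sq += static_cast<double>(v) * v;
    }
    const float mean_sq = static_cast<float>(sum_sq / x.size());
    const float scale   = 1.0f / std::sqrt(mean_sq + eps);

    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale;
    }
    return y;
}
```

Each call makes one pass to accumulate squares and a second pass to scale, so it costs work proportional to the hidden size for every token. Dropping a step like this from the graph removes that work, although the conversation does not quantify how much of the latency improvement came from it.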

Comments suppressed due to low confidence (4)

src/llama-arch.h:378

The new enum value lacks a descriptive comment; add a brief explanation for LLM_TENSOR_LAYER_INPUT_EAGLE so its purpose is clear to future maintainers.

    LLM_TENSOR_LAYER_INPUT_EAGLE,

src/llama-arch.h:374

New tensor layer functionality for EAGLE is not covered by existing tests; add unit tests for load_tensors and the EAGLE decode path to ensure correctness.

enum llm_tensor_layer {

examples/speculative-eagle/speculative-eagle.cpp:1

[nitpick] Comments are written in Korean and English; for consistency and to accommodate a global contributor base, translate or unify comments in English.

//Tree-based EAGLE 구현 코드

mkjsym added 4 commits June 27, 2025 22:16

static tree-based eagle

110dc25

Fix configuration error in llama-model.cpp

a8f1065

warnings removed

f2facfc

llama_decode_eagle latency issue solved

ffc8248

github-actions bot added the examples label Jul 2, 2025

LeeHayun requested a review from Copilot July 2, 2025 12:49

Copilot AI reviewed Jul 2, 2025

Merge branch 'master' into master

81966a3

LeeHayun merged commit c288e68 into SKKU-ESLAB:master Jul 2, 2025
10 of 14 checks passed

LeeHayun merged 5 commits into SKKU-ESLAB:master from mkjsym:master

2 participants
