[RFC] Add support for the Megatron backend on Ascend NPU #337

@jiaqiw09

Description

This issue proposes adding optional Ascend NPU support to ROLL by integrating MindSpeed, a toolkit for running Megatron on Ascend NPUs.

Motivation

On Ascend NPU, MindSpeed provides Megatron-compatible patches and runtime extensions.
ROLL currently does not expose a clean way to apply these patches or pass the required configuration, which makes Ascend execution difficult.

High-level approach

  • Apply MindSpeed patches before Megatron modules are initialized
  • Expose a small set of MindSpeed-required configuration fields at the config layer
  • Allow MindSpeed-related arguments to pass through existing ROLL configs
  • Keep all behavior opt-in and limited to Ascend NPU
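The points above could be sketched roughly as follows. This is only an illustration of the opt-in flow, not ROLL's or MindSpeed's actual interface: `MindSpeedOptions`, `apply_mindspeed_patches`, `merge_megatron_args`, and all field names are hypothetical, and the sketch assumes that importing `mindspeed.megatron_adaptor` is what applies the patches. The adaptor is imported lazily, so nothing changes unless the option is enabled and MindSpeed is not needed on other platforms.

```python
import importlib
import importlib.util
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class MindSpeedOptions:
    # Hypothetical config fields; names are not taken from ROLL or MindSpeed.
    enabled: bool = False  # opt-in: the default leaves CUDA / CPU runs untouched
    # Assumed patch entry point: importing this module applies the patches.
    adaptor_module: str = "mindspeed.megatron_adaptor"
    # MindSpeed-related arguments, passed through without interpretation.
    extra_args: Dict[str, Any] = field(default_factory=dict)


def apply_mindspeed_patches(opts: MindSpeedOptions) -> bool:
    """Import the adaptor module; call this before Megatron modules are set up.

    Returns True if the adaptor was imported, False if the feature is disabled.
    Raises ImportError if the feature is enabled but the package is missing.
    """
    if not opts.enabled:
        return False
    # Check only the top-level package so a missing install gives a clear error.
    top_level = opts.adaptor_module.split(".")[0]
    if importlib.util.find_spec(top_level) is None:
        raise ImportError(
            f"MindSpeed support was enabled but '{top_level}' is not installed"
        )
    importlib.import_module(opts.adaptor_module)
    return True


def merge_megatron_args(base: Dict[str, Any], opts: MindSpeedOptions) -> Dict[str, Any]:
    """Return a copy of the base arguments, adding extra_args only when enabled."""
    merged = dict(base)
    if opts.enabled:
        merged.update(opts.extra_args)
    return merged
```

With the default `MindSpeedOptions()`, `apply_mindspeed_patches` returns `False` without importing anything and `merge_megatron_args` returns the base arguments unchanged, which matches the opt-in requirement.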

Non-goals

  • No behavior changes for CUDA / CPU
  • No MindSpeed hard dependency
  • No architectural changes to Megatron

Environment

  • Target: Ascend NPU
  • MindSpeed version: 0.14.0
