
Why are l2_loop_order and l1_loop_order in Heuristic-GPU set to knm? #12

@Parsifal1986

Hi, thanks for your great work on this project!

I have a question about the Heuristic-GPU exploration configuration in matmul.py.
I noticed that both loop orders (l2_loop_order and l1_loop_order) are set to knm, and I'm trying to understand the reasoning behind this choice.

In the NVIDIA CUTLASS documentation, the k dimension is the innermost loop, whereas in your code it is the outermost loop.
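To make the contrast concrete, here is a minimal NumPy sketch of a tiled matmul whose three tile loops can be nested in any order. It assumes that a loop-order string such as "knm" is read outermost to innermost; `tiled_matmul` is a hypothetical helper, not code from matmul.py, and the project may interpret loop_order differently. Both orders compute the same C. What changes is reuse: with k innermost each C tile's accumulator stays live for its whole reduction, while with k outermost every C tile is revisited once per k step.

```python
import numpy as np


def tiled_matmul(A, B, tile, loop_order):
    """Hypothetical tiled C = A @ B with the tile loops nested per `loop_order`.

    `loop_order` is a permutation of "mnk", read outermost -> innermost
    (an assumed convention; matmul.py may define it differently).
    Also returns how many times the active C tile changes, a rough proxy
    for how often a partial-sum accumulator would be spilled and reloaded.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    assert sorted(loop_order) == ["k", "m", "n"], "need a permutation of mnk"
    tm, tn, tk = tile
    ranges = {"m": range(0, M, tm), "n": range(0, N, tn), "k": range(0, K, tk)}

    C = np.zeros((M, N), dtype=np.result_type(A, B))
    switches, prev = 0, None
    outer, middle, inner = loop_order
    for a in ranges[outer]:
        for b in ranges[middle]:
            for c in ranges[inner]:
                idx = {outer: a, middle: b, inner: c}
                m, n, k = idx["m"], idx["n"], idx["k"]
                if (m, n) != prev:  # moved to a different output tile
                    switches += 1
                    prev = (m, n)
                # Accumulate one k-slice of the product into the C tile.
                C[m:m + tm, n:n + tn] += A[m:m + tm, k:k + tk] @ B[k:k + tk, n:n + tn]
    return C, switches


rng = np.random.default_rng(0)
A = rng.standard_normal((64, 48))
B = rng.standard_normal((48, 32))
C_nmk, sw_nmk = tiled_matmul(A, B, (16, 16, 16), "nmk")  # k innermost
C_knm, sw_knm = tiled_matmul(A, B, (16, 16, 16), "knm")  # k outermost
print(sw_nmk, sw_knm)  # 8 24
```

With 4 m-tiles, 2 n-tiles and 3 k-tiles, the k-innermost order opens each of the 8 output tiles once, while the k-outermost order switches tiles on all 24 iterations. Whether that extra traffic matters on a GPU depends on where the accumulators live, which is exactly what the question asks about.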

Is there a specific performance consideration or hardware constraint that makes the knm order preferable? Or did you run an exhaustive search and find that knm performs best on typical GPU architectures?

I would really appreciate it if you could provide some insights or point me to relevant documentation.
Thanks again for maintaining this project!
