@kaiming-cheng

This PR introduces a hierarchical optimization database that stores GPU kernel optimization techniques and code examples for RAG-based optimization.

Key components:

  • OptNode / OptHierarchy: Tree structure organizing optimizations by bottleneck type (latency, memory, utilization) → technique → code example
  • docs/: Optimization technique documentation (TMA, PID swizzling, persistence)
  • code_samples/: Reference Triton kernel implementations (matmul, matadd with various optimizations applied)

Optimization techniques covered:

  • Host-side and device-side Tensor Memory Accelerator (TMA)
  • PID swizzling for L2 cache locality
  • Persistent kernel programming style
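For readers unfamiliar with PID swizzling, here is a minimal host-side sketch of the grouped program-ID ordering popularized by the Triton matmul tutorial. It is plain Python so the index math can be checked without a GPU. The function and parameter names are illustrative, and whether the kernels in code_samples/ use this exact mapping is an assumption.

```python
def swizzled_tile(pid, num_pid_m, num_pid_n, group_size_m):
    """Map a linear program ID to a (pid_m, pid_n) output tile using grouped ordering.

    Names are illustrative; this mirrors the grouped ordering from the Triton
    matmul tutorial, not necessarily the code in this PR.
    """
    num_pid_in_group = group_size_m * num_pid_n
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * group_size_m
    # The last group may have fewer rows if num_pid_m is not a multiple of group_size_m.
    rows_in_group = min(num_pid_m - first_pid_m, group_size_m)
    pid_in_group = pid % num_pid_in_group
    pid_m = first_pid_m + pid_in_group % rows_in_group
    pid_n = pid_in_group // rows_in_group
    return pid_m, pid_n


def row_major_tile(pid, num_pid_n):
    """Baseline ordering: walk one full row of output tiles before moving to the next."""
    return pid // num_pid_n, pid % num_pid_n


if __name__ == "__main__":
    # Compare which tiles the first four program IDs touch on a 4x4 tile grid.
    print([row_major_tile(p, 4) for p in range(4)])
    print([swizzled_tile(p, 4, 4, 2) for p in range(4)])
```

With the grouped ordering, consecutive program IDs stay within a small block of rows and columns, so tiles of the inputs that were loaded recently are more likely to still be in L2.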

This database enables the agent to retrieve relevant optimization strategies and reference implementations based on diagnosed performance bottlenecks.
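As a rough illustration of the retrieval flow, the sketch below builds a three-level tree (bottleneck → technique → code example) and looks up everything under a diagnosed bottleneck. The `Node` class, the placement of techniques under bottlenecks, and the file names are all hypothetical stand-ins, not the actual `OptNode` / `OptHierarchy` API or the contents of code_samples/.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for OptNode; the field names are assumptions."""
    name: str
    children: list[Node] = field(default_factory=list)


def build_hierarchy() -> Node:
    """Build an illustrative tree; the technique placement and file names are made up."""
    memory = Node("memory", [
        Node("tma", [Node("matmul_tma.py")]),
        Node("pid_swizzling", [Node("matmul_swizzle.py")]),
    ])
    latency = Node("latency", [
        Node("persistence", [Node("matmul_persistent.py")]),
    ])
    utilization = Node("utilization")
    return Node("root", [latency, memory, utilization])


def retrieve(root: Node, bottleneck: str) -> list[tuple[str, str]]:
    """Return (technique, code example) pairs filed under the given bottleneck."""
    for category in root.children:
        if category.name == bottleneck:
            return [
                (technique.name, example.name)
                for technique in category.children
                for example in technique.children
            ]
    return []  # Unknown bottleneck: nothing to retrieve.


if __name__ == "__main__":
    print(retrieve(build_hierarchy(), "memory"))
```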

@meta-cla meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) Jan 27, 2026
@kaiming-cheng kaiming-cheng changed the title from "[Optimization 7/n] Add Database in Kernel_opt" to "[Optimization 7/n] Add Knowledge Database to Kernel optimization" Jan 27, 2026
@kaiming-cheng kaiming-cheng force-pushed the kaiming/opt_component_7_clean branch from b9cb0d7 to 84708fd January 28, 2026 00:29

@Jack-Khuu Jack-Khuu left a comment

Looks good. Would it be hard to add the integration code that uses the RAG to this PR too?

Remember to cite the sources for the code_samples/ and docs/ content

  • Drop [Optimization 7/n] from the title just to avoid confusion

self.opt_children.remove(child)

def add_parents(self, parent_nodes):
"""Adds a child node to the current node."""

The docstring on this method is the same as the one on add_children

Let's also add type hints to the args
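Something along these lines, as a rough sketch. The constructor and the body of `add_children` are assumptions based on the diff context, not copied from the PR; only the corrected docstring and the annotations are the point.

```python
from __future__ import annotations


class OptNode:
    """Minimal stand-in for the PR's OptNode; the constructor signature is assumed."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.opt_parents: list[OptNode] = []
        self.opt_children: list[OptNode] = []

    def add_children(self, child_nodes: list[OptNode]) -> None:
        """Adds child nodes to the current node."""
        self.opt_children.extend(child_nodes)

    def add_parents(self, parent_nodes: list[OptNode]) -> None:
        """Adds parent nodes to the current node."""
        self.opt_parents.extend(parent_nodes)
```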

"""Adds a child node to the current node."""
self.opt_parents.extend(parent_nodes)

def remove_parents(self, parent_nodes):

Do we need this for any reason?

Comment on lines +105 to +109
level_1_opts = [optnode_latency, optnode_memory, optnode_utilization]
self.root.add_children(level_1_opts)
optnode_latency.add_parents([self.root])
optnode_memory.add_parents([self.root])
optnode_utilization.add_parents([self.root])

nit: For legibility, can we add a helper like add_relation that updates the child and parent symmetrically?

It's easy to parse here, but level 3 is harder to parse
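A possible shape for such a helper, as a sketch only. The name `add_relation`, its signature, and the minimal `OptNode` stand-in below are hypothetical; the real class in the PR may differ.

```python
from __future__ import annotations


class OptNode:
    """Minimal stand-in for the PR's OptNode; the constructor is an assumption."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.opt_parents: list[OptNode] = []
        self.opt_children: list[OptNode] = []

    def add_children(self, child_nodes: list[OptNode]) -> None:
        self.opt_children.extend(child_nodes)

    def add_parents(self, parent_nodes: list[OptNode]) -> None:
        self.opt_parents.extend(parent_nodes)


def add_relation(parent: OptNode, children: list[OptNode]) -> None:
    """Hypothetical helper: link parent to each child and each child back to parent."""
    parent.add_children(children)
    for child in children:
        child.add_parents([parent])


if __name__ == "__main__":
    root = OptNode("root")
    level_1_opts = [OptNode("latency"), OptNode("memory"), OptNode("utilization")]
    # One call replaces the add_children call plus the three add_parents calls.
    add_relation(root, level_1_opts)
    print([child.name for child in root.opt_children])
```

The five lines in the diff would then collapse to building `level_1_opts` and a single `add_relation(self.root, level_1_opts)` call, and the same helper could be reused at the deeper levels.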
