Add Apple Metal support for macOS/Apple Silicon by robtaylor · Pull Request #8 · NVlabs/GEM

robtaylor · 2026-01-05T14:25:52Z

Summary

This PR adds Apple Metal GPU compute support to GEM, enabling simulation on macOS with Apple Silicon (M1/M2/M3/M4) devices.

Changes

GEM Repository:

Add a `metal` feature flag to Cargo.toml
Add Metal shader compilation to build.rs using `ucc::cl_metal()`
Add csrc/kernel_v1.metal, a Metal compute shader translated from the CUDA kernel
Add src/bin/metal_test.rs, a Metal test binary with CPU validation

eda-infra-rs Submodule:

Add a `metal` feature and its dependencies to ulib
Extend the `Device` enum with a `Metal(u8)` variant
Add Metal buffer support to `UVec` using unified memory
Add `MetalBuild` and `cl_metal()` to ucc for shader compilation
Handle the `_metal` suffix in bindgen when generating dispatch code
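To make the ulib changes concrete, here is a minimal Rust sketch of how a `Metal(u8)` variant might sit alongside the other backends. The enum body, the `needs_host_copy` helper, and the `<base>_<backend>` symbol convention are hypothetical illustrations, not eda-infra-rs code; the real `Device` enum and bindgen output may be named and structured differently. It only encodes what the PR states: Metal buffers use unified memory, and bindgen dispatches on a `_metal` suffix.

```rust
/// Hypothetical mirror of ulib's Device enum after this PR; the real enum
/// lives in eda-infra-rs and may differ in naming, variants, and derives.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Device {
    CPU,
    CUDA(u8),
    Metal(u8),
}

/// In this model, only discrete-memory CUDA devices need explicit host/device
/// copies. A Metal buffer in unified memory is visible to both CPU and GPU.
pub fn needs_host_copy(dev: Device) -> bool {
    matches!(dev, Device::CUDA(_))
}

/// Hypothetical per-backend symbol naming, illustrating suffix-based
/// dispatch such as `_metal`. The actual bindgen convention may differ.
pub fn kernel_symbol(base: &str, dev: Device) -> String {
    let suffix = match dev {
        Device::CPU => "cpu",
        Device::CUDA(_) => "cuda",
        Device::Metal(_) => "metal",
    };
    format!("{base}_{suffix}")
}
```

For example, `kernel_symbol("simulate_v1", Device::Metal(0))` yields `"simulate_v1_metal"` (the base name is made up for illustration).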

Technical Approach

| CUDA | Metal |
|---|---|
| `__shared__` | `threadgroup` |
| `__syncthreads()` | `threadgroup_barrier(mem_flags::mem_threadgroup)` |
| `__shfl_down_sync()` | `simd_shuffle_down()` |
| `cooperative_groups::this_grid().sync()` | Multiple kernel dispatches |
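The `simd_shuffle_down()` row is the least obvious substitution. The Rust sketch below emulates one SIMD-group on the CPU to show the shuffle-down tree reduction that both intrinsics support: each lane reads the value `delta` lanes above it, and lanes whose source would fall off the end keep their own value. The 32-lane width matches Apple GPU SIMD-groups and CUDA warps, but the summing reduction is only an illustration and is not taken from kernel_v1.metal.

```rust
/// SIMD-group width on Apple GPUs (and warp width on NVIDIA GPUs).
const SIMD_WIDTH: usize = 32;

/// CPU emulation of `simd_shuffle_down(value, delta)` across one SIMD-group:
/// lane i receives lane (i + delta)'s value; the top `delta` lanes, whose
/// source would be out of range, keep their own value (no wrap-around).
fn shuffle_down(lanes: &[u32; SIMD_WIDTH], delta: usize) -> [u32; SIMD_WIDTH] {
    let mut out = *lanes;
    for i in 0..SIMD_WIDTH.saturating_sub(delta) {
        out[i] = lanes[i + delta];
    }
    out
}

/// Tree reduction: halve the shuffle distance each round. After
/// log2(32) = 5 rounds, lane 0 holds the sum of all 32 input lanes
/// (the other lanes hold partial or duplicated values and are ignored).
fn simd_reduce_sum(mut lanes: [u32; SIMD_WIDTH]) -> u32 {
    let mut delta = SIMD_WIDTH / 2;
    while delta > 0 {
        let shifted = shuffle_down(&lanes, delta);
        for (lane, incoming) in lanes.iter_mut().zip(shifted.iter()) {
            *lane = lane.wrapping_add(*incoming);
        }
        delta /= 2;
    }
    lanes[0]
}
```

Because neither intrinsic wraps around, the same reduction loop ports from `__shfl_down_sync` to `simd_shuffle_down` without changing its structure.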

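Metal does not provide a grid-wide barrier inside a kernel, so grid-wide synchronization is achieved by splitting the kernel at each sync point into separate `dispatchThreadgroups` calls, with an explicit wait for each stage to complete before the next one starts.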
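The staging idea can be modeled on the CPU. In the minimal Rust sketch below, one OS thread stands in for each threadgroup, each stage is one "dispatch", and joining all threads plays the role of the completion wait. This is a semantic model under those assumptions, not the PR's Metal host code; `Stage`, `run_staged`, and the two example stages are hypothetical.

```rust
use std::thread;

/// One stage of a kernel split at grid-wide sync points: given a threadgroup
/// index and a read-only snapshot of the previous stage's results, return
/// this threadgroup's new value.
type Stage = fn(usize, &[u64]) -> u64;

/// Run each stage as one "dispatch" with one thread per threadgroup.
/// Joining every thread before starting the next stage stands in for waiting
/// on dispatch completion, which makes all writes from stage k visible to
/// every threadgroup in stage k + 1.
fn run_staged(initial: Vec<u64>, stages: &[Stage]) -> Vec<u64> {
    let mut state = initial;
    for stage in stages {
        let prev: &[u64] = &state;
        let next: Vec<u64> = thread::scope(|s| {
            let handles: Vec<_> = (0..prev.len())
                .map(|block| s.spawn(move || stage(block, prev)))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        state = next;
    }
    state
}

/// Example stage 1: each threadgroup updates only its own slot.
fn double(block: usize, prev: &[u64]) -> u64 {
    prev[block] * 2
}

/// Example stage 2: each threadgroup reads every slot, which is only safe
/// because stage 1 has fully completed (the grid-wide sync point).
fn total(_block: usize, prev: &[u64]) -> u64 {
    prev.iter().sum()
}
```

Running `[double, total]` over `[1, 2, 3, 4]` gives `[20, 20, 20, 20]`: stage 2 sees every doubled value only because the whole of stage 1 finished first. The cost of this approach is one dispatch and wait per sync point.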

Testing

Tested on an Apple M4 Pro with the NVDLA design (10,624 cycles):

Metal simulation: 1.51 seconds
CPU validation: ✅ PASSED (bit-identical to CPU reference)
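For scale, 10,624 cycles in 1.51 seconds is about 7,036 simulated cycles per second, or roughly 142 µs per cycle if the 1.51 s covers only the simulation itself. The sketch below shows that arithmetic together with what a bit-identical check amounts to: comparing GPU and CPU output words exactly and reporting the first mismatch. `first_mismatch` and the `u32` word type are hypothetical; the PR does not describe how `--check-with-cpu` compares results.

```rust
/// Simulated clock cycles per second of wall-clock time.
fn cycles_per_second(cycles: u64, seconds: f64) -> f64 {
    cycles as f64 / seconds
}

/// Index of the first output word where GPU and CPU results differ, or
/// `None` if the two runs are bit-identical. If the lengths differ, the
/// first index present in only one run counts as the mismatch.
fn first_mismatch(gpu: &[u32], cpu: &[u32]) -> Option<usize> {
    if gpu.len() != cpu.len() {
        return Some(gpu.len().min(cpu.len()));
    }
    gpu.iter().zip(cpu).position(|(g, c)| g != c)
}
```

Exact comparison (rather than a tolerance) is appropriate here because gate-level simulation state is purely bitwise.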

Usage

```bash
# Build with Metal support
cargo build --release --features metal --bin metal_test

# Run simulation
cargo run --release --features metal --bin metal_test -- \
    path/to/design.gv \
    path/to/design.gemparts \
    path/to/input.vcd \
    path/to/output.vcd \
    NUM_BLOCKS \
    --check-with-cpu
```

Known Limitations

Output VCD timing is offset by one clock cycle relative to Verilator (the CUDA backend shows the same offset); investigation is ongoing.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Port GEM RTL simulator from CUDA to Metal for Apple Silicon (M1/M2/M3/M4).

Changes:
- Add metal feature flag to Cargo.toml with metal-rs dependencies
- Update build.rs to compile Metal shaders via xcrun metal/metallib
- Create kernel_v1.metal, a Metal compute shader translation of the CUDA kernel
- Translate CUDA __shared__ to Metal threadgroup memory
- Replace __shfl_down_sync with simd_shuffle_down
- Replace cooperative_groups grid sync with multiple kernel dispatches
- Add metal_test.rs binary for running Metal simulation with CPU validation

The Metal implementation uses:
- Apple Silicon Unified Memory (MTLResourceStorageModeShared)
- 256 threads per threadgroup (matching CUDA blocks)
- Multiple dispatch calls to replace CUDA grid-wide sync

Build with: cargo build --features metal --bin metal_test
Test with: cargo run --features metal --bin metal_test -- [args] --check-with-cpu

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
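The commit message says build.rs compiles shaders via `xcrun metal`/`metallib`, while the description attributes this to `ucc::cl_metal()`; presumably the latter wraps the former, but that is an assumption. The sketch below builds the standard two-step toolchain invocation (`.metal` to `.air`, then `.air` to `.metallib`) that such a build step would issue. `Step`, `metal_build_steps`, `run_steps`, and the output paths are hypothetical, and actually running the commands requires macOS with Xcode's Metal toolchain.

```rust
use std::path::Path;
use std::process::Command;

/// One external tool invocation: program name plus arguments.
pub struct Step {
    pub program: String,
    pub args: Vec<String>,
}

/// Build an `xcrun` step from string arguments.
fn xcrun(args: &[&str]) -> Step {
    Step {
        program: "xcrun".to_string(),
        args: args.iter().map(|a| a.to_string()).collect(),
    }
}

/// The two steps that turn `src` (a .metal file) into a .metallib in
/// `out_dir`: compile to Apple's intermediate .air format, then link.
pub fn metal_build_steps(src: &Path, out_dir: &Path) -> Vec<Step> {
    let stem = src.file_stem().and_then(|s| s.to_str()).unwrap_or("kernel");
    let src_s = src.to_string_lossy().into_owned();
    let air_s = out_dir.join(format!("{stem}.air")).to_string_lossy().into_owned();
    let lib_s = out_dir.join(format!("{stem}.metallib")).to_string_lossy().into_owned();
    vec![
        xcrun(&["-sdk", "macosx", "metal", "-c", src_s.as_str(), "-o", air_s.as_str()]),
        xcrun(&["-sdk", "macosx", "metallib", air_s.as_str(), "-o", lib_s.as_str()]),
    ]
}

/// Run the steps in order, failing on the first non-zero exit (macOS only).
pub fn run_steps(steps: &[Step]) -> std::io::Result<()> {
    for step in steps {
        let status = Command::new(&step.program).args(&step.args).status()?;
        if !status.success() {
            return Err(std::io::Error::new(
                std::io::ErrorKind::Other,
                format!("{} failed: {status}", step.program),
            ));
        }
    }
    Ok(())
}
```

In a build script, `out_dir` would typically come from Cargo's `OUT_DIR` environment variable, with the resulting `.metallib` loaded at runtime.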

robtaylor mentioned this pull request on Jan 5, 2026 in gzz2000/eda-infra-rs#1 (Add Apple Metal support, draft).

robtaylor wants to merge 1 commit into NVlabs:staged-aig-release from ChipFlow:feature/metal-support
