Add Apple Metal support for macOS/Apple Silicon #8
Draft
robtaylor wants to merge 1 commit into NVlabs:staged-aig-release
Port GEM RTL simulator from CUDA to Metal for Apple Silicon (M1/M2/M3/M4).

Changes:
- Add metal feature flag to Cargo.toml with metal-rs dependencies
- Update build.rs to compile Metal shaders via xcrun metal/metallib
- Create kernel_v1.metal - Metal compute shader translation of CUDA kernel
- Translate CUDA __shared__ to Metal threadgroup memory
- Replace __shfl_down_sync with simd_shuffle_down
- Replace cooperative_groups grid sync with multiple kernel dispatches
- Add metal_test.rs binary for running Metal simulation with CPU validation

The Metal implementation uses:
- Apple Silicon Unified Memory (MTLResourceStorageModeShared)
- 256 threads per threadgroup (matching CUDA blocks)
- Multiple dispatch calls to replace CUDA grid-wide sync

Build with:
    cargo build --features metal --bin metal_test

Test with:
    cargo run --features metal --bin metal_test -- [args] --check-with-cpu

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
This PR adds Apple Metal GPU compute support to GEM, enabling simulation on macOS with Apple Silicon (M1/M2/M3/M4) devices.
Changes
GEM Repository:
- `metal` feature flag in `Cargo.toml`
- `build.rs` using `ucc::cl_metal()`
- `csrc/kernel_v1.metal` - Metal compute shader (translated from CUDA)
- `src/bin/metal_test.rs` - Metal test binary with CPU validation

eda-infra-rs Submodule:
- `ulib` `Device` enum with `Metal(u8)` variant
- `UVec` using unified memory
- `MetalBuild` and `cl_metal()` added to `ucc` for shader compilation
- `_metal` suffix handling in bindgen for dispatch generation

Technical Approach
| CUDA | Metal |
| --- | --- |
| `__shared__` | `threadgroup` |
| `__syncthreads()` | `threadgroup_barrier(mem_flags::mem_threadgroup)` |
| `__shfl_down_sync()` | `simd_shuffle_down()` |
| `cooperative_groups::this_grid().sync()` | multiple kernel dispatches |

Grid-wide synchronization is achieved by splitting the kernel at sync points into multiple `dispatchThreadgroups` calls with explicit completion waits between stages.

Testing
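To make the staged-dispatch idea and the CPU cross-check concrete, here is a minimal CPU-only sketch in Rust. It is not the PR's kernel or the code in `metal_test.rs`: it only emulates the pattern with a toy sum reduction. All function names (`simd_reduce_sum`, `stage1_partial`, `staged_sum`, `cpu_reference_sum`) are hypothetical, a thread stands in for a threadgroup, and the SIMD-group width of 32 is an assumption. Stage 1 uses a shuffle-down style tree reduction per SIMD group, the scope exit plays the role of the completion wait between dispatches, and stage 2 consumes the partial results only after every group has finished.

```rust
use std::thread;

// Assumed SIMD-group width for this sketch.
const SIMD_WIDTH: usize = 32;
// Matches the 256 threads per threadgroup mentioned in the PR.
const THREADS_PER_GROUP: usize = 256;

/// CPU emulation of a shuffle-down tree reduction over one SIMD group.
/// At each step, lane `i` adds the value held by lane `i + offset`.
/// Only lane 0 is guaranteed to hold the full sum at the end.
fn simd_reduce_sum(lanes: &[u32; SIMD_WIDTH]) -> u32 {
    let mut vals = *lanes;
    let mut offset = SIMD_WIDTH / 2;
    while offset > 0 {
        // Snapshot so every lane reads the pre-step values, as in lockstep execution.
        let prev = vals;
        for lane in 0..SIMD_WIDTH {
            if lane + offset < SIMD_WIDTH {
                vals[lane] = prev[lane].wrapping_add(prev[lane + offset]);
            }
        }
        offset /= 2;
    }
    vals[0]
}

/// Stage 1 work for one emulated threadgroup: reduce up to 256 values.
fn stage1_partial(chunk: &[u32]) -> u32 {
    chunk
        .chunks(SIMD_WIDTH)
        .map(|c| {
            // Pad a short final SIMD group with zeros.
            let mut lanes = [0u32; SIMD_WIDTH];
            lanes[..c.len()].copy_from_slice(c);
            simd_reduce_sum(&lanes)
        })
        .fold(0u32, |acc, x| acc.wrapping_add(x))
}

/// Two "dispatches" with a completion wait in between, replacing what a
/// single kernel with a grid-wide sync would do.
fn staged_sum(input: &[u32]) -> u32 {
    let num_groups = (input.len() + THREADS_PER_GROUP - 1) / THREADS_PER_GROUP;
    let mut partials = vec![0u32; num_groups];

    // Stage 1: every emulated threadgroup writes one partial result.
    thread::scope(|s| {
        for (chunk, out) in input.chunks(THREADS_PER_GROUP).zip(partials.iter_mut()) {
            s.spawn(move || {
                *out = stage1_partial(chunk);
            });
        }
    }); // Leaving the scope joins all threads: the wait between stages.

    // Stage 2: safe to read all partials, since stage 1 has fully completed.
    partials.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Single-pass CPU reference used to validate the staged result.
fn cpu_reference_sum(input: &[u32]) -> u32 {
    input.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}
```

Calling `staged_sum(&input)` and comparing it with `cpu_reference_sum(&input)` mirrors, in spirit, what a CPU validation pass does: the staged result is accepted only if it matches a straightforward sequential computation. The key design point is that stage 2 never starts until all stage 1 work is visible, which is the guarantee a grid-wide sync would otherwise provide.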
Tested on Apple M4 Pro with NVDLA design (10,624 cycles):
Usage
Known Limitations
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>