Declare index as Int so it can be used in Int - Int operations #183
Open: gdies wants to merge 79 commits into modular:main from gdies:int-uint-operations
+8
−8
Update platform name and note on puzzle 9 and 10 nvidia-only
modular#86) dtype can be passed from Python to Mojo as a parameter in the same way as in attention.mojo, and TPB can be removed because its use in determining the block dimensions for both layernorm_kernel and add_bias_kernel unnecessarily restricts the puzzle to values of HIDDEN_DIM and OUTPUT_DIM <= TPB.
* revise p09 formatting for consistency
* rename kernel parameters for consistency
* revise p09 formatting for consistency
* explain buffer_cache_size cmd
…allocation in benchmarking (modular#93)
* revise p09 formatting for consistency
* rename kernel parameters for consistency
* revise p09 formatting for consistency
* explain buffer_cache_size cmd
* fix p11 typo
* define dot product in p12
* revise p14 cmd to align with puzzle
* revise p14 cmd argument to complete
* Note accurate dot color in p16 visualization
* Revise p16 text to prevent overlap
* define softmax in p18
* Revise phrasing in p18
* Add filename to code completion steps in p18
* Add filename to code completion step in p17
* Fix p19 typo
* revise p14 cmd to align with puzzle
* revise p14 cmd argument to complete
* Update arguments for p23 BenchConfig
* Revise p24 SIMD comment for clarity
* Tweak p26 formatting
* Correct file path of p26 solution code snippets
* Correct file path of p29 solution code snippets
…lization which is currently dominating the results. (modular#102) Updated kernels to write to different memory locations to avoid race condition and allow testing (previously functional benchmarking was only running a single warp)
* revise p14 cmd to align with puzzle
* revise p14 cmd argument to complete
* Revise p27 code organization
* Define Stencil operation
* Define SAXPY
Added a SECURITY.md file with content based on the relevant Modular web page.
…odular#127)
* Begin migration of enqueue_function to enqueue_function_checked.
* Fix typo.
* Update to MutAnyOrigin, ImmutAnyOrigin.
* Formatting fix.
* Fix DeviceContext enqueue_fill() no longer returning self

This is a change due to modular/modular@ce7e4d6#diff-e53a900e59316a16c5793137123d7dc10021feb176cdcc3987153ca8be53f7b8

Updated all locations of enqueue_fill()

Before:
```
var output_buffer = ctx.enqueue_create_buffer[DType.int](
    buffer_size
).enqueue_fill(9)
```
After:
```
var output_buffer = ctx.enqueue_create_buffer[DType.int](buffer_size)
output_buffer.enqueue_fill(9)
```

* Fix Typo
* Migrate p13 and p14.
* Migrate p15 and p16 to checked functions.
…tion_checked` (modular#140)
* Migrate p18 and p22 to enqueue_function_checked.
* Migrate p25 and p26 to enqueue_function_checked.
* Migrating p09, p10, and p19 to enqueue_function_checked.
* Migrate p31, p32 to enqueue_function_checked.
* Migrate p33, p34, and the one remaining case in p09 to enqueue_function_checked.
* Migrate p21 to compile_function_checked.
Co-authored-by: raju <raju.ptvs@gmail.com>
…A A2000 to test script (modular#154)
* add nvidia A2000 (Ada Generation) to compute capability 8.9 group in GPU test script
* fix: modular#152
* ran `pixi run format`

Co-authored-by: David Meaux <dmeaux@geomatys.com>
Co-authored-by: David Meaux <dmeaux@geomatys.com>
…bers to match code (modular#158) Co-authored-by: David Meaux <dmeaux@geomatys.com>
* Embed YouTube videos for puzzles 1-3
* Increase video player margin
* Add breakpoint support
Added special note for WSL users regarding CUDA debugging tools.
* Fix command syntax in third_case.md: Corrected the command syntax for running the third case.
* Fix command syntax in third_case.md: Corrected command syntax for running the third case.
Updated shared memory representation and access patterns for Block 1, including zero padding details and condition evaluations.
ehsanmok (Collaborator) requested changes on Jan 5, 2026 and left a comment:
Just noticed, please rebase. There's a lot of upstream commits here.
Declare the index as `Int()` when it is later used in `<` comparisons.
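A minimal sketch of the kind of change described, not the PR's actual diff. It assumes the index starts out as an unsigned `UInt` (as the branch name `int-uint-operations` suggests) and is converted once so that later comparisons and subtractions are all `Int` with `Int`. The names `raw_index`, `size`, and `i` are hypothetical stand-ins.

```mojo
fn main():
    # Hypothetical stand-in for an unsigned index such as a GPU thread index.
    var raw_index: UInt = 3
    var size: Int = 8

    # Convert once at the point of declaration, so every later use is Int.
    var i = Int(raw_index)

    # Both the comparison and the subtraction now mix Int with Int only.
    if i < size and i - 1 >= 0:
        print(i - 1)
```

Converting at the declaration keeps the conversion in one place and avoids scattering casts through each later comparison or arithmetic expression.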