-
Notifications
You must be signed in to change notification settings - Fork 181
Implement unwrapSelection method with cluster-based parallel optimization #295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Add a new method that unrolls cell positions from periodic/modulo space to flat Euclidean space. This is done by: 1. Marking cells near refPos as starting points for unwrapping 2. Propagating unwrapping through connected cells using BFS-like algorithm 3. Using atomicCAS to prevent race conditions during parallel execution The unwrapped positions are stored in Cell::shared1 field. Co-authored-by: chrxh <73127001+chrxh@users.noreply.github.com>
| cell->shared1 = cell->pos; | ||
|
|
||
| // Mark cells within radius of refPos as starting points (already unwrapped) | ||
| if (data.cellMap.getDistance(refPos, cell->pos) <= radius) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot 3 problems in this method: (1) There could be no cells within the radius of refPosition.
(2) There could be 2 cells of the same connected cell component within radius which should be actually far distant in unwrapped space.
(3) There could be many other connected cell components which are selected and would not be captured by this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in commit 9e18708. The revised implementation:
-
No cells within radius: Now finds the closest cell to refPos regardless of distance using
cudaFindClosestUnwrappedCellwith atomicMin. -
Multiple cells in same component within radius: Only one cell is marked as starting point per component - the closest unprocessed cell.
-
Multiple disconnected components: The outer loop continues until all selected cells are processed. Each component gets its own starting cell (the closest unprocessed cell to refPos at that iteration).
New kernels:
cudaInitUnwrapSelection: Initialize all selected cellscudaFindClosestUnwrappedCell: Find closest unprocessed cellcudaMarkUnwrapStartCell: Mark single cell as startcudaUnwrapSelectionStep: BFS propagation (unchanged)
Addressed review feedback: 1. Fixed: No cells within radius - now finds closest cell regardless 2. Fixed: Multiple cells in same component - only one starting point 3. Fixed: Multiple disconnected components - each gets its own start New algorithm: - cudaInitUnwrapSelection: Initialize all selected cells - cudaFindClosestUnwrappedCell: Find closest unprocessed cell to refPos - cudaMarkUnwrapStartCell: Mark single cell as starting point - cudaUnwrapSelectionStep: BFS propagation through component - Repeat for each disconnected component Co-authored-by: chrxh <73127001+chrxh@users.noreply.github.com>
|
@copilot Your approach is problematic for the performamce if there are many connected components. Instead determine all connected components first (see logic for cluster members in |
Refactored algorithm to process all connected components in parallel: 1. Initialize clusterIndex for each selected cell 2. Find clusters using atomicMin propagation (ClusterProcessor pattern) 3. Find nearest cell to refPos per cluster using atomicMin 4. Mark all starting cells in parallel 5. Propagate unwrapping through all components simultaneously This improves performance from O(components * iterations) to O(iterations) kernel calls when many disconnected components exist. Co-authored-by: chrxh <73127001+chrxh@users.noreply.github.com>
Fixed in commit 8cebaf4. The new implementation uses the cluster-based parallel approach:
This processes all components in parallel, reducing kernel calls from O(components × iterations) to O(iterations). |
Adds
unwrapSelectionto_EditKernelsServiceto transform selected cell positions from periodic/modulo space to flat Euclidean space. This will be used for calculating mass center and relative positions.Changes
EditKernelsService.cuh/cu:
unwrapSelection(CudaSettings const&, SimulationData const&, float2 const& refPos)with GPU memory for iteration trackingEditKernels.cuh/cu - CUDA kernels:
cudaInitUnwrapSelection: Initializes all selected cells for cluster-based unwrapping (clusterIndex, tempValue, shared1, shared2)cudaFindUnwrapClusters: Finds connected components among selected cells using atomicMin propagation (same pattern as ClusterProcessor)cudaFindNearestInCluster: Finds the nearest cell to refPos per cluster using atomicMin with distance+index encodingcudaMarkUnwrapStartCells: Marks starting cells for all clusters in parallelcudaUnwrapSelectionStep: BFS-style propagation through connected cells using atomicCAS for thread safetyAlgorithm
clusterIndexset to selfPerformance
The cluster-based approach processes all connected components in parallel, reducing kernel calls from O(components × iterations) to O(iterations). This is critical when many disconnected components exist in the selection.
Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.