Implement unwrapSelection method with cluster-based parallel optimization #295

Copilot · 2025-12-13T21:33:11Z

Adds unwrapSelection to _EditKernelsService to transform selected cell positions from periodic/modulo space to flat Euclidean space. This will be used for calculating mass center and relative positions.

Changes

EditKernelsService.cuh/cu:

New public method unwrapSelection(CudaSettings const&, SimulationData const&, float2 const& refPos) with GPU memory for iteration tracking

EditKernels.cuh/cu - CUDA kernels:

cudaInitUnwrapSelection: Initializes all selected cells for cluster-based unwrapping (clusterIndex, tempValue, shared1, shared2)
cudaFindUnwrapClusters: Finds connected components among selected cells using atomicMin propagation (same pattern as ClusterProcessor)
cudaFindNearestInCluster: Finds the nearest cell to refPos per cluster using atomicMin with distance+index encoding
cudaMarkUnwrapStartCells: Marks starting cells for all clusters in parallel
cudaUnwrapSelectionStep: BFS-style propagation through connected cells using atomicCAS for thread safety

Algorithm

Initialize all selected cells with clusterIndex set to self
Find connected components by propagating minimum clusterIndex through connected selected cells
For each cluster, find the nearest cell to refPos using atomicMin (all clusters processed in parallel)
Mark all starting cells with corrected positions relative to refPos
Propagate unwrapping through all connected cells in all components simultaneously

Performance

The cluster-based approach processes all connected components in parallel, reducing kernel calls from O(components × iterations) to O(iterations). This is critical when many disconnected components exist in the selection.

void unwrapSelection(CudaSettings const& gpuSettings, SimulationData const& data, float2 const& refPos);
// After call: selected cells have unwrapped positions in Cell::shared1
// Cell::tempValue.as_uint64 == 1 indicates cell was processed

Original prompt

Implement a new method
void unwrapSelection(CudaSettings const& gpuSettings, SimulationData const& data, float2 const& refPos);
in _EditKernelsService which unrolls the selection from the periodic/modulo space (the particles are simulated in 2d space with periodic boundary conditions)
to a flat Euclidean space. This method will later be used for calculating the mass center and relative positions to it.
The unwrapping should be performed on the connected component of the selection (the cells where the member Cell::selected equals 1 or 2).
For unwrapSelection there are maybe several kernels necessary, which should be implemented in EditKernels.cuh/cu.
The refPosition (=referencePosition) should be used to identify the initial cells for unwrapping in the connected components.

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Add a new method that unrolls cell positions from periodic/modulo space to flat Euclidean space. This is done by: 1. Marking cells near refPos as starting points for unwrapping 2. Propagating unwrapping through connected cells using BFS-like algorithm 3. Using atomicCAS to prevent race conditions during parallel execution The unwrapped positions are stored in Cell::shared1 field. Co-authored-by: chrxh <73127001+chrxh@users.noreply.github.com>

chrxh · 2025-12-13T22:18:26Z

source/EngineGpuKernels/EditKernels.cu

+            cell->shared1 = cell->pos;
+
+            // Mark cells within radius of refPos as starting points (already unwrapped)
+            if (data.cellMap.getDistance(refPos, cell->pos) <= radius) {


@copilot 3 problems in this method: (1) There could be no cells within the radius of refPosition.
(2) There could be 2 cells of the same connected cell component within radius which should be actually far distant in unwrapped space.
(3) There could be many other connected cell components which are selected and would not be captured by this method.

Fixed in commit 9e18708. The revised implementation:

No cells within radius: Now finds the closest cell to refPos regardless of distance using cudaFindClosestUnwrappedCell with atomicMin.

Multiple cells in same component within radius: Only one cell is marked as starting point per component - the closest unprocessed cell.

Multiple disconnected components: The outer loop continues until all selected cells are processed. Each component gets its own starting cell (the closest unprocessed cell to refPos at that iteration).

New kernels:

cudaInitUnwrapSelection: Initialize all selected cells

cudaFindClosestUnwrappedCell: Find closest unprocessed cell

cudaMarkUnwrapStartCell: Mark single cell as start

cudaUnwrapSelectionStep: BFS propagation (unchanged)

Addressed review feedback: 1. Fixed: No cells within radius - now finds closest cell regardless 2. Fixed: Multiple cells in same component - only one starting point 3. Fixed: Multiple disconnected components - each gets its own start New algorithm: - cudaInitUnwrapSelection: Initialize all selected cells - cudaFindClosestUnwrappedCell: Find closest unprocessed cell to refPos - cudaMarkUnwrapStartCell: Mark single cell as starting point - cudaUnwrapSelectionStep: BFS propagation through component - Repeat for each disconnected component Co-authored-by: chrxh <73127001+chrxh@users.noreply.github.com>

chrxh · 2025-12-14T07:23:47Z

@copilot Your approach is problematic for the performamce if there are many connected components. Instead determine all connected components first (see logic for cluster members in Cell). Then determin for each component its nearest to refPosition (can be done in one or two kernel calls for all components together). Here one can tag the cluster for the nearest cell.

Refactored algorithm to process all connected components in parallel: 1. Initialize clusterIndex for each selected cell 2. Find clusters using atomicMin propagation (ClusterProcessor pattern) 3. Find nearest cell to refPos per cluster using atomicMin 4. Mark all starting cells in parallel 5. Propagate unwrapping through all components simultaneously This improves performance from O(components * iterations) to O(iterations) kernel calls when many disconnected components exist. Co-authored-by: chrxh <73127001+chrxh@users.noreply.github.com>

Copilot · 2025-12-14T07:42:23Z

@copilot Your approach is problematic for the performamce if there are many connected components. Instead determine all connected components first (see logic for cluster members in Cell). Then deter...

Fixed in commit 8cebaf4. The new implementation uses the cluster-based parallel approach:

Find clusters (cudaFindUnwrapClusters): Uses the same atomicMin pattern as ClusterProcessor to identify connected components among selected cells
Find nearest per cluster (cudaFindNearestInCluster): Each cell computes its distance to refPos and atomically updates its cluster head's shared2 if closer (using distance+index encoding with atomicMin)
Mark starting cells (cudaMarkUnwrapStartCells): Each cluster head marks its nearest cell as unwrapped

This processes all components in parallel, reducing kernel calls from O(components × iterations) to O(iterations).

Initial plan

53d616c

Copilot AI assigned Copilot and chrxh Dec 13, 2025

Copilot started work on behalf of chrxh December 13, 2025 21:33 View session

Copilot AI changed the title ~~[WIP] Add unwrapSelection method to _EditKernelsService~~ Implement unwrapSelection method for periodic boundary unwrapping Dec 13, 2025

Copilot AI requested a review from chrxh December 13, 2025 21:51

Copilot finished work on behalf of chrxh December 13, 2025 21:51

chrxh reviewed Dec 13, 2025

View reviewed changes

Copilot started work on behalf of chrxh December 13, 2025 22:19 View session

Copilot AI requested a review from chrxh December 13, 2025 22:36

Copilot finished work on behalf of chrxh December 13, 2025 22:36

Copilot started work on behalf of chrxh December 14, 2025 07:24 View session

Copilot AI changed the title ~~Implement unwrapSelection method for periodic boundary unwrapping~~ Implement unwrapSelection method with cluster-based parallel optimization Dec 14, 2025

Copilot finished work on behalf of chrxh December 14, 2025 07:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement unwrapSelection method with cluster-based parallel optimization #295

Implement unwrapSelection method with cluster-based parallel optimization #295

Uh oh!

Copilot AI commented Dec 13, 2025 •

edited

Loading

Uh oh!

chrxh Dec 13, 2025

Uh oh!

Copilot AI Dec 13, 2025

Uh oh!

chrxh commented Dec 14, 2025

Uh oh!

Copilot AI commented Dec 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Implement unwrapSelection method with cluster-based parallel optimization #295

Are you sure you want to change the base?

Implement unwrapSelection method with cluster-based parallel optimization #295

Uh oh!

Conversation

Copilot AI commented Dec 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Algorithm

Performance

Uh oh!

chrxh Dec 13, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 13, 2025

Choose a reason for hiding this comment

Uh oh!

chrxh commented Dec 14, 2025

Uh oh!

Copilot AI commented Dec 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Dec 13, 2025 •

edited

Loading