[BUG]: cuda.compute.mergesort boundary values cause values to be overwritten

### Is this a duplicate?

- [x] I confirmed there appear to be no [duplicate issues](https://github.com/NVIDIA/accelerated-computing-hub/issues) for this bug and that I agree to the [Code of Conduct](CODE_OF_CONDUCT.md)

### Describe the bug

I found this while going through the mergesort exercise for 23__cuda_cccl__customizing_algorithms.ipynb

Basically, when you pass a user defined comparison function to mergesort, values get overwritten with the inputs provided for the exercise. 

I have made the example more complicated (using keys and values) in order to isolate the issue better, but I originally observed this behavior with the instructor's solution 

```
def comparison_op(lhs, rhs):
    return lhs % 10 <= rhs % 10
```

The problem is using `<=` operator. If you replace `a <= b` with logically-equivalent `b > a`, your result set will be the same as the inputs. 

## GPU Info
CUDA driver version: (12, 7)
Number of CUDA devices: 1
Device name: NVIDIA L40S
Device UUID: 821ed086-a1a8-c1dc-060c-b2f35fa2c0f6
PCI Bus ID: 0002:00:01.0

## Complete jupyter notebook cell

```
import numpy as np
import cupy  as cp
import cuda.compute as comp

from cuda.core.experimental import system, Device

# Check CUDA driver version
print(f"CUDA driver version: {system.driver_version}")

# Get number of available devices
print(f"Number of CUDA devices: {system.num_devices}")

# Get device information
if system.num_devices > 0:
    device = Device(0)  # Get the first GPU
    device.set_current()  # Tell CUDA we want to use this GPU
    print(f"Device name: {device.name}")
    print(f"Device UUID: {device.uuid}")
    print(f"PCI Bus ID: {device.pci_bus_id}")
else:
    print("No CUDA devices found!")

# Prepare the input and output arrays.
# original: with comparison_op_bugged, this will overwrite 29 with 9 in the output
d_in_vals = cp.asarray([29, 9, 136, 1001, 72, 24, 32, 1], dtype="int32")

# certain permutations are even more broken: if you use this instead with comparison_op_bugged, (9,29) will get overwritten with (136,136)
# d_in_vals = cp.asarray([136, 72, 9, 1, 29, 1001, 24, 32], dtype="int32")

d_in_keys = cp.asarray(list(map(lambda x: x % 10, d_in_vals.get())))

# define the custom comparator.
def comparison_op_works(lhs, rhs):
    return rhs > lhs
def comparison_op_works2(lhs, rhs):
    # still works (although not logically equivalent at the boundary)
    return lhs < rhs

def comparison_op_bugged(lhs, rhs):
    # logically equivalent to comparison_op_works -- but equality breaks things
    return lhs <= rhs


# comparison_op = comparison_op_works
comparison_op = comparison_op_bugged


print("Inputs")
print(d_in_keys)
print(d_in_vals)
# Perform the merge sort.
comp.merge_sort(
    d_in_keys,
    d_in_vals,
    d_in_keys,
    d_in_vals,
    comparison_op,
    d_in_keys.size
)

print(f"Result:")
print(d_in_keys)
print(d_in_vals)

expected = np.asarray([1001, 1, 72, 32, 24, 136, 29, 9], dtype=np.int32)
assert (d_in_vals.get() == expected).all()
```

### Steps to reproduce

Please see the description. Run the cell. 

### Expected behavior

I would expect `a > b` to return the same result as `b <= a` in the comparison function.

I would also expect the output of mergesort to be a strict permutation of the inputs.

### Actual behavior

comparison_op_bugged passed to op argument of mergesort will replace values in the array by duplicating other elements. Note that it didn't matter if you did the sort inplace or not.

### Environment

See description

### Additional context

Talked to the instructor (Katrina Riehl) about this at Pydata Boston and she asked me to file an issue here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[BUG]: cuda.compute.mergesort boundary values cause values to be overwritten #111

Is this a duplicate?

Describe the bug

GPU Info

Complete jupyter notebook cell

Steps to reproduce

Expected behavior

Actual behavior

Environment

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[BUG]: cuda.compute.mergesort boundary values cause values to be overwritten #111

Description

Is this a duplicate?

Describe the bug

GPU Info

Complete jupyter notebook cell

Steps to reproduce

Expected behavior

Actual behavior

Environment

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions