invalid configuration argument when running with 1 GPU #452

### Describe the issue:

After compiling Grid and running on a single GPU, `Benchmark_ITT` (for example) fails with the error:

    accelerator_barrier(): Cuda error invalid configuration argument

The error traces back to line 137 of `Grid/threads/Accelerator.h`:

    dim3 cu_blocks ((num1+nt-1)/nt,num2,1);                   \

For reasons I haven't dug deep enough to understand, when running with 1 GPU the expression `(num1+nt-1)/nt` evaluates to zero, which isn't a valid block count. In the specific call that fails, made from `WilsonKernelsImplementation.h`, the expression is `(sz+nt-1)/nt`.
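To make the arithmetic concrete, here is a minimal standalone sketch (not Grid code; `ceil_div` and `extent` are hypothetical names, and it assumes unsigned 64-bit values with `nt >= 1`). Under those assumptions the rounded-up division can only be zero when the extent itself is zero, which would suggest the failing call sees an empty loop range, though I haven't confirmed that this is what happens inside Grid.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the (num1+nt-1)/nt expression from the macro.
// Assumption: nt >= 1 and the sum does not overflow.
constexpr std::uint64_t ceil_div(std::uint64_t extent, std::uint64_t nt) {
  return (extent + nt - 1) / nt;
}

// Compile-time checks of the rounding behaviour.
static_assert(ceil_div(0, 8) == 0, "an empty extent yields zero blocks");
static_assert(ceil_div(1, 8) == 1, "any non-empty extent yields at least one block");
static_assert(ceil_div(8, 8) == 1, "an exact multiple needs no extra block");
static_assert(ceil_div(9, 8) == 2, "one extra element needs one extra block");
```

A zero in any grid dimension is rejected by CUDA at launch, which matches the "invalid configuration argument" message above.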

As a workaround, changing line 137 to

    dim3 cu_blocks ((num1+nt-1)/nt == 0 ? 1 : (num1+nt-1)/nt,num2,1);                   \

allows the code to run correctly.
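The same clamp can be written once as a helper instead of repeating the expression. This is only a sketch of the workaround, not a proposed patch: `clamped_blocks` and `has_work` are hypothetical names, and it assumes the launched kernel checks its index against the loop extent so that the single clamped block does nothing when there is no work.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: rounded-up block count, clamped to at least 1 so the
// grid dimension is never zero. Assumption: nt >= 1.
constexpr std::uint64_t clamped_blocks(std::uint64_t extent, std::uint64_t nt) {
  return std::max<std::uint64_t>(1, (extent + nt - 1) / nt);
}

// Alternative guard: report whether there is anything to launch at all,
// so a caller could skip the kernel launch for an empty range.
constexpr bool has_work(std::uint64_t extent, std::uint64_t num2) {
  return extent > 0 && num2 > 0;
}

static_assert(clamped_blocks(0, 8) == 1, "empty extent is clamped to one block");
static_assert(clamped_blocks(9, 8) == 2, "non-empty extents are unchanged");
static_assert(!has_work(0, 4), "an empty extent has nothing to launch");
```

Skipping the launch when `has_work` is false avoids starting an idle block, but clamping keeps the change confined to the one line in the macro.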

### Code example:

```shell
N/A
```


### Target platform:

Tested on Grace Hopper Arm+H100, Leicester Arm+A100, and AMD Rome + A100 in Swansea.

### Configure options:

```shell
../configure --enable-comms=none --enable-simd=GPU --enable-accelerator=cuda CXX=nvcc --disable-zmobius --disable-gparity 'CXXFLAGS=-g -gencode arch=compute_90,code=sm_90 -std=c++17 -DEIGEN_DONT_VECTORIZE'
```

