Static memory leak in ICudaEngine::createExecutionContext #14

@mariusv-github

Description

Describe the bug
Creating an IExecutionContext with ICudaEngine::createExecutionContext induces a GPU memory leak ranging from 0x400000 bytes (4 MBytes, for the small NVIDIA example on the page "Using the TensorRT-RTX Runtime API") up to 0xE00000 bytes (14 MBytes, for a large Yolo11-X/ONNX model), depending on network complexity (but independent of network precision, float or half). Destroying the newly created IExecutionContext does not free this block of GPU memory; however, creating further IExecutionContext objects does not grow the leak.
This memory block is released only by the NVIDIA driver when calling cudaDeviceReset.

It looks like a "static" GPU memory allocation (a pointer kept in a static variable or static attribute) made by ICudaEngine::createExecutionContext when creating an IExecutionContext object deep inside "tensorrt_rtx_1_1.dll" (not "tensorrt_onnxparser_rtx_1_1.dll" nor "cudart.dll"), and released neither by the IExecutionContext::~IExecutionContext destructor nor by the ICudaEngine::~ICudaEngine destructor.

Additionally, calling IExecutionContext::enqueueV3 or IExecutionContext::executeV2 eventually induces another static memory leak of up to 0x600000 bytes (6 MBytes for the Yolo11-X model; both methods produce the same leak), so the maximum total leak can reach 0x1400000 bytes (20 MBytes). Here no new object is created (from the user's point of view), so there is nothing to destroy. The behavior is the same as for ICudaEngine::createExecutionContext: fixed size for a given network, dependent on network complexity but independent of data precision, and released only by the final cudaDeviceReset.

Steps to reproduce
Take any TensorRT-RTX example (or your own code) that creates an IExecutionContext and modify the main function:

#include <cstdio>
#include <iostream>
#include <cuda_runtime.h>

size_t ShowGPUmem()
{
  size_t memFree, memTotal;
  if(cudaMemGetInfo(&memFree, &memTotal) == cudaSuccess) printf("Free: %016I64X, Total: %016I64X\n", memFree, memTotal);
  return memFree;
}

int main()
{
  size_t init = ShowGPUmem();
  for(int i = 0; i < 5; i++) // the size of the memory leak is constant and independent of the number of loops
  try
  {
    // call any test creating an IExecutionContext with ICudaEngine::createExecutionContext: after-before == 0x400000 ... 0xE00000
    // call any test calling ICudaEngine::createExecutionContext and IExecutionContext::enqueueV3: after-before == 0x400000 ... 0x1400000
    // call any test neutralized before ICudaEngine::createExecutionContext: after-before will be == 0
  }
  catch(MyExcept_& e) { std::cerr << e; }
  size_t before = ShowGPUmem();
  cudaDeviceReset();
  size_t after = ShowGPUmem();
  printf("cudaDeviceReset: %s, memory leaked by ICudaEngine::createExecutionContext = %016I64X bytes\n", (after == init ? "OK" : "ERROR"), after - before);
  return 0;
}

Expected behavior
The IExecutionContext::~IExecutionContext destructor should release any GPU memory allocated during the creation of the object, and the user should have a way to release the extra GPU memory allocated by the executeV2/enqueueV3 methods.

Environment (same results in all configurations)

  • TensorRT-RTX version: 1.1.1.26
  • GPU: RTX 5090 Laptop / RTX 3080
  • Operating system: W10 LTSC, W10 22H2, W11 LTSC 24H2, drivers NV 576.52, VS 15/16/17
  • CUDA version: 12.9
  • CPU architecture: Intel(R) Core(TM) Ultra 9 275HX

Screenshots

Additional context
