Static memory leak in ICudaEngine::createExecutionContext #14

@mariusv-github

Description

Describe the bug
Creating an IExecutionContext with ICudaEngine::createExecutionContext induces a GPU memory leak ranging from 0x400000 bytes (4 MBytes, for the small NVIDIA example on the page "Using the TensorRT-RTX Runtime API") up to 0xE00000 bytes (14 MBytes, for a large Yolo11-X/ONNX model), depending on network complexity (but independent of network precision, float or half). Destroying the newly created IExecutionContext does not free this block of GPU memory; however, creating further IExecutionContext objects does not grow the leak.
This memory block is released only by the NVIDIA driver when calling cudaDeviceReset.

It looks like a "static" GPU memory allocation (a pointer kept in a static variable or static attribute) made by ICudaEngine::createExecutionContext when creating an IExecutionContext object deep inside "tensorrt_rtx_1_1.dll" (not "tensorrt_onnxparser_rtx_1_1.dll" nor "cudart.dll"), and released neither by the IExecutionContext::~IExecutionContext destructor nor by the ICudaEngine::~ICudaEngine destructor.

Additionally, calling IExecutionContext::enqueueV3 or IExecutionContext::executeV2 eventually induces another static memory leak of up to 0x600000 bytes (6 MBytes for the Yolo11-X model; both methods produce the same leak), so the maximum total leak can reach 0x1400000 bytes (20 MBytes). Here no new object is created (from the user's point of view), so there is nothing to destroy. The behavior is the same as for ICudaEngine::createExecutionContext: fixed size for a given network, dependent on network complexity but independent of data precision, and released only by the final cudaDeviceReset.

Steps to reproduce
Take any TensorRT-RTX example (or your own code) that creates an IExecutionContext and modify the main function:

#include <cstdio>
#include <iostream>
#include <cuda_runtime.h>

size_t ShowGPUmem()
{
  size_t memFree, memTotal;
  if(cudaMemGetInfo(&memFree, &memTotal) == cudaSuccess) printf("Free: %016I64X, Total: %016I64X\n", memFree, memTotal);
  return memFree;
}

int main()
{
  size_t init = ShowGPUmem();
  for(int i = 0; i < 5; i++) // the size of the memory leak is constant and independent of the number of loops
  try
  {
    // call any test creating an IExecutionContext with ICudaEngine::createExecutionContext: after-before == 0x400000 ... 0xE00000
    // call any test calling ICudaEngine::createExecutionContext and IExecutionContext::enqueueV3: after-before == 0x400000 ... 0x1400000
    // call any test neutralized before ICudaEngine::createExecutionContext: after-before will be == 0
  }
  catch(MyExcept_& e) { std::cerr << e; }
  size_t before = ShowGPUmem();
  cudaDeviceReset();
  size_t after = ShowGPUmem();
  printf("cudaDeviceReset: %s, memory leaked by ICudaEngine::createExecutionContext = %016I64X bytes\n", (after == init ? "OK" : "ERROR"), after - before);
  return 0;
}

Expected behavior
The IExecutionContext::~IExecutionContext destructor should release any GPU memory allocated during the creation of the object, and the user should have a way to release the extra GPU memory allocated by the executeV2/enqueueV3 methods.

Environment (same results in all configurations)

  • TensorRT-RTX version: 1.1.1.26
  • GPU: RTX 5090 Laptop / RTX 3080
  • Operating system: W10 LTSC, W10 22H2, W11 LTSC 24H2, drivers NV 576.52, VS 15/16/17
  • CUDA version: 12.9
  • CPU architecture: Intel(R) Core(TM) Ultra 9 275HX

Screenshots

Additional context
