
Releases: NVIDIA/TensorRT-RTX

TensorRT-RTX 1.3 Release

19 Dec 21:07


This OSS release supports TensorRT-RTX 1.3 and contains sample code to showcase its capabilities and recommended usage.

Details are available in the release notes. Notable features in this release include:

  • Enabled thread-safe execution across multiple GPUs with different compute capabilities, with up to one network per thread.
  • Improved performance for LLMs and convolution-based models.
  • Added support for CUDA contexts created in NVIDIA CUDA graphics mode on NVIDIA Blackwell devices.
  • Improved performance for many FP8 models on Blackwell.
  • Improved performance for many 2D convolutions.
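The multi-GPU, one-network-per-thread pattern in the list above can be sketched as follows. This is a minimal illustration assuming the TensorRT-style `nvinfer1` API that TensorRT-RTX mirrors; the header name and exact entry points are assumptions to verify against the TensorRT-RTX headers.

```cpp
// Hedged sketch: one worker thread per GPU, each owning its own runtime,
// engine, and execution context. API names follow the TensorRT-style
// nvinfer1 interface; check the TensorRT-RTX headers for exact equivalents.
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>
#include <cuda_runtime_api.h>
#include "NvInferRuntime.h"  // assumed runtime header name

void runOnDevice(int device, const void* engineData, std::size_t engineSize,
                 nvinfer1::ILogger& logger) {
    cudaSetDevice(device);  // bind this thread to its own GPU
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(engineData, engineSize);
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    // ... set tensor addresses and enqueue inference on a per-thread stream ...
}

void launchPerGpuWorkers(int deviceCount, const void* engineData,
                         std::size_t engineSize, nvinfer1::ILogger& logger) {
    std::vector<std::thread> workers;
    for (int d = 0; d < deviceCount; ++d)
        workers.emplace_back(runOnDevice, d, engineData, engineSize,
                             std::ref(logger));
    for (auto& w : workers) w.join();
}
```

Keeping one network per thread, with no sharing of contexts across threads, matches the thread-safety contract described in the release note.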

TensorRT-RTX 1.2 Release

12 Nov 21:36


This OSS release supports TensorRT-RTX 1.2 and contains sample code to showcase its capabilities and recommended usage.

Details are available in the release notes. Notable features in this release include:

  • Built-in CUDA Graphs with automatic dynamic-shape support can be enabled with a one-line change to existing workflows, potentially accelerating inference further by reducing GPU kernel launch overhead at runtime.
  • Users can set a new kREQUIRE_USER_ALLOCATION builder flag to require that engines use application-provided memory where possible, rather than runtime-allocated memory. This is required when using CUDA stream capture. However, stream capture is not possible for all models, especially those with data-dependent dynamic shapes or certain on-device control flows. A new IExecutionContext::isStreamCapturable() API allows querying whether stream capture is possible in the current execution context.
  • The DLL libraries have moved from the lib subdirectory to the bin subdirectory.
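The `kREQUIRE_USER_ALLOCATION` flag and `isStreamCapturable()` query described above might be combined as in the following sketch. The flag and method names come from the release notes; the surrounding builder and CUDA calls follow the TensorRT-style API and are assumptions to check against the TensorRT-RTX documentation.

```cpp
// Hedged sketch: request user-allocated memory at build time, then capture
// inference into a CUDA graph only when the context reports it is capturable.
#include <cuda_runtime_api.h>
#include "NvInfer.h"  // assumed header name

void configureBuild(nvinfer1::IBuilderConfig* config) {
    // Build-time: require application-provided memory where possible
    // (needed for CUDA stream capture).
    config->setFlag(nvinfer1::BuilderFlag::kREQUIRE_USER_ALLOCATION);
}

void runInference(nvinfer1::IExecutionContext* context, cudaStream_t stream) {
    // Runtime: data-dependent dynamic shapes or on-device control flow can
    // prevent capture, so query before attempting it.
    if (context->isStreamCapturable()) {
        cudaGraph_t graph;
        cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
        context->enqueueV3(stream);
        cudaStreamEndCapture(stream, &graph);
        // ... instantiate and launch the graph for repeated inferences ...
    } else {
        context->enqueueV3(stream);  // fall back to a regular enqueue
    }
}
```

Querying first avoids a failed `cudaStreamEndCapture` for models whose shapes or control flow rule out capture.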

This release includes support for the CUDA 13.0 Toolkit, available on both Linux and Windows. Users should download the CUDA 12.9 and CUDA 13.0 TensorRT-RTX builds separately as needed.

TensorRT-RTX 1.1 Release

15 Aug 04:20
6a70878


This OSS release supports TensorRT-RTX 1.1 and contains sample code to showcase its capabilities and recommended usage.

Details are available in the release notes. Notable features in this release include:

  • Added the IRuntime::getEngineValidity() API to programmatically check whether a TensorRT-RTX engine file is valid on the current system or needs to be rebuilt due to incompatibilities in the software version, compute capability, and so on. This API only checks the file header and therefore does not require loading the entire engine file into memory, making the check more efficient.
  • Compilation time has been greatly improved, particularly for models with many memory-bound kernels. On average, a 1.5x improvement is observed across a variety of model architectures.
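A possible use of the validity check described above is sketched below. `IRuntime::getEngineValidity()` is named in the release notes, but its exact signature and return type here are assumptions to verify against the TensorRT-RTX API reference.

```cpp
// Hedged sketch: probe a serialized engine before committing to a full load,
// and rebuild only when it is reported invalid. The parameters and return
// type of getEngineValidity are assumed, not taken from the documentation.
#include <cstddef>
#include "NvInferRuntime.h"  // assumed header name

bool engineNeedsRebuild(nvinfer1::IRuntime* runtime,
                        const void* engineData, std::size_t engineSize) {
    // Per the release notes, only the file header is inspected, so the whole
    // engine does not need to be resident in memory for this check.
    auto validity = runtime->getEngineValidity(engineData, engineSize);
    return validity != nvinfer1::EngineValidity::kVALID;  // assumed enum name
}
```

An application can run this check at startup and trigger a background rebuild instead of failing at deserialization time.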

TensorRT-RTX 1.0 Release

11 Jun 22:39


This OSS release supports TensorRT-RTX 1.0 and contains sample code to showcase its capabilities and recommended usage.

Details regarding TensorRT-RTX 1.0 are available in the release notes. Notable features include:

  • Reduced binary size for improved download speed and disk footprint when included in consumer applications.
  • Split optimization into a hardware-agnostic “ahead-of-time” (AOT) phase and a hardware-specific “just-in-time” (JIT) phase to improve user experience.
  • Improved adaptivity to real-system resources for applications where AI features run in the background.
  • Focused improvement on portability and deployment while still delivering industry-leading performance.
  • Added native acceleration support for Windows ML.
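The AOT/JIT split in the list above corresponds roughly to the following two-phase flow, sketched with TensorRT-style builder and runtime calls; TensorRT-RTX's actual entry points may differ, so treat the names as assumptions.

```cpp
// Hedged sketch of the AOT/JIT split. Builder/runtime calls follow the
// TensorRT-style nvinfer1 API; verify names against the TensorRT-RTX headers.
#include "NvInfer.h"  // assumed header name

nvinfer1::ICudaEngine* buildAndLoad(nvinfer1::ILogger& logger) {
    // AOT phase (hardware-agnostic): compile the network once and serialize
    // the result, which can ship with the application.
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
    nvinfer1::INetworkDefinition* network = builder->createNetworkV2(0);
    // ... populate the network, e.g. via an ONNX parser ...
    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
    nvinfer1::IHostMemory* plan =
        builder->buildSerializedNetwork(*network, *config);

    // JIT phase (hardware-specific): deserializing on the end user's machine
    // finalizes kernels for that GPU's compute capability.
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    return runtime->deserializeCudaEngine(plan->data(), plan->size());
}
```

Deferring the hardware-specific step to deserialization is what lets a single shipped engine adapt to whichever RTX GPU the end user has.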