CVE-2026-24188: About NVIDIA TensorRT (26th May 2026)

Preface: TensorRT is NVIDIA’s general-purpose inference SDK. It compiles and optimizes a wide range of trained models, such as CNNs and other computer-vision networks, to run as fast as possible on NVIDIA GPUs.

TensorRT-LLM is a specialized, open-source library built on top of TensorRT specifically tailored to optimize and execute Large Language Models (LLMs).

Background: How the Diagram Corresponds to the Vulnerability

The diagram maps out how improper memory management between the host (CPU) and device (GPU) exposes a system to this flaw:

  1. Static Buffer Allocation: Step #3 allocates a fixed-size GPU buffer with cuda.mem_alloc(input_data.nbytes). The buffer size is therefore derived entirely from the shape of the initial input_data.
  2. Untrusted Runtime Input: As shown in text boxes 3 and 4, a remote attacker can send a maliciously crafted input whose shape or size differs from the initial one. The application does not recalculate or re-check the allocation bounds for it.
  3. Out-of-Bounds Copy: When Step #4 (cuda.memcpy_htod) executes, it copies the larger payload into the smaller pre-allocated buffer. The copy runs past the end of the buffer and writes into adjacent GPU memory, which is a classic CWE-787 Out-of-bounds Write.
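
The size mismatch behind these three steps can be illustrated with plain arithmetic. The sketch below runs on the host only and makes no GPU calls. The shapes and the float32 element size are assumptions chosen for illustration, not values taken from the advisory.

```python
import math

def nbytes(shape, itemsize):
    """Bytes needed for a dense tensor of the given shape."""
    return math.prod(shape) * itemsize

ITEMSIZE = 4  # assumption: float32 elements

# Step 1: the buffer is sized once, from the initial input shape.
initial_shape = (1, 3, 224, 224)  # hypothetical shape
allocated = nbytes(initial_shape, ITEMSIZE)

# Step 2: a later request arrives with a larger, attacker-chosen shape.
runtime_shape = (4, 3, 224, 224)  # hypothetical shape
required = nbytes(runtime_shape, ITEMSIZE)

# Step 3: an unchecked copy would write this many bytes past the buffer.
overflow = max(0, required - allocated)

print(f"allocated={allocated} required={required} overflow={overflow}")
# prints: allocated=602112 required=2408448 overflow=1806336
```

Any input for which required exceeds allocated is enough to trigger the out-of-bounds write if the copy is not guarded.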

Remediations

  • Update the Software: NVIDIA released an advisory specifying that upgrading to TensorRT v10.16.1 or newer mitigates these risks.
  • Input Boundary Checks: Always strictly validate input dimensions before initiating data copies to device memory.
  • Leverage Native Profiles: If deploying models with varying input dimensions, use TensorRT’s built-in optimization profiles for dynamic shapes rather than copying raw host data to device pointers by hand without verifying its size.
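
The boundary-check and profile ideas above can be sketched as a guard that runs before any host-to-device copy. This is a minimal host-side sketch under stated assumptions: ShapeRange, validate_input, and the example shapes are hypothetical names introduced here. The min/max range only mimics the idea of an optimization profile and does not call the TensorRT API.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Shape = Tuple[int, ...]

@dataclass(frozen=True)
class ShapeRange:
    """Hypothetical stand-in for the min/max bounds of an optimization profile."""
    min_shape: Shape
    max_shape: Shape

    def contains(self, shape: Shape) -> bool:
        # Rank must match, and every dimension must lie within [min, max].
        if not (len(shape) == len(self.min_shape) == len(self.max_shape)):
            return False
        return all(lo <= d <= hi
                   for lo, d, hi in zip(self.min_shape, shape, self.max_shape))

def validate_input(shape: Shape, itemsize: int,
                   allocated_nbytes: int, allowed: ShapeRange) -> int:
    """Return the byte count to copy, or raise ValueError if the copy is unsafe."""
    if not allowed.contains(shape):
        raise ValueError(f"shape {shape} is outside the allowed range")
    required = math.prod(shape) * itemsize
    if required > allocated_nbytes:
        raise ValueError(
            f"input needs {required} bytes but buffer holds {allocated_nbytes}")
    return required

# Assumed example: batch size may vary from 1 to 8, float32 elements.
allowed = ShapeRange(min_shape=(1, 3, 224, 224), max_shape=(8, 3, 224, 224))
# Size the device buffer from the largest allowed shape, not the first input.
allocated = math.prod(allowed.max_shape) * 4

for candidate in [(4, 3, 224, 224), (16, 3, 224, 224)]:
    try:
        n = validate_input(candidate, 4, allocated, allowed)
        print(f"{candidate}: ok, copy {n} bytes")  # the real copy would go here
    except ValueError as exc:
        print(f"{candidate}: rejected ({exc})")
```

Sizing the buffer from the maximum allowed shape means that any input passing the range check also fits the allocation, so the copy in Step #4 would only be reached with a verified size.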

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5836
