
Original release date: February 11, 2025
Preface: The NVIDIA Triton Inference Server API supports both HTTP/REST and gRPC protocols. These protocols allow clients to communicate with the Triton server for tasks such as model inferencing, checking server and model health, and managing model metadata and statistics.
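As an illustration of that client-facing API, the sketch below exercises the health and model-metadata endpoints of the v2 HTTP/REST protocol that Triton implements. It is a minimal example, not vendor code; the URL, port, and model name are assumptions.

    import requests

    TRITON_URL = "http://localhost:8000"  # assumed default HTTP port of a local Triton server
    MODEL_NAME = "my_model"               # hypothetical model name

    # Liveness and readiness checks return HTTP 200 when the server is healthy.
    live = requests.get(f"{TRITON_URL}/v2/health/live")
    ready = requests.get(f"{TRITON_URL}/v2/health/ready")
    print("server live:", live.status_code == 200)
    print("server ready:", ready.status_code == 200)

    # Model metadata (name, versions, inputs, outputs) via the same REST API.
    meta = requests.get(f"{TRITON_URL}/v2/models/{MODEL_NAME}")
    if meta.ok:
        print(meta.json())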
Background: NVIDIA Triton™ Inference Server, part of the NVIDIA AI platform and available with NVIDIA AI Enterprise, is open-source software that standardizes AI model deployment and execution across every workload.
The Asynchronous Server Gateway Interface (ASGI) is a calling convention for web servers to forward requests to asynchronous-capable Python frameworks and applications. It was designed as a successor to the Web Server Gateway Interface (WSGI).
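For context, an ASGI application is simply an async callable that receives a connection scope and two message channels. The sketch below is an illustrative minimal example (not Triton code) that returns a plain-text response and could be served by any ASGI server, such as uvicorn.

    # Minimal ASGI application: an async callable taking the connection
    # scope plus receive/send channels (illustrative, not Triton code).
    async def app(scope, receive, send):
        assert scope["type"] == "http"
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({
            "type": "http.response.body",
            "body": b"hello from ASGI",
        })

    # Run with an ASGI server, e.g.: uvicorn module_name:app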
Vulnerability details: NVIDIA Triton Inference Server contains a vulnerability in the model loading API, where a user could cause an integer overflow or wraparound error by loading a model with an extra-large file size that overflows an internal variable. A successful exploit of this vulnerability might lead to denial of service.
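To illustrate the vulnerability class only (the advisory does not publish Triton's internal code, so the variable width and sizes below are assumptions), the sketch shows how a file size just past the 32-bit signed maximum wraps to a negative value when stored in a 32-bit variable; a negative or unexpectedly small size can then defeat bounds checks or trigger a bad allocation, ending in denial of service.

    import ctypes

    # Hypothetical size just over the 32-bit signed maximum (~2 GiB); the
    # actual internal variable in Triton is not public, this is illustrative.
    reported_size = 2**31 + 42  # bytes reported for an extra-large model file

    wrapped = ctypes.c_int32(reported_size).value
    print(wrapped)              # -2147483606: the value has wrapped around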
Official announcement: Please refer to the vendor announcement for details: https://nvidia.custhelp.com/app/answers/detail/a_id/5612