Category Archives: AI and ML

CVE-2025-33201 and CVE-2025-33211: About NVIDIA Triton Inference Server (10th Dec 2025)

Initial release – 02 December 2025   

Preface: Triton Inference Server is an open source inference serving software that streamlines AI inferencing. Triton Inference Server enables teams to deploy any AI model from multiple deep learning and machine learning frameworks, including TensorRT, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more.

Background: The NVIDIA Triton Inference Server natively supports gRPC as one of its primary communication protocols for the client API. Beyond inference requests, gRPC is also used for health checks, statistics, and model loading/unloading operations. Inference requests arrive at the server via HTTP/REST, gRPC, or the C API, and are then routed to the appropriate per-model scheduler. Triton implements multiple scheduling and batching algorithms that can be configured on a model-by-model basis.
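To make the HTTP/REST path above concrete, here is a minimal sketch that builds a KServe V2 (Open Inference Protocol) inference request body for Triton's REST endpoint. The model name "simple" and the tensor name/shape are illustrative assumptions; real models define their own input layouts.

```python
import json

def build_infer_request(model_name: str, input_name: str, shape, datatype, data):
    """Construct the JSON body and URL path for a KServe V2 inference call."""
    body = {
        "inputs": [
            {"name": input_name, "shape": list(shape), "datatype": datatype, "data": data}
        ]
    }
    # Triton serves this at: POST /v2/models/{model_name}/infer
    url_path = f"/v2/models/{model_name}/infer"
    return url_path, json.dumps(body)

path, payload = build_infer_request("simple", "INPUT0", (1, 4), "FP32", [1.0, 2.0, 3.0, 4.0])
```

The same request shape is carried over gRPC as Protobuf messages; only the transport differs.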

Triton major features:

  • Supports multiple deep learning frameworks
  • Supports multiple machine learning frameworks
  • Concurrent model execution
  • Dynamic batching
  • Sequence batching and implicit state management for stateful models
  • Provides Backend API that allows adding custom backends and pre/post processing operations
  • Model pipelines using Ensembling or Business Logic Scripting (BLS)
  • HTTP/REST and gRPC inference protocols based on the community-developed KServe protocol
  • A C API and Java API allow Triton to link directly into your application for edge and other in-process use cases
  • Metrics indicating GPU utilization, server throughput, server latency, and more

Vulnerability details:

CVE-2025-33211         NVIDIA Triton Server for Linux contains a vulnerability where an attacker may cause an improper validation of specified quantity in input. A successful exploit of this vulnerability may lead to denial of service.

CVE-2025-33201         NVIDIA Triton Inference Server contains a vulnerability where an attacker may cause an improper check for unusual or exceptional conditions issue by sending extra large payloads. A successful exploit of this vulnerability may lead to denial of service.
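Both CVEs boil down to a missing input-sanity check. As a defensive sketch (not Triton's actual code), the check whose absence enables such denial-of-service bugs is: verify that a declared element count matches the bytes actually supplied, and cap the total payload size. The limits and element-size table here are illustrative assumptions.

```python
MAX_PAYLOAD_BYTES = 64 * 1024 * 1024  # assumed 64 MiB cap, not Triton's real limit
ELEMENT_SIZE = {"FP32": 4, "INT64": 8}

def validate_tensor(datatype: str, declared_count: int, raw: bytes) -> bool:
    """Reject tensors whose declared quantity or total size is implausible."""
    if datatype not in ELEMENT_SIZE:
        return False
    if declared_count < 0 or len(raw) > MAX_PAYLOAD_BYTES:
        return False
    # The declared quantity must match the bytes actually supplied.
    return declared_count * ELEMENT_SIZE[datatype] == len(raw)
```

A request declaring 4 FP32 elements must carry exactly 16 bytes; any mismatch or an oversized buffer is rejected before allocation or parsing proceeds.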

Official announcement: Please refer to the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5734

CVE-2025-33208: NVIDIA TAO design weakness (9th Dec 2025)

Official Updated 11/26/2025

Preface: AI vision models are artificial intelligence systems, often multimodal (Vision-Language Models, or VLMs), that analyze and interpret visual data (images, videos) alongside text. By converting visual information into a format comparable to text, they can "see" and understand the world much as humans do, performing tasks from object recognition and image captioning to answering questions about visuals and generating new images.

Background: You use NVIDIA TAO (Train, Adapt, Optimize) to rapidly build, customize, and deploy high-performance, domain-specific AI models (especially for vision) with less code, less data, and faster training. It achieves this by leveraging powerful pre-trained foundation models, fine-tuning them with your own data, and optimizing them for efficient inference on edge-to-cloud devices, saving significant time and resources.

The NVIDIA TAO Toolkit is designed to function with both real and synthetic data.

Training with Real Data: The primary function of the TAO Toolkit is to fine-tune NVIDIA’s extensive library of pretrained foundation models using your own proprietary (real-world) datasets. This process is low-code and enables the customization of models for specific use cases without needing deep AI expertise or training from scratch.

Leveraging Synthetic Data: Synthetic data is often used to address the challenges associated with real data collection, such as scarcity, expensive labeling, and rare edge cases.

Models can be initially trained on large volumes of synthetic data generated with tools like NVIDIA Omniverse Replicator or partner platforms (e.g., Sky Engine AI, AI.Reverie, Lexset).

Vulnerability details: (CVE-2025-33208) NVIDIA TAO contains a vulnerability where an attacker may cause a resource to be loaded via an uncontrolled search path. A successful exploit of this vulnerability may lead to escalation of privileges, data tampering, denial of service, or information disclosure.
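An uncontrolled search path (CWE-427) lets an attacker plant a look-alike library or plugin in a directory the application searches first. One generic mitigation, sketched below, is to resolve any requested resource name against a single trusted directory and refuse anything that escapes it. The directory and file names are hypothetical, not TAO's actual layout.

```python
from pathlib import Path

def resolve_trusted(trusted_dir: str, requested: str) -> Path:
    """Resolve `requested` inside `trusted_dir`; reject path traversal."""
    base = Path(trusted_dir).resolve()
    candidate = (base / requested).resolve()
    # After resolution, the candidate must still live under the trusted base.
    if base not in candidate.parents and candidate != base:
        raise ValueError(f"refusing path outside trusted dir: {requested}")
    return candidate
```

With this guard, `resolve_trusted("/opt/trusted", "../evil.so")` raises instead of silently loading a planted binary from a sibling directory.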

Official announcement: Please refer to the link for more details.

https://nvidia.custhelp.com/app/answers/detail/a_id/5730

CVE-2025-33204: About NVIDIA NeMo Framework (1st Dec 2025)

Official Update 11/21/2025 04:36 PM

Preface: NeMo Curator is a Python library that includes a suite of modules for data-mining and synthetic data generation. They are scalable and optimized for GPUs, making them ideal for curating natural language data to train or fine-tune LLMs. With NeMo Curator, researchers in Natural Language Processing (NLP) can efficiently extract high-quality text from extensive raw web data sources.

NVIDIA NeMo Curator, particularly its image curation modules, requires a CUDA-enabled NVIDIA GPU and the corresponding CUDA Toolkit. The CUDA Toolkit is not installed as part of the NeMo Curator installation process itself, but rather is a prerequisite for utilizing GPU-accelerated features.

Background: NeMo Framework includes NeMo Curator because high-quality data is essential for training accurate generative AI models, and Curator provides a scalable, GPU-accelerated toolset for processing and preparing large datasets efficiently. It handles everything from cleaning and deduplicating text to generating synthetic data for model customization and evaluation, preventing data processing from becoming a bottleneck.

Potential risks under observation: The vulnerability arises when malicious files—such as JSONL files—are loaded by NeMo Curator. If these files are crafted to exploit weaknesses in how NeMo Curator parses or processes them, they can inject executable code.

Ref: A parser is closely tied to predefined variables: it can either parse data into variables or use predefined variables to perform its task.
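The general defense against malicious JSONL is to parse with a data-only decoder and validate structure, never to evaluate file content as code. The sketch below uses `json.loads` (which cannot execute code) and an allow-list of fields; the field names are illustrative, not NeMo Curator's real schema.

```python
import json

ALLOWED_FIELDS = {"text", "id"}  # assumed schema for illustration

def load_jsonl(lines):
    """Parse JSONL defensively: data-only decoding plus structural checks."""
    records = []
    for n, line in enumerate(lines, 1):
        if not line.strip():
            continue
        obj = json.loads(line)  # safe parse; never eval() or exec() raw input
        if not isinstance(obj, dict) or not set(obj) <= ALLOWED_FIELDS:
            raise ValueError(f"line {n}: unexpected structure")
        records.append(obj)
    return records
```

Any line that is not a plain object with expected keys is rejected up front, which removes the attacker's ability to smuggle executable constructs through the loader.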

Vulnerability details:

CVE-2025-33204         NVIDIA NeMo Framework for all platforms contains a vulnerability in the NLP and LLM components, where malicious data created by an attacker could cause code injection. A successful exploit of this vulnerability may lead to code execution, escalation of privileges, information disclosure, and data tampering.

CVE-2025-33205         NVIDIA NeMo framework contains a vulnerability in a predefined variable, where an attacker could cause inclusion of functionality from an untrusted control sphere by use of a predefined variable. A successful exploit of this vulnerability may lead to code execution.

Official announcement: Please refer to the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5729

CVE-2025-33203 – Design weakness of NVIDIA NeMo Agent Toolkit UI for Web. Another preventive approach. (28th Nov 2025)

Preface: While web vulnerabilities can lead to various cyberattacks, they don’t directly or exclusively cause ransomware attacks. CSRF attacks exploit the trust a website has in a user’s browser to perform unauthorized actions on that website, while ransomware involves malware that encrypts a user’s system and demands payment.

Background: The NeMo Agent Toolkit UI is the official frontend user interface component for the NeMo Agent Toolkit, an open-source library for building AI agents and workflows.

Prerequisites

  • NeMo Agent Toolkit installed and configured
  • Git
  • Node.js (v18 or higher)
  • npm or Docker

While Node.js v18 itself doesn’t inherently prevent or cause CSRF, it’s crucial to implement proper CSRF protection in your Node.js applications built with this version. Node.js v18 is now End-of-Life (EOL), meaning it no longer receives security updates, which makes implementing robust security measures even more critical.

Vulnerability details: CVE-2025-33203 – NVIDIA NeMo Agent Toolkit UI for Web contains a vulnerability in the chat API endpoint where an attacker may cause a Server-Side Request Forgery. A successful exploit of this vulnerability may lead to information disclosure and denial of service.
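A common SSRF guard for an endpoint that fetches attacker-influenced URLs is to allow only http(s) targets whose host is either a public literal IP or an allow-listed hostname. The sketch below is conceptual (the allow-listed hostname is an assumption); production code must also pin DNS resolution so a hostname cannot later resolve to an internal address.

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.example.com"}  # hypothetical allow-list

def is_safe_url(url: str) -> bool:
    """Reject non-http(s) schemes and hosts pointing at internal networks."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        ip = ipaddress.ip_address(parts.hostname)
        # Reject loopback, link-local, RFC 1918, and other non-global ranges.
        return ip.is_global
    except ValueError:
        # Not a literal IP: fall back to the hostname allow-list.
        return parts.hostname in ALLOWED_HOSTS
```

This blocks the classic SSRF pivots, such as `http://169.254.169.254/` (cloud metadata) and `http://127.0.0.1:8080/` (internal services), while permitting sanctioned external APIs.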

Platforms or OS: All platforms

Affected Product: NeMo Agent Toolkit

Affected Versions: All versions prior to 1.3.0

Updated Version: 1.3.0

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5726

AI developers, please do not underestimate the CVE-2025-33187 (NVIDIA DGX Spark GB10) vulnerability (26th Nov 2025)

Updated 11/21/2025 04:36 PM

Preface: NVIDIA DGX Spark will be used by AI developers, researchers, and data scientists who need to prototype and deploy large AI models on their desktop, including those working with agentic AI, LLMs, and robotics.

The NVIDIA DGX Secure Root of Trust (SRoT), more commonly referred to as the Hardware Root of Trust (HRoT), is a foundational security component embedded in the system’s hardware, including the main GPUs and the BlueField Data Processing Units (DPUs).

The term “NVIDIA DGX SROOT” refers to the Secure Root of Trust (SROOT) firmware component within the NVIDIA DGX Spark personal AI supercomputer. It is a security feature designed to ensure the integrity of the system’s secure boot process and certificate management.

Background: The DGX Spark runs on NVIDIA DGX OS, a customized Ubuntu Linux distribution that includes a full-stack NVIDIA AI software ecosystem. The NVIDIA GB10 is a Superchip that integrates separate CPU and GPU dies in a single package, and the operating system is not embedded within the CPU die itself. Instead, the OS is installed on external NVMe storage, and the system uses unified memory accessible by both dies.

The OS and related software stack are stored on external NVMe solid-state drives (SSDs), not on the CPU die. The DGX Spark workstation typically includes up to 4 TB of NVMe storage.

However, NVIDIA SROOT is an internal firmware element located in NVIDIA DGX Spark GB10 systems; it is a specific firmware component that runs on the system's hardware.

Vulnerability details: CVE-2025-33187 – NVIDIA DGX Spark GB10 contains a vulnerability in SROOT, where an attacker could use privileged access to gain access to SoC protected areas. A successful exploit of this vulnerability might lead to code execution, information disclosure, data tampering, denial of service, or escalation of privileges.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2025-33187

Cyber security focus for NVIDIA Isaac-GR00T – About CVE-2025-33183 and CVE-2025-33184 (24th Nov 2025)

Updated 11/19/2025 09:03 AM

Preface: NVIDIA Isaac is an AI robot development platform consisting of NVIDIA-accelerated libraries, application frameworks, and AI models that accelerate the development of AI robots such as autonomous mobile robots (AMRs), arms and manipulators, and humanoids.

NVIDIA Isaac GR00T N1 is the world’s first open foundation model for generalized humanoid robot reasoning and skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments.

Background:

Isaac GR00T N1.5 uses vision and text transformers to encode the robot’s image observations and text instructions. The architecture handles a varying number of views per embodiment by concatenating image token embeddings from all frames into a sequence, followed by language token embeddings.

A token is the most fundamental data unit in a text document, essential for enabling AI to understand and process information. In Natural Language Processing (NLP), tokenization refers to breaking down larger texts into smaller, manageable pieces called tokens.
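As a conceptual illustration of the tokenization step described above, the toy tokenizer below splits text into word and punctuation tokens. Real models use subword tokenizers (BPE, SentencePiece), so this is only a sketch of the idea, not GR00T's actual pipeline.

```python
import re

def tokenize(text: str):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Robots, see the world.")
```

Each token then maps to an embedding; the model consumes the resulting sequence alongside the image token embeddings.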

Please refer to the attached diagram:

The System 2 reasoning module is a pre-trained Vision-Language Model (VLM) that runs on an NVIDIA GPU. It processes the robot's visual perception and language instruction to interpret the environment and understand the task goal. Subsequently, a Diffusion Transformer, trained with action flow-matching, serves as the System 1 action module. It cross-attends to the VLM output tokens and employs embodiment-specific encoders and decoders to handle variable state and action dimensions for motion generation.

Vulnerability details: See below –

CVE-2025-33183         NVIDIA Isaac-GR00T for all platforms contains a vulnerability in a Python component, where an attacker could cause a code injection issue. A successful exploit of this vulnerability might lead to code execution, escalation of privileges, information disclosure, and data tampering.

CVE-2025-33184         NVIDIA Isaac-GR00T for all platforms contains a vulnerability in a Python component, where an attacker could cause a code injection issue. A successful exploit of this vulnerability might lead to code execution, escalation of privileges, information disclosure, and data tampering.

Official announcement: Please refer to the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5725

CVE-2025-33185: About NVIDIA AIStore  (17th Nov 2025)

Official Updated 10th Nov 2025 05:39 AM

Preface: The core design objective of NVIDIA AIStore (AIS) is to provide a high-performance, linearly scalable, and flexible storage solution specifically optimized for large-scale AI/ML and data analytics workloads. NVIDIA AIStore (AIS) provides secure access via a standalone Authentication Server (AuthN) that uses OAuth 2.0 compliant JSON Web Tokens (JWT) for token-based authentication.

The AuthN server is part of the broader NVIDIA AIStore project, which is publicly available on GitHub. It provides token-based secure access using the JSON Web Tokens (JWT) framework.

Background: The security of a signed JWT relies on a secret key (for HMAC algorithms like HS256) or a public/private key pair (for RSA or ECDSA). This key is used to create a digital signature that ensures the token's integrity and authenticity, proving it has not been tampered with. If the application's source code, configuration files, or version control system contains this secret key in plain text, it violates the principle of confidentiality for credentials. An attacker who discovers this hard-coded secret can forge valid tokens and impersonate any user or administrator.
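The failure mode is easy to demonstrate with a minimal HS256 signer/verifier built from the standard library (a sketch of the JWT mechanics, not AIStore's implementation; the secret and claims are invented). Whoever holds the signing secret can mint tokens that verify, which is exactly why hard-coding it is fatal.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(claims: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = b64url(hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_jwt(token: str, secret: bytes) -> bool:
    header, payload, sig = token.split(".")
    expected = b64url(hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)

SECRET = b"hard-coded-secret"  # the anti-pattern: secret committed to source
forged = sign_jwt({"sub": "attacker", "admin": True}, SECRET)
```

The forged token passes `verify_jwt(forged, SECRET)` because HMAC signing is symmetric: the verification key is the signing key. Rotating the secret out of source control (and into a secrets manager) is the structural fix.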

Vulnerability details: NVIDIA AIStore contains a vulnerability in AuthN, where a hard-coded secret key may be used. A successful exploit of this vulnerability might lead to escalation of privileges, information disclosure, and data tampering.

Impacts: Escalation of privileges, information disclosure, data tampering

Remediation: Update to v3.31.

Official announcement: Please refer to the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5724

CVE-2025-33202: About NVIDIA Triton Inference Server (14th Nov 2025)

Official Updated 11/10/2025 05:39 AM

Preface: Clients can communicate with Triton using an HTTP/REST protocol, a gRPC protocol, or an in-process C API (or its C++ wrapper). Triton supports HTTP/REST and gRPC, both of which involve complex header parsing.

In the context of the Open Inference Protocol (OIP), also known as the KServe V2 protocol, the protocol defines a standardized interface for model inference. This implies that compliant inference servers must be capable of parsing incoming requests and serializing outgoing responses according to the protocol's defined message formats.

Background: To define a parser that filters the payload for Triton using the KServe V2 (Open Inference Protocol), you need to handle the following:

Key Considerations

1. Protocol Compliance – The parser must understand the OIP message format:

  • Inference Request: includes inputs, outputs, parameters.
  • Inference Response: includes model_name, outputs, parameters.
  • Data can be in JSON (for REST) or Protobuf (for gRPC).

2. Filtering Logic – Decide what you want to filter:

  • Specific tensor names?
  • Certain data types (e.g., FP32, INT64)?
  • Large payloads (e.g., skip tensors above a size threshold)?
  • Security checks (e.g., reject malformed headers)?

3. Shared Memory Handling – If shared memory is used, the parser should:

  • Validate shared_memory_region references.
  • Ensure the payload does not redundantly include tensor data when shared memory is specified.
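The considerations above can be sketched as a filtering parser for the JSON (REST) form of a KServe V2 request. The size limit, allowed-dtype set, and the placement of `shared_memory_region` under `parameters` follow Triton's conventions but are assumptions for this sketch, not a definitive implementation.

```python
import json

ALLOWED_DTYPES = {"FP32", "INT64", "BYTES"}  # assumed policy
MAX_ELEMENTS = 1_000_000                     # assumed per-tensor cap

def filter_request(raw: str) -> dict:
    """Validate and filter a KServe V2 JSON inference request."""
    req = json.loads(raw)
    for inp in req.get("inputs", []):
        # 2. Filtering logic: dtype allow-list and payload size cap.
        if inp.get("datatype") not in ALLOWED_DTYPES:
            raise ValueError("disallowed datatype")
        n = 1
        for dim in inp.get("shape", []):
            n *= dim
        if n > MAX_ELEMENTS:
            raise ValueError("tensor too large")
        # 3. Shared memory handling: no redundant inline data.
        params = inp.get("parameters", {})
        if "shared_memory_region" in params and "data" in inp:
            raise ValueError("redundant inline data with shared memory")
        # 1. Protocol compliance: declared shape must match flat data length.
        if "shared_memory_region" not in params and n != len(inp.get("data", [])):
            raise ValueError("shape/data mismatch")
    return req

ok = filter_request(json.dumps(
    {"inputs": [{"name": "x", "datatype": "FP32", "shape": [2, 2], "data": [1, 2, 3, 4]}]}
))
```

Placing such a filter in front of the server rejects malformed or oversized requests before they reach the deeper parsing code where bugs like the stack overflow below live.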

Vulnerability details: NVIDIA Triton Inference Server for Linux and Windows contains a vulnerability where an attacker could cause a stack overflow by sending extra-large payloads. A successful exploit of this vulnerability might lead to denial of service.

Official announcement: Please see the official link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5723

CVE-2025-23361 and CVE-2025-33178: NVIDIA NeMo Framework contains vulnerabilities (13th Nov 2025)

Preface: The advantages of using Hydra and OmegaConf for configuration management are flexibility, reproducibility, and scalability. Hydra’s ability to override configurations at runtime from the command line and compose them from multiple sources makes it highly flexible.

NeMo uses Hydra/OmegaConf for configuration management, which supports interpolation and sometimes dynamic evaluation.
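To illustrate what interpolation means here, the toy resolver below handles OmegaConf-style `${key}` references in pure Python (a conceptual sketch, not OmegaConf's implementation). Note the design choice: lookups are confined to the config dict itself. A resolver that instead `eval()`s the braced expression turns any attacker-controlled config file into code execution, which is the class of weakness behind the CVEs below.

```python
import re

def interpolate(value: str, config: dict) -> str:
    """Resolve ${key} references using only keys from the config dict."""
    return re.sub(r"\$\{(\w+)\}", lambda m: str(config[m.group(1)]), value)

cfg = {"model": "nemo-llm", "ckpt": "${model}.ckpt"}
resolved = interpolate(cfg["ckpt"], cfg)
```

A missing key raises `KeyError` rather than falling back to evaluating arbitrary expressions, which keeps the config data-only.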

Background: NVIDIA NeMo is an end-to-end platform designed for developing and deploying generative AI models. This includes large language models (LLMs), vision language models (VLMs), video models, and speech AI. NeMo offers tools for data curation, fine-tuning, retrieval-augmented generation (RAG), and inference, making it a comprehensive solution for creating enterprise-ready AI models. Here are some key capabilities of NeMo LLMs:

  1. Customization: NeMo allows you to fine-tune pre-trained models to suit specific enterprise needs. This includes adding domain-specific knowledge and skills, and continuously improving the model with reinforcement learning from human feedback (RLHF).
  2. Scalability: NeMo supports large-scale training and deployment across various environments, including cloud, data centers, and edge devices. This ensures high performance and flexibility for different use cases.
  3. Foundation Models: NeMo offers a range of pre-trained foundation models, such as GPT-8B, GPT-43B, and GPT-530B, which can be used for tasks like text classification, summarization, creative writing, and chatbots.
  4. Data Curation: The platform includes tools for processing and curating large datasets, which helps improve the accuracy and relevance of the models.
  5. Integration: NeMo can be integrated with other NVIDIA AI tools and services, providing a comprehensive ecosystem for AI development.

Vulnerability details:

CVE-2025-23361: NVIDIA NeMo Framework for all platforms contains a vulnerability in a script, where malicious input created by an attacker may cause improper control of code generation. A successful exploit of this vulnerability may lead to code execution, escalation of privileges, information disclosure, and data tampering.

CVE-2025-33178: NVIDIA NeMo Framework for all platforms contains a vulnerability in the BERT services component where malicious data created by an attacker may cause a code injection. A successful exploit of this vulnerability may lead to code execution, escalation of privileges, information disclosure, and data tampering.

Ref: CVE-2025-33178 in the BERT services component is conceptually similar to CVE-2025-23361 in the LLM pretraining workflow. Both share the same underlying weakness: unsanitized dynamic code generation or execution based on user-controlled input.

Official announcement: Please see the official link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5718

CVE-2025-12863: About the libxml2 XML parsing library (11th Nov 2025)

Preface: Libxml2, a C library for parsing and manipulating XML documents, can be relevant in machine learning contexts when dealing with data stored or exchanged in XML format. While not a machine learning library itself, libxml2, or its Python binding lxml, serves as a foundational tool for data preparation and feature engineering.

Ref: The “difference” between Libxml and Libxml2 is that they are essentially the same thing, with “libxml2” being the official and specific name for the library.

Background: Moving nodes between XML documents can happen in machine learning workflows, especially during data integration, comparison, or transformation.  

When would you be at risk?

You’d be at risk if:

  • You use libxml2 or lxml to move nodes from one document to another (e.g., merging XML trees).
  • The underlying library internally calls xmlSetTreeDoc() during such operations.
  • The original document gets freed while namespace pointers still reference it.

Vulnerability details: A flaw was found in the xmlSetTreeDoc() function of the libxml2 XML parsing library. This function is responsible for updating document pointers when XML nodes are moved between documents. Due to improper handling of namespace references, a namespace pointer may remain linked to a freed memory region when the original document is destroyed. As a result, subsequent operations that access the namespace can lead to a use-after-free condition, causing an application crash.
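The risky pattern, detaching a node from one document and re-parenting it into another, can be sketched with the standard library's ElementTree. ElementTree is pure Python and is not itself affected; the CVE concerns libxml2's C-level xmlSetTreeDoc(), which lxml invokes when nodes migrate like this. The practical mitigations are to upgrade libxml2, or to copy nodes instead of moving them and keep the source tree alive while the moved content is in use.

```python
import xml.etree.ElementTree as ET

# Two separate "documents" (element trees), as in an XML merge workflow.
src = ET.fromstring("<doc><item id='a'>hello</item></doc>")
dst = ET.fromstring("<merged/>")

# Move a node between documents: detach from the source tree, then
# re-parent into the destination tree. In lxml/libxml2, this migration
# is where xmlSetTreeDoc() updates the node's document pointers.
item = src.find("item")
src.remove(item)
dst.append(item)
```

If the source document were freed while a namespace pointer on the moved node still referenced it, later access to that namespace triggers the use-after-free described above.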

Official announcement: Please refer to the link for details.

https://www.tenable.com/cve/CVE-2025-12863

https://access.redhat.com/security/cve/cve-2025-12863