Category Archives: AI and ML

CVE-2025-33202: About NVIDIA Triton Inference Server (14th Nov 2025)

Official Updated 11/10/2025 05:39 AM

Preface: Clients can communicate with Triton using either an HTTP/REST protocol, a GRPC protocol, or by an in-process C API or its C++ wrapper. Triton supports HTTP/REST and gRPC, both of which involve complex header parsing.

In the context of the Open Inference Protocol (OIP), also known as KServe V2 Protocol. The protocol defines a standardized interface for model inference, which implies that compliant inference servers must be capable of parsing incoming requests and serializing outgoing responses according to the protocol’s defined message formats.

Background: To define a parser that filters the payload for Triton using the KServe V2 (Open Inference Protocol), you need to handle the following:

Key Considerations

1.Protocol Compliance – The parser must understand the OIP message format:

-Inference Request: Includes inputs, outputs, parameters.

-Inference Response: Includes model_name, outputs, parameters.

-Data can be in JSON (for REST) or Protobuf (for gRPC).

2.Filtering Logic – Decide what you want to filter:

-Specific tensor names?

-Certain data types (e.g., FP32, INT64)?

-Large payloads (e.g., skip tensors above a size threshold)?

-Security checks (e.g., reject malformed headers)?

3.Shared Memory Handling – If shared memory is used, the parser should:

-Validate shared_memory_region references.

-Ensure the payload does not redundantly include tensor data when shared memory is specified.

Vulnerability details: NVIDIA Triton Inference Server for Linux and Windows contains a vulnerability where an attacker could cause a stack overflow by sending extra-large payloads. A successful exploit of this vulnerability might lead to denial of service.

Official announcement: Please see the official link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5723

CVE‑2025‑23361 and CVE-2025-33178: NVIDIA Nemo Framework contains vulnerabilities (13th Nov 2025)

Preface: The advantages of using Hydra and OmegaConf for configuration management are flexibility, reproducibility, and scalability. Hydra’s ability to override configurations at runtime from the command line and compose them from multiple sources makes it highly flexible.

NeMo uses Hydra/OmegaConf for configuration management, which supports interpolation and sometimes dynamic evaluation.

Background: NVIDIA NeMo is an end-to-end platform designed for developing and deploying generative AI models. This includes large language models (LLMs), vision language models (VLMs), video models, and speech AI. NeMo offers tools for data curation, fine-tuning, retrieval-augmented generation (RAG), and inference, making it a comprehensive solution for creating enterprise-ready AI models. Here are some key capabilities of NeMo LLMs:

  1. Customization: NeMo allows you to fine-tune pre-trained models to suit specific enterprise needs. This includes adding domain-specific knowledge and skills, and continuously improving the model with reinforcement learning from human feedback (RLHF).
  2. Scalability: NeMo supports large-scale training and deployment across various environments, including cloud, data centers, and edge devices. This ensures high performance and flexibility for different use cases.
  3. Foundation Models: NeMo offers a range of pre-trained foundation models, such as GPT-8, GPT-43, and GPT-530, which can be used for tasks like text classification, summarization, creative writing, and chatbots.
  4. Data Curation: The platform includes tools for processing and curating large datasets, which helps improve the accuracy and relevance of the models.
  5. Integration: NeMo can be integrated with other NVIDIA AI tools and services, providing a comprehensive ecosystem for AI development.

Vulnerability details:

CVE-2025-23361: NVIDIA NeMo Framework for all platforms contains a vulnerability in a script, where malicious input created by an attacker may cause improper control of code generation. A successful exploit of this vulnerability may lead to code execution, escalation of privileges, information disclosure, and data tampering.

CVE-2025-33178: NVIDIA NeMo Framework for all platforms contains a vulnerability in the bert services component where malicious data created by an attacker may cause a code injection. A successful exploit of this vulnerability may lead to Code execution, Escalation of privileges, Information disclosure, and Data tampering.

Ref: CVE-2025-33178 in the BERT services component is conceptually similar to CVE-2025-23361 in the LLM pretraining workflow. Both share the same underlying weakness: unsanitized dynamic code generation or execution based on user-controlled input.

Official announcement: Please see the official link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5718

CVE-2025-12863: About the libxml2 XML parsing library (11th Nov 2025)

Preface: Libxml2, a C library for parsing and manipulating XML documents, can be relevant in machine learning contexts when dealing with data stored or exchanged in XML format. While not a machine learning library itself, libxml2, or its Python binding lxml, serves as a foundational tool for data preparation and feature engineering.

Ref: The “difference” between Libxml and Libxml2 is that they are essentially the same thing, with “libxml2” being the official and specific name for the library.

Background: Moving nodes between XML documents can happen in machine learning workflows, especially during data integration, comparison, or transformation.  

When would you be at risk?

You’d be at risk if:

  • You use libxml2 or lxml to move nodes from one document to another (e.g., merging XML trees).
  • The underlying library internally calls xmlSetTreeDoc() during such operations.
  • The original document gets freed while namespace pointers still reference it.

Vulnerability details: A flaw was found in the xmlSetTreeDoc() function of the libxml2 XML parsing library. This function is responsible for updating document pointers when XML nodes are moved between documents. Due to improper handling of namespace references, a namespace pointer may remain linked to a freed memory region when the original document is destroyed. As a result, subsequent operations that access the namespace can lead to a use-after-free condition, causing an application crash.

Official announcement: Please refer to the link for details.

https://www.tenable.com/cve/CVE-2025-12863

https://access.redhat.com/security/cve/cve-2025-12863

Security Bulletin:  About NVIDIA RunAI – CVE-2025-33176 (6th Nov 2025)

Preface: NVIDIA Run:ai is a Kubernetes-native platform for managing and optimizing AI workloads, acquired by NVIDIA in 2024. It provides dynamic orchestration for GPU resources, supporting flexible resource allocation to improve resource utilization and accelerate the AI ​​development lifecycle in hybrid, on-premises, and cloud environments. Prior to its acquisition by NVIDIA in December 2024, Run:ai was an Israeli software startup founded in 2018, focused on AI infrastructure management. Its core product is Kubernetes-based software for managing and optimizing GPU workloads for AI applications, helping enterprises utilize hardware more efficiently. The company’s software assists enterprises in managing GPU resources in on-premises, cloud, and hybrid environments. Run:ai does not “change its role in machine learning,” but rather focuses on the key challenges of AI workload management and GPU orchestration throughout the broader Machine Learning Operations (MLOps) lifecycle.

Background: NVIDIA Run:ai is a Kubernetes-based platform that acts as a GPU orchestrator to maximize the efficiency of AI and machine learning workloads. It addresses challenges like managing expensive GPU resources by enabling dynamic, policy-based scheduling, allowing for the sharing of GPUs across teams and projects, and optimizing workload performance for training, tuning, and inference. Run:ai integrates into existing hybrid or on-premises AI infrastructure to improve GPU utilization and accelerate AI development cycles.

NVIDIA DGX Cloud primarily leverages Kubernetes (K8s) for orchestrating and managing AI workloads.

While DGX systems historically used Docker for containerization, DGX Cloud, particularly for advanced AI workloads and resource management, relies on Kubernetes for its scalability, high-performance computing capabilities, and efficient GPU resource allocation. This is often integrated with other NVIDIA software like NVIDIA NeMo and NVIDIA Run:ai, and deployed on cloud services such as Amazon Elastic Kubernetes Service (Amazon EKS).

Vulnerability details: NVIDIA RunAI for all platforms contains a vulnerability where a user could cause an improper restriction of communications channels on an adjacent network. A successful exploit of this vulnerability might lead to escalation of privileges, data tampering, and information disclosure.

Official announcement: Please refer to the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5719

AMD ID: AMD-SB-7055RDSEED Failure on AMD “Zen 5” Processors(27th-10-2025)

Preface: The main consequence of an RDSEED failure on AMD Zen 5 processors is instability, crashes, and potentially corrupted data, as this issue affects the processor’s ability to generate high-quality random numbers for cryptography and other sensitive tasks. This has led to the development of Linux kernel patches to temporarily disable RDSEED on affected Zen 5 CPUs until AMD provides a permanent hardware or firmware fix.

Background: RDSEED is a CPU instruction that provides high-entropy random numbers directly from a hardware entropy source, such as the Intel Digital Random Number Generator. It is designed to be used to seed other pseudo-random number generators (PRNGs) for cryptographic applications, ensuring a secure and unpredictable starting point.

RDSEED is a CPU instruction that provides high-entropy random numbers directly from a hardware entropy source, such as the Intel Digital Random Number Generator. It is designed to be used to seed other pseudo-random number generators (PRNGs) for cryptographic applications, ensuring a secure and unpredictable starting point.

Vulnerability Details: AMD was notified of a bug in “Zen 5” processors that may cause the RDSEED instruction to return 0 at a rate inconsistent with randomness while incorrectly signaling success (CF=1), indicating a potential misclassification of failure as success. This issue was initially reported publicly via the Linux kernel mailing list and was not submitted through AMD’s Coordinated Vulnerability Disclosure (CVD) process.

AMD has determined that the 16-bit and 32-bit forms of the RDSEED instruction on “Zen 5” processors are affected. The 64-bit form of RDSEED is not affected. AMD plans to release mitigations for this vulnerability.

Official announcement: Please refer to the link for details –

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7055.html

Security Bulletin: NVIDIA ConnectX and BlueField (CVE‑2025-23299) – October 2025 (24th Oct 2025)

Preface: Nvidia BlueField is a line of data processing units (DPUs) designed and produced by Nvidia. Initially developed by Mellanox Technologies. DOCA is a consistent and essential resource across all existing and future generations of BlueField DPU and SuperNIC products.

Background: The NVIDIA cloud-native supercomputing platform leverages the NVIDIA BlueField DPU architecture with high-speed, low-latency. The DPU enables native cloud services that let multiple users securely share resources without loss in application performance. HPC and AI communication frameworks and libraries play a critical role in determining application performance. Due to their latency and bandwidth-sensitive nature, offloading the libraries from the host CPU or GPU to the BlueField DPU creates the highest degree of overlap for parallel progression of communication and computation. DOCA is a consistent and essential resource across all existing and future generations of BlueField DPU and SuperNIC products.

DOCA BlueMan dashboard is the web-based interface for managing and monitoring an NVIDIA BlueField DPU (Data Processing Unit).

Vulnerability details: NVIDIA Bluefield and ConnectX contain a vulnerability in the management interface that may allow a malicious actor with high privilege access to execute arbitrary code.

Reference:

While Python itself is memory-safe, the real risk comes from:

  • YAML parsing libraries (like PyYAML) that allow arbitrary object deserialization.
  • C-based extensions or native bindings used by Python that may not enforce memory safety.
  • Improper validation of YAML configuration passed into privileged services like DTS.

Official announcement: Please see the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5684

CVE-2025-33182 and CVE-2025-33177: About NVIDIA Jetson Linux and IGX OS (20-10-2025)

Official Updated 10/13/2025 09:19 AM

Preface: Railway applications have traditionally relied on fixed-function embedded computers to perform tasks such as signaling, monitoring, and train control. To bridge this gap, rail operators and system integrators are turning to AI-driven edge computing to meet the growing demand for real-time processing and automation.

Background: The Nvidia Jetson is not just a CPU; it is a complete embedded computing board with both a CPU and a powerful GPU, memory, and other components on a single module. It is a System on Module (SoM) designed for AI and machine learning applications at the edge.

CPU: The Jetson modules contain an ARM-based CPU for general-purpose processing.

GPU: A key feature is the integrated GPU with CUDA cores, which is specialized for parallel processing and AI tasks.

The NVIDIA Jetson Linux Driver Package includes a UEFI-based bootloader. This bootloader is the standard firmware for newer Jetson platforms like Orin and AGX Xavier, replacing the older CBoot system. The UEFI firmware is included with the Linux kernel, drivers, and a root filesystem for the Jetson platform. 

Component of the driver package: The UEFI bootloader is a standard part of the Jetson Linux Driver Package, alongside the Linux kernel, drivers, and utilities.

Support for modern platforms: Support for the UEFI bootloader is included in recent releases of Jetson Linux, such as R35.6.0 and later, for platforms like Jetson AGX Orin, Orin NX, Orin Nano, and others.

Vulnerability details:

CVE-2025-33182: NVIDIA Jetson Linux contains a vulnerability in UEFI, where improper authentication may allow a privileged user to cause corruption of the Linux Device Tree. A successful exploitation of this vulnerability might lead to data tampering, denial of service.

CVE-2025-33177: NVIDIA Jetson Linux and IGX OS contain a vulnerability in NvMap, where improper tracking of memory allocations could allow a local attacker to cause memory overallocation. A successful exploitation of this vulnerability might lead to denial of service.

Official announcement: Please refer to the url for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5716

CVE-2025-23356: about Isaac Lab component of NVIDIA Isaac Sim (16-10-2025)

Preface: The goal of generating synthetic data for robot models is to create a diverse and realistic dataset for training and validating AI systems in a cost-effective and scalable way, helping to overcome the limitations of real-world data collection. This includes creating data for training models, improving their performance, testing for edge cases, and refining them after initial training without needing extensive, time-consuming, or dangerous physical data collection.  

Background: Isaac Sim facilitates three essential workflows: generating synthetic data for training or post-training robot models used for perception, mobility, and manipulation. It also enables validating robot stacks through software and hardware-in-loop testing and enabling robot learning through Isaac™ Lab.

NVIDIA Isaac Lab is an open-source, unified framework for robot learning that helps developers train robot policies using high-fidelity simulation. Built on NVIDIA Isaac Sim and the Omniverse platform, it leverages the power of GPUs for parallel physics simulation and photorealistic rendering to bridge the gap between simulation and real-world training. The framework simplifies common workflows for robot learning, such as reinforcement learning and imitation learning, by providing modular design patterns and a unified set of tools.

Configuring Stable-Baselines3 (SB3) within Isaac Sim, particularly with Isaac Lab, involves setting up the training environment and specifying hyperparameters for your chosen reinforcement learning algorithm.

Vulnerability details: NVIDIA Isaac Lab contains a vulnerability in SB3 configuration parsing. A successful exploit of this vulnerability might lead to code execution, denial of service, escalation of privileges, information disclosure, or data tampering.

Official announcement: Please see the link for details

https://nvidia.custhelp.com/app/answers/detail/a_id/5708

CVE-2025-23272: About NVIDIA nvJPEG library (6th Oct 2025)

Preface: The nvJPEG library provides low-latency decoding, encoding, and transcoding for common JPEG formats used in computer vision applications such as image classification, object detection and image segmentation.

Background: The nvJPEG library enables the following functions: use the JPEG image data stream as input; retrieve the width and height of the image from the data stream, and use this retrieved information to manage the GPU memory allocation and the decoding.

To use the nvJPEG library, start by calling the helper functions for initialization. Create nvJPEG library handle with one of the helper functions nvjpegCreateSimple() or nvjpegCreateEx() . Create JPEG state with the helper function nvjpegJpegStateCreate() . See nvJPEG Type Declarations and nvjpegJpegStateCreate() .

The nvJPEG library provides high-performance, GPU accelerated JPEG decoding functionality for image formats commonly used in deep learning and hyperscale multimedia applications.

Ref: Arrays in C/C++ are zero-indexed, meaning that if an array has `n` elements, valid indices range from `0` to `n-1`. Accessing an index outside this range leads to out-of-bounds access. Pointers in C/C++ provide direct memory manipulation capabilities, but this power comes with the risk of “out-of-bounds” access.

Vulnerability details: NVIDIA nvJPEG library contains a vulnerability where an attacker can cause an out-of-bounds read by means of a specially crafted JPEG file. A successful exploit of this vulnerability might lead to information disclosure or denial of service.

Official announcement: For more details, please click the link.

https://nvd.nist.gov/vuln/detail/CVE-2025-23272

CVE-2025-10657: About Enhanced Container Isolation (2nd Oct 2025)

Preface: Standardized AI/ML model packaging: With OCI artifacts, models can be versioned, distributed, and tracked like container images. This promotes consistency and traceability across environments.Docker Desktop, specifically through its Docker Model Runner feature, can be used to run various AI models, particularly Large Language Models (LLMs) and other AI models that can be packaged as OCI Artifacts.

OCI Artifacts are any arbitrary files associated with software applications, extending the standardized OCI (Open Container Initiative) image format to include content beyond container images, such as Helm charts, Software Bill of Materials (SBOMs), digital signatures, and provenance data. These artifacts leverage the same fundamental OCI structure of manifest, config, and layers and are stored and distributed using OCI-compliant registries and tools like the ORAS CLI.

Background: A container desktop, such as Docker Desktop, acts as a local development environment and a management host for CI/CD pipelines by providing consistent, isolated environments for building, testing, and deploying containerized applications. It enables developers to package applications with their dependencies into portable containers, eliminating “works on my machine” issues and ensuring application uniformity across development, testing, and production. This simplifies the entire software delivery process, accelerating the development lifecycle by integrating container management directly into the developer’s workflow.

Vulnerability details: In a hardened Docker environment, with Enhanced Container Isolation ( ECI https://docs.docker.com/enterprise/security/hardened-desktop/enhanced-container-isolation/ ) enabled, an administrator can utilize the command restrictions feature https://docs.docker.com/enterprise/security/hardened-desktop/enhanced-container-isolation/config/#command-restrictions  to restrict commands that a container with a Docker socket mount may issue on that socket. Due to a software bug, the configuration to restrict commands was ignored when passed to ECI, allowing any command to be executed on the socket. This grants excessive privileges by permitting unrestricted access to powerful Docker commands. The vulnerability affects only Docker Desktop 4.46.0 users that have ECI enabled and are using the Docker socket command restrictions feature. In addition, since ECI restricts mounting the Docker socket into containers by default, it only affects containers which are explicitly allowed by the administrator to mount the Docker socket.

Official announcement: For more details, please see the link –

https://nvd.nist.gov/vuln/detail/CVE-2025-10657