Category Archives: AI and ML

CVE-2025-23257 and CVE-2025-23258: About NVIDIA DOCA (4th Sep 2025)

Preface: In NVIDIA's DOCA Telemetry Service (DTS), the endless "collect-export" loop refers to the service's standard, continuous operation, in which telemetry data is perpetually collected and then exported. While high-frequency telemetry (HFT) offers an external, triggered alternative, the standard DTS flow is designed to run indefinitely, collecting data from the sysfs provider and potentially exporting it via Prometheus or Fluent Bit.
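
Below is a conceptual Python sketch of such a collect-export loop, included purely for illustration. It is not DTS code; the sysfs path, the one-second interval, and the print-based exporter are placeholders standing in for a real provider and a Prometheus or Fluent Bit exporter.

  import time

  # Conceptual illustration only -- not DTS code. An endless "collect-export"
  # loop: read counters from a provider, then hand the sample to an exporter.
  def collect_from_sysfs():
      # Placeholder provider: read one kernel counter exposed via sysfs.
      with open("/sys/class/net/eth0/statistics/rx_bytes") as fh:
          return {"rx_bytes": int(fh.read().strip())}

  def export(sample):
      # Placeholder exporter: a real service would push to Prometheus or Fluent Bit.
      print(sample)

  while True:
      export(collect_from_sysfs())
      time.sleep(1.0)  # collection interval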

Background: CUDA (Compute Unified Device Architecture) and DOCA (Data Center Infrastructure-on-a-Chip Architecture) are both NVIDIA SDKs, but they serve distinct purposes and target different hardware.

CUDA SDK: Primarily designed for general-purpose computing on NVIDIA GPUs. It enables developers to program accelerated computing applications by leveraging the parallel processing power of GPUs.

DOCA SDK: Built specifically for NVIDIA BlueField Data Processing Units (DPUs) and SuperNICs, aiming to accelerate data center infrastructure tasks. It enables offloading infrastructure-related workloads from the host CPU to the DPU.

DOCA Telemetry Service (DTS) is a DOCA Service for collecting and exporting telemetry data. It can run on hosts and BlueField, collecting data from built-in providers and external telemetry applications. The service supports various providers, including sysfs, ethtool, ifconfig, PPCC, DCGM, NVIDIA SMI, and more.

Ref: The binary data can be read using the /opt/mellanox/collectx/bin/clx_read app, packaged in collectx-clxapidev, a DOCA dependency package.

Vulnerability details:

CVE-2025-23257: NVIDIA DOCA contains a vulnerability in the collectx-clxapidev Debian package that could allow an actor with low privileges to escalate privileges. A successful exploit of this vulnerability might lead to escalation of privileges.

CVE-2025-23258: NVIDIA DOCA contains a vulnerability in the collectx-dpeserver Debian package for arm64 that could allow an attacker with low privileges to escalate privileges. A successful exploit of this vulnerability might lead to escalation of privileges.

Official announcement: Please see the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5655

CVE-2025-23307: NVIDIA NeMo Curator for all platforms contains a vulnerability (28th Aug 2025)

Preface: NeMo Curator, part of the NVIDIA NeMo software suite for managing the AI agent lifecycle, is a Python library specifically designed for fast and scalable data processing and curation for generative AI use cases such as foundation language model pretraining, text-to-image model training, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT).

Background: To install the NeMo Curator library, run the following commands:

  • git clone https://github[.]com/NVIDIA/NeMo-Curator[.]git
  • cd NeMo-Curator
  • pip install --extra-index-url https://pypi[.]nvidia[.]com ".[cuda12x]"

Data download: The downloading pipeline in NeMo Curator consists of the following classes (an illustrative sketch follows the list):

  • DocumentDownloader: Abstract class for downloading remote data to disk.
  • DocumentIterator: Abstract class for reading dataset raw records from the disk.
  • DocumentExtractor: Abstract class for extracting text records, as well as any relevant metadata from the records on the disk.
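
A minimal Python sketch of this three-stage pattern is shown below. The class and method names mirror the roles described above but are illustrative stand-ins, not the actual NeMo Curator API or its signatures.

  import os
  import urllib.request
  from abc import ABC, abstractmethod

  # Hypothetical stand-ins mirroring the three abstract roles described above.
  class DocumentDownloader(ABC):
      @abstractmethod
      def download(self, url: str) -> str:
          """Download a remote file and return the local path."""

  class DocumentIterator(ABC):
      @abstractmethod
      def iterate(self, path: str):
          """Yield raw records from a file on disk."""

  class DocumentExtractor(ABC):
      @abstractmethod
      def extract(self, record: str) -> dict:
          """Return extracted text plus metadata for one record."""

  class SimpleDownloader(DocumentDownloader):
      def download(self, url: str) -> str:
          local = os.path.basename(url)
          urllib.request.urlretrieve(url, local)  # fetch remote data to disk
          return local

  class LineIterator(DocumentIterator):
      def iterate(self, path: str):
          with open(path, encoding="utf-8") as fh:
              yield from fh  # one raw record per line

  class PlainTextExtractor(DocumentExtractor):
      def extract(self, record: str) -> dict:
          text = record.strip()
          return {"text": text, "length": len(text)}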

Vulnerability details: NVIDIA NeMo Curator for all platforms contains a vulnerability where a malicious file created by an attacker could allow code injection. A successful exploit of this vulnerability might lead to code execution, escalation of privileges, information disclosure, and data tampering.

Ref: The vulnerability arises when malicious files, such as JSONL files, are loaded by NeMo Curator. If these files are crafted to exploit weaknesses in how NeMo Curator parses or processes them, they can inject executable code. Typical vectors include the following (a generic illustration follows the list):

  • Embedded malicious payloads in JSONL files.
  • JSON injection attacks exploiting parsing logic.
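
The Python snippet below illustrates the difference between parsing a JSONL record as data and evaluating it as code. It is a generic illustration of this attack class, not NeMo Curator's actual parsing logic.

  import json

  # One line of a JSONL file; the string value smuggles a Python expression.
  malicious_line = '{"text": "__import__(\'os\').system(\'id\')"}'

  # Safe: json.loads only builds data structures; the payload stays an inert string.
  record = json.loads(malicious_line)
  print(type(record["text"]))  # <class 'str'> -- nothing executes

  # Dangerous (hypothetical): passing attacker-controlled strings to eval()
  # would execute the payload. Never eval untrusted JSONL content.
  # eval(record["text"])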

Official announcement: Please see the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5690

CVE-2025-5115: About Eclipse Jetty (22nd Aug 2025)

Published: 2025-08-20

Updated: 2025-08-19

Preface: Apache Knox uses Eclipse Jetty as its embedded web server. When you deploy and execute Apache Knox, it uses Jetty to handle incoming HTTP requests and provide its various features, such as authentication, authorization, and routing to backend Hadoop services.

Ref: Is Hadoop used in AI? Hadoop ecosystems help with data processing and model training operations for machine learning applications.

Background: How does Jetty consume resources? Apache Knox leverages Eclipse Jetty as its embedded web server. Apache Knox is a reverse proxy and API gateway that provides a single point of secure access for Apache Hadoop services. It is written in Java and relies heavily on the Java runtime environment for its functionality.

Is the Exploit Related to HTTP Response Buffer Size?

Not directly. The vulnerability does not exploit the size of the HTTP response buffer itself. Instead, it targets the processing logic of incoming HTTP/2 frames. However:

  • If Jetty is configured with large buffers or many concurrent streams, the impact of the exploit can be amplified.
  • The server may allocate response buffers unnecessarily if it begins processing a request before realizing it’s invalid.

Vulnerability details: In Eclipse Jetty, versions <=9.4.57, <=10.0.25, <=11.0.25, <=12.0.21, <=12.1.0.alpha2, an HTTP/2 client may trigger the server to send RST_STREAM frames, for example by sending frames that are malformed or that should not be sent in a particular stream state, therefore forcing the server to consume resources such as CPU and memory.

Official announcement: Please see the link for details –

https://www.tenable.com/cve/CVE-2025-5115

https://github.com/jetty/jetty.project/pull/13449

Overview of Transformer-based language models (19-08-2025)

Technical Highlights: The Megatron-LM codebase efficiently trains models from 2 billion to 462 billion parameters across thousands of GPUs. It has benchmarked the training of a 462B-parameter model on 6144 H100 GPUs, achieving up to 47% Model FLOP Utilization (MFU) on H100 clusters.

GPT-4, the latest iteration in OpenAI’s Generative Pre-trained Transformer series, significantly scales up the parameter count compared to its predecessors. While GPT-2 had 1.5 billion parameters and GPT-3 boasted 175 billion, GPT-4 is estimated to have a staggering 1.76 trillion parameters.

Parameters in Artificial Intelligence:

Parameters in AI are the variables that the model learns during training. They are the internal variables that the model uses to make predictions or decisions. In a neural network, the parameters include the weights and biases of the neurons. Parameters are used in AI to determine the output of the model for a given input. During training, the model adjusts its parameters to minimize the difference between its predictions and the actual values. This is typically done using an optimization algorithm, such as gradient descent. Gradient descent is a fundamental optimization algorithm in artificial intelligence, particularly in machine learning and deep learning. It’s used to minimize a function, often the cost function of a model, by iteratively adjusting the model’s parameters in the direction of the steepest descent.
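
As a concrete illustration, the short Python sketch below fits a single weight with plain gradient descent; the data, learning rate, and step count are arbitrary choices made for this example.

  # Minimal gradient descent on a one-parameter model: fit w so that w * x ≈ y.
  # The loss is mean squared error and its gradient is computed analytically.
  xs = [1.0, 2.0, 3.0, 4.0]
  ys = [2.0, 4.0, 6.0, 8.0]   # generated by the "true" weight w = 2

  w = 0.0                     # initial parameter
  learning_rate = 0.01

  for step in range(200):
      # dLoss/dw for MSE = mean(2 * (w*x - y) * x)
      grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
      w -= learning_rate * grad   # step in the direction of steepest descent

  print(round(w, 3))              # converges to roughly 2.0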

About Megatron-LM: The Megatron-LM codebase, developed by NVIDIA, is widely used for training large parameter models, particularly Large Language Models (LLMs), due to its specialized features and optimizations designed for large-scale distributed training. Megatron-LM is a GPU-optimized framework developed by NVIDIA for training transformer models at scale. It supports models ranging from a few billion to hundreds of billions of parameters.

Its core techniques include:

  • Intra-layer model parallelism
  • Pipeline parallelism
  • Tensor parallelism
  • Efficient communication primitives using NCCL
  • Mixed precision training (FP16/BF16)

Ref: The NVIDIA Collective Communications Library (NCCL) provides optimized routines for multi-GPU and multi-node communication. It is designed to accelerate collective communication patterns, such as all-reduce, broadcast, reduce, and all-gather, which are crucial for deep learning frameworks and other parallel computing applications using NVIDIA GPUs.
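
As a minimal sketch, the PyTorch snippet below performs an NCCL all-reduce across GPUs; the launch command, tensor shape, and process count are illustrative assumptions, not Megatron-LM code.

  import torch
  import torch.distributed as dist

  # Run with: torchrun --nproc_per_node=<num_gpus> this_script.py
  def main():
      dist.init_process_group(backend="nccl")   # NCCL provides the GPU-to-GPU transport
      rank = dist.get_rank()
      torch.cuda.set_device(rank)

      # Each rank contributes a tensor filled with its rank id.
      t = torch.full((4,), float(rank), device="cuda")
      dist.all_reduce(t, op=dist.ReduceOp.SUM)  # every rank ends up with the global sum
      print(f"rank {rank}: {t.tolist()}")

      dist.destroy_process_group()

  if __name__ == "__main__":
      main()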

Ref: FP16 and BF16 are both 16-bit floating-point formats used in AI training to improve performance and efficiency, but they differ in their dynamic range and precision. FP16, also known as half precision, offers higher precision for smaller values but has a limited range. BF16, or Brain Floating Point, has a wider dynamic range, making it more suitable for large-scale models where numerical stability is crucial, even at the cost of some precision.
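
The difference in range and precision can be inspected directly, as in the short snippet below; PyTorch is used here only as a convenient way to query the two formats.

  import torch

  # Compare the numeric properties of the two 16-bit floating-point formats.
  for dtype in (torch.float16, torch.bfloat16):
      info = torch.finfo(dtype)
      # BF16 keeps FP32's 8-bit exponent (wide range, coarser precision);
      # FP16 has a 5-bit exponent (max ~65504) but more mantissa bits.
      print(dtype, "max:", info.max, "smallest normal:", info.tiny, "eps:", info.eps)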

Official details: For details, please refer to the link – https://github.com/NVIDIA/Megatron-LM

CVE-2025-23305 and CVE-2025-23306: About NVIDIA Megatron-LM (18-08-2025)

Official Updated 08/11/2025 06:16 AM

Preface: GPT-4 offers several key benefits, including improved accuracy, longer context handling, and the ability to process both text and image inputs. It also exhibits stronger guardrails, leading to more reliable and ethical outputs. Additionally, GPT-4 excels in various tasks like professional and academic benchmarks, creative writing, and adapting to user needs.

Background: The Megatron-LM codebase is a framework for training large, powerful transformer language models at scale, developed by NVIDIA. It focuses on efficient, model-parallel (tensor and pipeline) and multi-node pre-training of transformer-based models such as GPT, BERT, and T5 using mixed precision.

Megatron-LM codebase efficiently trains models from 2B to 462B parameters across thousands of GPUs, achieving up to 47% Model FLOP Utilization (MFU) on H100 clusters.

GPT-4, the latest iteration in OpenAI’s Generative Pre-trained Transformer series, significantly scales up the parameter count compared to its predecessors. While GPT-2 had 1.5 billion parameters and GPT-3 boasted 175 billion, GPT-4 is estimated to have a staggering 1.76 trillion parameters.

The Megatron-LM codebase has successfully benchmarked the training of a 462B parameter model using 6144 H100 GPUs, achieving up to 47% Model FLOP Utilization (MFU).

While this demonstrates the capability of the Megatron-LM framework to train very large models on H100 clusters, the exact number of H100 GPUs used to train GPT-4 is not publicly disclosed. GPT-4 was developed by OpenAI, and they have not released the specific hardware configurations used for its training.

Vulnerability details:

CVE-2025-23305: NVIDIA Megatron-LM for all platforms contains a vulnerability in the tools component, where an attacker may exploit a code injection issue. A successful exploit of this vulnerability may lead to code execution, escalation of privileges, information disclosure, and data tampering.

CVE-2025-23306: NVIDIA Megatron-LM for all platforms contains a vulnerability in the megatron/training/arguments.py component, where an attacker could cause a code injection issue by providing a malicious input. A successful exploit of this vulnerability may lead to code execution, escalation of privileges, information disclosure, and data tampering.
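
The snippet below is a generic illustration of this class of issue in argument handling, not Megatron-LM's actual code: evaluating attacker-controlled strings executes them, whereas a literal-only parser rejects them.

  import ast

  # A configuration value arriving as a string, e.g. from a CLI flag or config file.
  user_value = "[1, 2, 3]"
  payload = "__import__('os').system('id')"

  # Safe: ast.literal_eval accepts only Python literals (numbers, strings, lists, dicts).
  print(ast.literal_eval(user_value))   # [1, 2, 3]

  try:
      ast.literal_eval(payload)         # rejected -- not a literal
  except (ValueError, SyntaxError) as exc:
      print("rejected:", exc)

  # Dangerous (hypothetical): eval(payload) would execute the attacker's expression.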

Official announcement: For more information, please refer to the link

https://nvidia.custhelp.com/app/answers/detail/a_id/5685

CVE-2025-23298: About NVIDIA Merlin Transformers4Rec (15th Aug 2025)

Official Updated 08/11/2025 06:15 AM

Preface: While the Bible doesn’t specifically mention artificial intelligence, it reminds us that human knowledge and capabilities will increase dramatically in the last days (Daniel 12:4). Building and training neural networks is a cornerstone of modern artificial intelligence, enabling breakthroughs in fields such as computer vision, natural language processing, and robotics.

Background: NVIDIA Merlin Transformers4Rec is a Python library designed for building sequential and session-based recommender systems, leveraging the power of Transformer architectures, particularly for use with PyTorch. It is part of the broader NVIDIA Merlin ecosystem, which provides end-to-end GPU-accelerated solutions for recommender systems.

Transformers4Rec is pre-installed in the merlin-pytorch container that is available from the NVIDIA GPU Cloud (NGC) catalog.

NVIDIA Merlin PyTorch container, available on NVIDIA NGC (NVIDIA GPU Cloud), includes the necessary components for GPU acceleration, including the CUDA Toolkit.

The Merlin PyTorch container allows users to do preprocessing and feature engineering with NVTabular, train a deep-learning-based recommender system model with PyTorch, and serve the trained model on Triton Inference Server.

Ref: NVTabular and RAPIDS (cuDF/cuML) are used for preprocessing and feature engineering.

Vulnerability details: NVIDIA Merlin Transformers4Rec for all platforms contains a vulnerability in a python dependency, where an attacker could cause a code injection issue. A successful exploit of this vulnerability might lead to code execution, escalation of privileges, information disclosure, and data tampering.

Official announcement: Please see the link for details

https://nvidia.custhelp.com/app/answers/detail/a_id/5683

CVE-2025-23294: NVIDIA WebDataset for all platforms contains a vulnerability (14th Aug 2025)

Official Updated 08/11/2025 06:15 AM

Preface: WebDataset is a PyTorch IterableDataset implementation designed for efficient access to large datasets stored in POSIX tar archives. It focuses on sequential/streaming data access, which offers substantial performance advantages in environments where local storage is limited or I/O bottlenecks are a concern. WebDataset is particularly well-suited for very large-scale training, as it minimizes the need for local storage and allows for efficient data loading from various sources, including cloud storage.
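
A minimal usage sketch is shown below; the shard pattern and the "jpg"/"png"/"cls" sample keys are placeholders, and the decoding choice is illustrative rather than prescriptive.

  import webdataset as wds

  # Stream samples sequentially from sharded POSIX tar archives (local path or URL).
  dataset = (
      wds.WebDataset("shards/train-{000000..000009}.tar")
      .decode("torchrgb")          # decode image bytes into CHW float tensors
      .to_tuple("jpg;png", "cls")  # pick the image and the integer class label
  )

  for image, label in dataset:
      pass  # hand each sample to the training loop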

Background: NVIDIA WebDataset refers to the integration of WebDataset with NVIDIA technologies like DALI or NeMo, rather than a separate NVIDIA-specific installation. Installing WebDataset itself is straightforward, as it is a Python library.

  • DALI is a portable, open-source software library for decoding and augmenting images, videos, and speech to accelerate deep learning applications.

DALI itself doesn’t extract .tar files directly — instead, it processes data streamed from tarballs via WebDataset or other loaders.

  • NVIDIA NeMo is a framework for building and deploying generative AI models, particularly those used in conversational AI like speech recognition and natural language processing.

It may extract or stream data depending on the configuration, but tarball handling is abstracted behind the data pipeline.

Vulnerability details: CVE-2025-23294 – NVIDIA WebDataset for all platforms contains a vulnerability where an attacker could execute arbitrary code with elevated permissions. A successful exploit of this vulnerability might lead to escalation of privileges, data tampering, information disclosure, and denial of service.

Official announcement: Please see the link for details

https://nvidia.custhelp.com/app/answers/detail/a_id/5658

A safe mode bypass vulnerability in Keras versions 3.0.0 through 3.10.0 (13th Aug 2025)

Preface: Deep learning in AI generally learns much faster than humans in specific, narrow tasks, especially those involving large datasets and complex computations. However, humans still excel at general intelligence, creative problem-solving, and learning with limited data.

Perhaps, AI does not have this advantage yet!

Background: Keras 3.0 is a major rewrite of the Keras deep learning API, designed to provide a unified and flexible platform for building and deploying deep learning models. Its most significant feature is its multi-backend architecture, allowing users to run Keras workflows on top of various popular deep learning frameworks.

TensorFlow is a comprehensive, low-level machine learning framework capable of building and training models directly. However, Keras plays a crucial role as its official high-level API, providing several benefits that make deep learning development significantly easier and more efficient within the TensorFlow ecosystem.

Does the Lambda layer still work in Keras 3.0? Yes, the Lambda layer continues to be available and functional in Keras 3.0. In machine learning, specifically within deep learning frameworks like Keras or TensorFlow, a Lambda layer is a type of layer that allows you to wrap arbitrary expressions or functions as a layer in your neural network model.
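
The sketch below shows why safe mode matters when loading .keras archives; the toy model, file name, and the exact error type and message are illustrative and may vary between Keras releases.

  import keras
  import numpy as np

  # Toy model containing a Lambda layer that wraps arbitrary Python code.
  model = keras.Sequential([
      keras.Input(shape=(4,)),
      keras.layers.Lambda(lambda x: x * 2.0),   # arbitrary callable serialized with the model
      keras.layers.Dense(1),
  ])
  model.save("toy_lambda.keras")

  # Default safe mode refuses to deserialize the embedded lambda (arbitrary code).
  try:
      keras.models.load_model("toy_lambda.keras")            # safe_mode=True by default
  except ValueError as exc:
      print("blocked by safe mode:", exc)

  # Only disable safe mode for archives from trusted sources; CVE-2025-8747
  # concerns archives crafted to bypass this check.
  model = keras.models.load_model("toy_lambda.keras", safe_mode=False)
  print(model.predict(np.ones((1, 4)), verbose=0))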

Vulnerability details: A safe mode bypass vulnerability in the `Model.load_model` method in Keras versions 3.0.0 through 3.10.0 allows an attacker to achieve arbitrary code execution by convincing a user to load a specially crafted `.keras` model archive.

Official announcement: Please see the link for details

https://www.tenable.com/cve/CVE-2025-8747

CVE-2025-23318 and CVE-2025-23319: About NVIDIA Triton Inference Server (6th Aug 2025)

Preface: NVIDIA’s security advisories released on August 4, 2025 (e.g., CVE-2025-23318, CVE-2025-23319) are specifically related to the Python backend, the Triton backend for Python. The goal of the Python backend is to let you serve models written in Python with Triton Inference Server without having to write any C++ code.
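
For context, a minimal sketch of the model.py interface the Python backend expects is shown below; the tensor names are placeholders that must match the model's config.pbtxt, and the "model" itself just doubles its input.

  # model.py inside a Triton model repository entry (illustrative).
  import numpy as np
  import triton_python_backend_utils as pb_utils

  class TritonPythonModel:
      def initialize(self, args):
          # Called once when the model is loaded; args carries the model configuration.
          pass

      def execute(self, requests):
          responses = []
          for request in requests:
              in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
              result = in0.as_numpy() * 2.0   # placeholder "model": double the input
              out0 = pb_utils.Tensor("OUTPUT0", result.astype(np.float32))
              responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
          return responses

      def finalize(self):
          # Called once when the model is unloaded.
          pass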

Background: NVIDIA Triton Inference Server is an open-source inference serving software that streamlines the deployment and execution of AI models from various deep learning and machine learning frameworks. It achieves this flexibility through a modular system of backends. 

Each backend within Triton is responsible for executing models from a specific framework. When an inference request arrives for a particular model, Triton automatically routes the request to the necessary backend for execution. 

Key backend frameworks supported by Triton include:

  • TensorRT: NVIDIA’s high-performance deep learning inference optimizer and runtime.
  • TensorFlow: A popular open-source machine learning framework.
  • PyTorch: Another widely used open-source machine learning library.
  • ONNX: An open standard for representing machine learning models.
  • OpenVINO: Intel’s toolkit for optimizing and deploying AI inference.
  • Python: A versatile backend that can execute models written directly in Python and also serves as a dependency for other backends.
  • RAPIDS FIL (Forest Inference Library): For efficient inference of tree models (e.g., XGBoost, LightGBM, Scikit-Learn).

This modular backend architecture allows Triton to provide a unified serving solution for a wide range of AI models, regardless of the framework they were trained in.

Vulnerability details:

CVE-2025-23318: NVIDIA Triton Inference Server for Windows and Linux contains a vulnerability in the Python backend, where an attacker could cause an out-of-bounds write. A successful exploit of this vulnerability might lead to code execution, denial of service, data tampering, and information disclosure.

CVE-2025-23319: NVIDIA Triton Inference Server for Windows and Linux contains a vulnerability in the Python backend, where an attacker could cause an out-of-bounds write by sending a request. A successful exploit of this vulnerability might lead to remote code execution, denial of service, data tampering, or information disclosure.

Official announcement: Please see the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5687

CVE-2025-23310: The NVIDIA Triton Inference Server for Windows and Linux suffers from a stack buffer overflow due to specially crafted input. (5th Aug 2025)

Preface: The NVIDIA Triton Inference Server API supports both HTTP/REST and GRPC protocols. These protocols allow clients to communicate with the Triton server for various tasks such as model inferencing, checking server and model health, and managing model metadata and statistics.
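
A minimal client-side sketch over HTTP/REST using the tritonclient Python package is shown below; the URL and model name are placeholders.

  import tritonclient.http as httpclient

  # Connect to a local Triton instance on its default HTTP port.
  client = httpclient.InferenceServerClient(url="localhost:8000")

  print("server live: ", client.is_server_live())
  print("server ready:", client.is_server_ready())
  # "my_model" is a hypothetical model name registered in the model repository.
  print("metadata:    ", client.get_model_metadata("my_model"))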

Background: NVIDIA Triton™ Inference Server, part of the NVIDIA AI platform and available with NVIDIA AI Enterprise, is open-source software that standardizes AI model deployment and execution across every workload.

The Asynchronous Server Gateway Interface (ASGI) is a calling convention for web servers to forward requests to asynchronous-capable Python frameworks and applications. It is built as a successor to the Web Server Gateway Interface (WSGI).

NVIDIA Triton Inference Server integrates a built-in web server to expose its functionality and allow clients to interact with it. This web server is fundamental to how Triton operates and provides access to its inference capabilities on both Windows and Linux environments.

Vulnerability details: CVE-2025-23310 – NVIDIA Triton Inference Server for Windows and Linux contains a vulnerability where an attacker could cause stack buffer overflow by specially crafted inputs. A successful exploit of this vulnerability might lead to remote code execution, denial of service, information disclosure, and data tampering.

Official announcement: Please refer to the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5687