Category Archives: AI and ML

CVE-2025-23318 and CVE-2025-23319: About NVIDIA Triton Inference Server (6th Aug 2025)

Preface: Nvidia’s security advisories released on August 4, 2025 (e.g., CVE-2025-23318, CVE-2025-23319) specifically concern the Python backend, i.e., the Triton backend for Python. The goal of the Python backend is to let you serve models written in Python through Triton Inference Server without having to write any C++ code.

Background: NVIDIA Triton Inference Server is an open-source inference serving software that streamlines the deployment and execution of AI models from various deep learning and machine learning frameworks. It achieves this flexibility through a modular system of backends. 

Each backend within Triton is responsible for executing models from a specific framework. When an inference request arrives for a particular model, Triton automatically routes the request to the necessary backend for execution. 

Key backend frameworks supported by Triton include:

  • TensorRT: NVIDIA’s high-performance deep learning inference optimizer and runtime.
  • TensorFlow: A popular open-source machine learning framework.
  • PyTorch: Another widely used open-source machine learning library.
  • ONNX: An open standard for representing machine learning models.
  • OpenVINO: Intel’s toolkit for optimizing and deploying AI inference.
  • Python: A versatile backend that can execute models written directly in Python and also serves as a dependency for other backends.
  • RAPIDS FIL (Forest Inference Library): For efficient inference of tree models (e.g., XGBoost, LightGBM, Scikit-Learn).

This modular backend architecture allows Triton to provide a unified serving solution for a wide range of AI models, regardless of the framework they were trained in.
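
For context, a model served by the Python backend is simply a model.py script that implements the TritonPythonModel class. The minimal sketch below follows the documented triton_python_backend_utils API; the model logic and the tensor names INPUT0/OUTPUT0 are illustrative placeholders, not taken from the advisory.

    # model.py – minimal sketch of a Triton Python backend model.
    # Uses the standard triton_python_backend_utils API; names are placeholders.
    import numpy as np
    import triton_python_backend_utils as pb_utils

    class TritonPythonModel:
        def initialize(self, args):
            # Called once when Triton loads the model; args holds the model config.
            pass

        def execute(self, requests):
            # Called for every batch of inference requests routed to this model.
            responses = []
            for request in requests:
                in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
                out = in0.as_numpy() * 2.0  # placeholder "model" logic
                out_tensor = pb_utils.Tensor("OUTPUT0", out.astype(np.float32))
                responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
            return responses

        def finalize(self):
            # Called when the model is unloaded.
            pass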

Vulnerability details:

CVE-2025-23318: NVIDIA Triton Inference Server for Windows and Linux contains a vulnerability in the Python backend, where an attacker could cause an out-of-bounds write. A successful exploit of this vulnerability might lead to code execution, denial of service, data tampering, and information disclosure.

CVE-2025-23319: NVIDIA Triton Inference Server for Windows and Linux contains a vulnerability in the Python backend, where an attacker could cause an out-of-bounds write by sending a request. A successful exploit of this vulnerability might lead to remote code execution, denial of service, data tampering, or information disclosure.

Official announcement: Please see the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5687

CVE-2025-23310: The NVIDIA Triton Inference Server for Windows and Linux suffers from a stack buffer overflow due to specially crafted input. (5th Aug 2025)

Preface: The NVIDIA Triton Inference Server API supports both HTTP/REST and GRPC protocols. These protocols allow clients to communicate with the Triton server for various tasks such as model inferencing, checking server and model health, and managing model metadata and statistics.

Background: NVIDIA Triton™ Inference Server, part of the NVIDIA AI platform and available with NVIDIA AI Enterprise, is open-source software that standardizes AI model deployment and execution across every workload.

The Asynchronous Server Gateway Interface (ASGI) is a calling convention for web servers to forward requests to asynchronous-capable Python frameworks and applications. It was built as a successor to the Web Server Gateway Interface (WSGI).
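
To make that calling convention concrete, here is a minimal generic ASGI application (an illustration of the interface only, not Triton code): the ASGI server awaits a single coroutine and hands it the connection scope plus receive/send channels.

    # Minimal ASGI application: the ASGI server calls this coroutine per request.
    # Generic illustration of the calling convention; not part of Triton itself.
    async def app(scope, receive, send):
        assert scope["type"] == "http"
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({"type": "http.response.body", "body": b"ok"})

    # Run with any ASGI server, for example:  uvicorn my_module:app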

NVIDIA Triton Inference Server integrates a built-in web server to expose its functionality and allow clients to interact with it. This web server is fundamental to how Triton operates and provides access to its inference capabilities on both Windows and Linux environments.
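
As a quick illustration of the HTTP/REST side of that web server, the sketch below probes a Triton instance on the default port 8000 using the standard v2 endpoints; the model name "my_model" is a hypothetical placeholder.

    # Sketch: probing a local Triton server over HTTP/REST (default port 8000).
    # The model name "my_model" is hypothetical; the endpoints are Triton's v2 API.
    import requests

    BASE = "http://localhost:8000"

    def triton_status(model="my_model"):
        live = requests.get(f"{BASE}/v2/health/live").status_code == 200
        ready = requests.get(f"{BASE}/v2/health/ready").status_code == 200
        model_ready = requests.get(f"{BASE}/v2/models/{model}/ready").status_code == 200
        return {"server_live": live, "server_ready": ready, "model_ready": model_ready}

    if __name__ == "__main__":
        print(triton_status())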

Vulnerability details: CVE-2025-23310 – NVIDIA Triton Inference Server for Windows and Linux contains a vulnerability where an attacker could cause stack buffer overflow by specially crafted inputs. A successful exploit of this vulnerability might lead to remote code execution, denial of service, information disclosure, and data tampering.

Official announcement: Please refer to the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5687

The whole world is paying attention to Nvidia, but supercomputers using AMD are the super ones! (July 28, 2025)

Preface: The El Capitan system at the Lawrence Livermore National Laboratory, California, USA remains the No. 1 system on the TOP500 (June 2025 list). The HPE Cray EX255a system was measured at 1.742 Exaflop/s on the HPL benchmark. El Capitan has 11,039,616 cores and is based on AMD 4th-generation EPYC™ processors with 24 cores at 1.8 GHz and AMD Instinct™ MI300A accelerators. It uses the HPE Slingshot interconnect for data transfer and achieves an energy efficiency of 58.9 Gigaflops/watt. The system also achieved 17.41 Petaflop/s on the HPCG benchmark, which makes it the leader of that ranking as well.

Background: Does El Capitan Use Docker or Kubernetes? El Capitan does not use Docker directly, but it does use Kubernetes—specifically:

Kubernetes is deployed on Rabbit and worker nodes. It is part of a stateless orchestration layer integrated with the Tri-Lab Operating System Stack (TOSS).

Kubernetes is used alongside Flux (the resource manager) and Rabbit (the near-node storage system) to manage complex workflows.

Why Kubernetes Instead of Docker Alone?

While Docker is lightweight and flexible, Kubernetes offers orchestration, which is critical for:

  • Managing thousands of concurrent jobs.
  • Coordinating data movement and storage across Rabbit nodes.
  • Supporting AI/ML workflows and in-situ analysis.

But Kubernetes has a larger memory and CPU footprint than Docker alone.

Technical details: HPE Cray Operating System (COS) is a specialized version of SUSE Linux Enterprise Server designed for high-performance computing, rather than being a variant of Red Hat Enterprise Linux. It’s built to run large, complex applications at scale and enhance application efficiency, reliability, management, and data access. While COS leverages SUSE Linux, it incorporates features tailored for supercomputing environments, such as enhanced memory sharing, power monitoring, and advanced kernel debugging.

What Does Cray Modify?
Cray (now part of HPE) primarily modifies:

  • The Linux kernel, for performance tuning, scalability, and hardware support.
  • HPC-specific enhancements, such as:
      Optimized scheduling
      NUMA-aware memory management
      High-speed interconnect support (e.g., Slingshot)
      An enhanced I/O and storage stack
  • Integration with the Cray Shasta architecture and the Slingshot interconnect.

These modifications are layered on top of SUSE Linux, meaning the base OS remains familiar and enterprise-grade, but is tailored for supercomputing.

End.

Our world is full of challenges and hardships. But you must be happy every day!

Security Focus: CVE‑2025‑23284 NVIDIA vGPU software contains a vulnerability (25-07-2025)

Preface: Memory Allocation Flow:

  1. User-space request (e.g., CUDA malloc or OpenGL buffer allocation; see the sketch after this list).
  2. Driver calls memmgrCreateHeap_IMPL() to create a memory heap.
  3. Heap uses pmaAllocatePages() to get physical memory.
  4. Virtual address space is mapped using UVM or MMU walker.
  5. Memory is returned to user-space or GPU context.
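
Only step 1 is visible to application code; steps 2 to 5 happen inside the NVIDIA driver. A hedged sketch of such a user-space allocation request, using Numba's CUDA bindings as one possible path, might look like this:

    # Sketch: a user-space GPU allocation request (step 1 of the flow above),
    # expressed with Numba's CUDA bindings. The driver services the request
    # behind the scenes (heap creation, physical pages, virtual mapping).
    import numpy as np
    from numba import cuda

    host = np.arange(1 << 20, dtype=np.float32)   # 4 MiB of host data
    device = cuda.to_device(host)                 # triggers a device allocation + copy
    print(device.shape, device.dtype)
    result = device.copy_to_host()                # mapped memory handed back to user space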

Background:

An OS-agnostic binary is a compiled program designed to run on multiple operating systems without requiring separate builds for each. This means the binary file can be executed on different OS platforms without modification, achieving a level of portability that’s not common with traditional compiled software.

The core loadable module within the NVIDIA vGPU software package is the NVIDIA kernel driver, specifically named nvidia[.]ko. This module facilitates communication between the guest virtual machine (VM) and the physical NVIDIA GPU. It’s split into two main components: an OS-agnostic binary and a kernel interface layer. The OS-agnostic component, for example, nv-kernel[.]o_binary for the nvidia[.]ko module, is provided as a pre-built binary to save time during installation. The kernel interface layer is specific to the Linux kernel version and configuration.

Vulnerability details:

CVE-2025-23285: NVIDIA vGPU software contains a vulnerability in the Virtual GPU Manager, where a malicious guest could cause a stack buffer overflow. A successful exploit of this vulnerability might lead to code execution, denial of service, information disclosure, or data tampering.

CVE-2025-23283: NVIDIA vGPU software for Linux-style hypervisors contains a vulnerability in the Virtual GPU Manager, where a malicious guest could cause a stack buffer overflow. A successful exploit of this vulnerability might lead to code execution, denial of service, escalation of privileges, information disclosure, or data tampering.

Official announcement: Please see the url for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5670

CVE-2023-4969 – Researchers from Trail of Bits reported a potential vulnerability titled “LeftoverLocals” – and this GPU design weakness is proving to be a fickle one! (21-07-2025)

Preface: “LeftoverLocals” allows recovery of data from GPU local memory created by other processes on Apple, Qualcomm, AMD, and Imagination GPUs. LeftoverLocals affects the overall security posture of GPU applications, especially LLMs and machine learning models running on affected GPU platforms. The NVD entry was published on January 16, 2024. So far, AMD appears to be the only company actively taking remediation measures.

Background: Trail of Bits published its “LeftoverLocals” research to the public on 16th January 2024. AMD has taken corrective action on the following schedule:

2025-07-18: Updated the Mitigation section for AMD Radeon Graphics

2025-06-23: Updated the Mitigation section for Data Center Graphics, AMD Radeon Graphics, and revised Client Processors table

2025-04-07: Updated the Mitigation section for Data Center Graphics, AMD Radeon Graphics, and Client Processors

2025-02-11: Updated the Mitigation section – Data Center Graphics

2025-01-15: Mitigation section has been updated, and AMD Ryzen™ AI 300 Series Processors (formerly codenamed “Strix Point”) FP8 have been added to the Client Processors list

2024-11-07: Mitigation has been updated for MI300 and MI300A

Updated driver version from 24.x.y to 25.x.y

2024-10-30: Updated mitigation targets

2024-08-02: Updated AMD Software: Adrenalin Edition and PRO Edition versions.

Removed: AMD Ryzen™ 3000 Series Processors with Radeon™ Graphics (Not affected)

Added: AMD Ryzen™ 8000 Series Processors with Radeon™ Graphics and AMD Ryzen™ 7030 Series Processors with Radeon™ Graphics

2024-07-30: Updated the Mitigation section of the AMD Radeon™ Graphics and Client Processors product tables

Updated Data Center Graphics Inter-VM and Bare Metal/Intra-VM Mitigation product tables

Updated mitigation section month for driver update rollout

2024-05-07: Added Vega products and Mitigation section with Product tables

2024-01-26: Updated Graphics and Data Center Graphics products

2024-01-16: Initial publication

Vulnerability details: CVE-2023-4969: A GPU kernel can read sensitive data from another GPU kernel (even from another user or app) through an optimized GPU memory region called “local memory” on various architectures.
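
As a purely conceptual sketch, expressed with CUDA-style shared memory via Numba (the actual Trail of Bits research targets workgroup local memory on the affected Apple, Qualcomm, AMD, and Imagination GPUs; NVIDIA hardware is not listed as affected), the “listener” side of such an attack boils down to a kernel that dumps whatever an uninitialized on-chip local buffer still contains:

    # Conceptual sketch of the "listener" kernel: read an uninitialized on-chip
    # local/shared buffer and copy it out. On an affected GPU this region may
    # still hold data left behind by a previous kernel from another process.
    import numpy as np
    from numba import cuda, float32

    @cuda.jit
    def dump_local_memory(out):
        buf = cuda.shared.array(256, dtype=float32)  # deliberately NOT initialized
        i = cuda.threadIdx.x
        if i < 256:
            out[i] = buf[i]   # leftover contents, if any, end up in `out`

    leak = np.zeros(256, dtype=np.float32)
    dump_local_memory[1, 256](leak)   # one block of 256 threads
    print(leak[:8])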

Official announcement: Please refer to the official link for details – https://www.amd.com/en/resources/product-security/bulletin/amd-sb-6010.html

Remark: In step 5, CU2 is written incorrectly. The correct word should be CU.

CVE-2025-23270: NVIDIA Jetson Linux contains a vulnerability in UEFI Management mode (20th July 2025)

Preface: To enter UEFI Management mode on a Jetson device, you’ll typically need to access it during the boot process by pressing a specific key (like F2, F10, or Del) before the OS starts loading. Once in UEFI, you can configure settings related to booting, such as boot order and device selection.

Background: CUDA is a parallel computing platform and programming model developed by NVIDIA, designed to leverage the power of GPUs for general-purpose computing. Linux for Tegra (L4T) is NVIDIA’s customized Linux distribution based on Ubuntu, optimized for their Tegra family of system-on-chips (SoCs), including those used in Jetson development kits. Essentially, L4T provides the operating system and necessary drivers for running CUDA-enabled applications on NVIDIA’s embedded platforms.

NVIDIA Jetson Linux is a customized version of the Linux operating system specifically designed for NVIDIA Jetson embedded computing modules. It provides a complete software stack, including the Linux kernel, bootloader, drivers, and libraries, tailored for the Jetson platform’s hardware and intended for edge AI and robotics applications.

Vulnerability details:

CVE-2025-23270 NVIDIA Jetson Linux contains a vulnerability in UEFI Management mode, where an unprivileged local attacker may cause exposure of sensitive information via a side channel vulnerability. A successful exploit of this vulnerability might lead to code execution, data tampering, denial of service, and information disclosure.

CVE-2025-23269 NVIDIA Jetson Linux contains a vulnerability in the kernel where an attacker may cause an exposure of sensitive information due to a shared microarchitectural predictor state that influences transient execution. A successful exploit of this vulnerability may lead to information disclosure.

Official announcement: Please see the link for details

https://nvidia.custhelp.com/app/answers/detail/a_id/5662

“When an error occurs, data remains in cache memory. Once the OS has started, a malicious program stored on the device can then perform reads against that shared memory.”

CVE-2025-23266 and CVE-2025-23267: NVIDIA Container Toolkit design weakness (16-07-2025)

Preface: Docker Compose is a tool that makes it easier to define and manage multi-container Docker applications. It simplifies running interconnected services, such as a frontend, backend API, and database, by allowing them to be launched and controlled together.

Docker Compose is also responsible for managing the container lifecycle. Container lifecycle management is the critical process of overseeing the creation, deployment, and operation of a container until its eventual decommissioning.

Background: Docker Compose v2.30.0 has introduced lifecycle hooks, making it easier to manage actions tied to container start and stop events. This feature lets developers handle key tasks more flexibly while keeping applications clean and secure.

Vulnerability details:

CVE-2025-23266: NVIDIA Container Toolkit for all platforms contains a vulnerability in some hooks used to initialize the container, where an attacker could execute arbitrary code with elevated permissions. A successful exploit of this vulnerability might lead to escalation of privileges, data tampering, information disclosure, and denial of service.

CVE-2025-23267: NVIDIA Container Toolkit for all platforms contains a vulnerability in the update-ldcache hook, where an attacker could cause a link following by using a specially crafted container image. A successful exploit of this vulnerability might lead to data tampering and denial of service.

Official announcement: Please refer to url for details

https://nvidia.custhelp.com/app/answers/detail/a_id/5659

Ref: Does Disabling Hooks Disable Container Lifecycle Management?

Hooks – In this context, hooks are scripts or binaries that run during container lifecycle events (e.g., prestart, poststart). The CUDA compatibility hook injects libraries or environment variables needed for CUDA apps.

Disabling the Hook – Prevents the automatic injection of CUDA compatibility libraries into containers. This does not disable the entire container lifecycle, but it removes one automation step in the lifecycle.

CVE-2025-53818: Command Injection in MCP Server github-kanban-mcp-server (15th July 2025)

Preface: Is it good when artificial intelligence uses open-source software? Yes, using open-source software is generally considered a positive for artificial-intelligence development. It fosters collaboration, transparency, and faster innovation, while also potentially reducing costs and biases. However, it’s crucial to acknowledge potential risks such as misuse, and the need for responsible development practices.

Background: The Model Context Protocol (MCP) is an open standard, open-source framework designed to standardize how AI models, particularly large language models (LLMs), interact with external tools, systems, and data sources. Think of it as a universal adapter, similar to USB-C, for AI applications, allowing them to easily connect to and utilize various data and tools.

A Kanban MCP Server is a server component that manages Kanban boards using the Model Context Protocol (MCP). It allows AI assistants and other systems to interact with and manipulate Kanban boards programmatically, enabling automation and integration of workflows.

Vulnerability details: GitHub Kanban MCP Server is a Model Context Protocol (MCP) server for managing GitHub issues in Kanban board format and streamlining LLM task management. Versions 0.3.0 and 0.4.0 of the MCP Server are vulnerable to command injection attacks through some of their MCP tool definitions and implementations. The MCP Server exposes the tool `add_comment`, which relies on the Node.js child process API `exec` to execute the GitHub (`gh`) command; `exec` is an unsafe and vulnerable API when concatenated with untrusted user input.

Workaround: As of the time of publication, no known patches are available.

However, you can securely rewrite the vulnerable handleAddComment function using execFile or the GitHub REST API to avoid command injection risks.

Workaround 1: Using execFile (Safer Shell Execution)

execFile does not invoke a shell, so special characters in inputs (like ;, &&, etc.) are treated as literal arguments, not commands
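
To illustrate the same principle in Python (used here purely for illustration; in the actual Node.js server it is execFile that plays this role), untrusted text is passed as a discrete argument rather than interpolated into a shell command. The gh invocation and argument names below are hypothetical, mirroring what an add_comment handler might do.

    # Illustration of the execFile-style fix, expressed in Python: pass untrusted
    # input as a separate argument with no shell, so ";", "&&", "$(...)" and
    # similar metacharacters remain literal text. Arguments are hypothetical.
    import subprocess

    def add_comment(issue_number: str, body: str) -> None:
        subprocess.run(
            ["gh", "issue", "comment", issue_number, "--body", body],
            check=True,    # raise if gh exits with a non-zero status
            shell=False,   # the default, shown explicitly: no shell interpretation
        )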

Workaround 2: Using GitHub REST API via @octokit/rest

– No shell involved.

– Fully typed and authenticated.

– GitHub officially supports and maintains this SDK.

Official announcement: Please refer to url for details –

https://nvd.nist.gov/vuln/detail/CVE-2025-53818

AMD-based AI systems combining AMD rocBLAS and Intel MKL can become the fastest supercomputers in the world (14-07-2025)

Preface: Supercomputers rely on math libraries to efficiently handle the complex numerical computations required for scientific simulations and modeling. These libraries provide optimized routines for linear algebra, numerical analysis, and other mathematical operations, enabling supercomputers to perform these calculations much faster than with general-purpose code.

While math libraries are a crucial component, they are not the sole key to boosting overall AI performance on supercomputers. Supercomputers excel at AI due to their parallel processing capabilities, specialized hardware like GPUs and TPUs, and efficient memory management, not just the math libraries they use. Math libraries are essential for performing the calculations required by AI algorithms, but they rely on the underlying hardware architecture and software infrastructure of the supercomputer to deliver that performance.

Background: AMD rocBLAS 6.0.2 is a version of AMD’s library for Basic Linear Algebra Subprograms (BLAS) optimized for AMD GPUs within the ROCm platform. It provides high-performance, robust implementations of BLAS operations, similar to legacy BLAS but adapted for GPU execution using the HIP programming language. Specifically, version 6.0.2 is a point release that includes minor bug fixes to improve the stability of applications using AMD’s MI300 GPUs. It also introduces new driver features for system qualification on partner server offerings.

Using AMD rocBLAS and Intel MKL (2016 or later) together can be beneficial because MKL, while optimized for Intel CPUs, can sometimes perform suboptimally on AMD CPUs. rocBLAS, on the other hand, is specifically optimized for AMD GPUs and CPUs, providing a performance boost on AMD hardware.

Why Mix rocBLAS and MKL?

  • rocBLAS: Optimized for AMD GPUs (and CPUs via ROCm stack).
  • MKL: Optimized for Intel CPUs, but still useful for certain CPU-bound tasks.
  • Mixing: You can selectively use each library for the operations where it performs best (see the sketch after this list).
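
One simple way to realize this split is to route large matrix products to the GPU, where a ROCm build of PyTorch calls rocBLAS underneath, and keep small or latency-sensitive ones on the CPU, where NumPy can be linked against MKL. The rough sketch below makes these assumptions explicit; the 2048 size threshold is illustrative, not a tuned value.

    # Sketch: dispatch matrix multiplications by size. Assumes NumPy linked
    # against MKL for the CPU path and a ROCm build of PyTorch (which calls
    # rocBLAS) for the GPU path. The 2048 threshold is illustrative only.
    import numpy as np
    import torch

    def matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        large = a.shape[0] >= 2048 and b.shape[1] >= 2048
        if large and torch.cuda.is_available():      # ROCm builds expose the cuda API
            ta = torch.from_numpy(a).to("cuda")
            tb = torch.from_numpy(b).to("cuda")
            return (ta @ tb).cpu().numpy()           # rocBLAS-backed GEMM
        return a @ b                                 # MKL-backed GEMM on the CPU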

– END-

Nvidia security focus – Rowhammer attack potential risk – July 2025 (11th July 2025)

Preface: The Rowhammer effect, a hardware vulnerability in DRAM chips, was first publicly presented and analyzed in June 2014 at the International Symposium on Computer Architecture (ISCA). This research, conducted by Yoongu Kim et al., demonstrated that repeatedly accessing a specific row in a DRAM chip can cause bit flips in nearby rows, potentially leading to security breaches.

Background: Nvidia has shifted from “copy on flip” to asynchronous copy mechanisms in their GPU architecture, particularly with the Ampere architecture and later. This change allows for more efficient handling of data transfers between memory and the GPU, reducing latency and improving overall performance, especially in scenarios with high frame rates or complex computations.

When System-Level ECC is enabled, it prevents attackers from successfully executing Rowhammer attacks by ensuring memory integrity. The memory controller detects and corrects bit flips, making it nearly impossible for an attacker to exploit them for privilege escalation or data corruption.

Technical details: Modern DRAMs, including the ones used by NVIDIA, are potentially susceptible to Rowhammer. The now decade-old Rowhammer problem has been well known for CPU memories (e.g., DDR, LPDDR). Recently, researchers at the University of Toronto demonstrated a successful Rowhammer exploitation on a NVIDIA A6000 GPU with GDDR6 memory where System-Level ECC was not enabled. In the same paper, the researchers showed that enabling System-Level ECC mitigates the Rowhammer problem. 
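
A quick way to confirm whether System-Level ECC is enabled on a given board is to query nvidia-smi. The small sketch below uses query fields listed by nvidia-smi --help-query-gpu; it assumes the NVIDIA driver and nvidia-smi are present on the host.

    # Sketch: report current and pending ECC mode for each NVIDIA GPU.
    import subprocess

    def ecc_status():
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=index,name,ecc.mode.current,ecc.mode.pending",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.strip().splitlines():
            print(line)   # e.g. "0, NVIDIA RTX A6000, Enabled, Enabled"

    if __name__ == "__main__":
        ecc_status()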

Official announcement: For technical details, please see the link – https://nvidia.custhelp.com/app/answers/detail/a_id/5671