CVE-2025-23263: About NVIDIA DOCA-Host and Mellanox OFED (17th July 2025)

Preface: Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) is a software stack developed by NVIDIA (formerly Mellanox) that provides a tested and packaged version of the OpenFabrics Enterprise Distribution (OFED) for Mellanox network adapters. It enables high-performance networking capabilities, including RDMA and kernel bypass, for InfiniBand and Ethernet (RoCE) technologies.

Background: NVIDIA introduced DOCA-OFED in the DOCA-Host package. DOCA-Host is a unified package for host servers that includes all the basic components of DOCA and MLNX_OFED. MLNX_OFED is a single Virtual Protocol Interconnect (VPI) software stack that operates across all NVIDIA network adapter solutions.

NVIDIA has also developed the Compute Unified Device Architecture (CUDA) for CPU-GPU computing and the Data Center Infrastructure-on-a-Chip Architecture (DOCA) for CPU-DPU computing.

At the GTC conference in the fall of 2020, NVIDIA officially unveiled, under the name DPU (Data Processing Unit), the SmartNIC technology it gained through its acquisition of network equipment manufacturer Mellanox. Mellanox's BlueField product line is considered a DPU because it is designed to offload and accelerate data-centric tasks, such as networking, storage, and security, from the CPU. Essentially, DPUs like BlueField act as specialized co-processors, handling tasks that would otherwise consume valuable CPU resources and improving overall system performance and efficiency. NVIDIA BlueField DPUs are designed as System on a Chip (SoC) devices.

Vulnerability details: NVIDIA DOCA-Host and Mellanox OFED contain a vulnerability in the VGT+ feature, where an attacker on a VM might cause escalation of privileges and denial of service on the VLAN.

Official announcement: Please refer to url for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5654

CVE-2025-23266 and CVE-2025-23267: NVIDIA Container Toolkit design weakness (16-07-2025)

Preface: Docker Compose is a tool that makes it easier to define and manage multi-container Docker applications. It simplifies running interconnected services, such as a frontend, backend API, and database, by allowing them to be launched and controlled together.

Beyond defining services, Docker Compose also manages the container lifecycle: the critical process of overseeing a container's creation, deployment, and operation until its eventual decommissioning.

Background: Docker Compose v2.30.0 has introduced lifecycle hooks, making it easier to manage actions tied to container start and stop events. This feature lets developers handle key tasks more flexibly while keeping applications clean and secure.
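These hooks appear in the Compose file as post_start and pre_stop entries on a service. A minimal compose.yaml sketch (the image and commands are illustrative, not from any real deployment):

```yaml
services:
  web:
    image: nginx:alpine
    # Runs inside the container after it starts (ordering relative to the
    # container's own entrypoint is not guaranteed).
    post_start:
      - command: ["sh", "-c", "echo started >> /tmp/lifecycle.log"]
    # Runs inside the container before it is stopped.
    pre_stop:
      - command: ["sh", "-c", "echo stopping >> /tmp/lifecycle.log"]
```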

Vulnerability details:

CVE-2025-23266: NVIDIA Container Toolkit for all platforms contains a vulnerability in some hooks used to initialize the container, where an attacker could execute arbitrary code with elevated permissions. A successful exploit of this vulnerability might lead to escalation of privileges, data tampering, information disclosure, and denial of service.

CVE-2025-23267: NVIDIA Container Toolkit for all platforms contains a vulnerability in the update-ldcache hook, where an attacker could cause a link following by using a specially crafted container image. A successful exploit of this vulnerability might lead to data tampering and denial of service.

Official announcement: Please refer to url for details

https://nvidia.custhelp.com/app/answers/detail/a_id/5659

Ref: Does Disabling Hooks Disable Container Lifecycle Management?

Hooks – In this context, hooks are scripts or binaries that run during container lifecycle events (e.g., prestart, poststart). The CUDA compatibility hook injects libraries or environment variables needed for CUDA apps.

Disabling the Hook – Prevents the automatic injection of CUDA compatibility libraries into containers. This does not disable the entire container lifecycle, but it removes one automation step in the lifecycle.
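For operators who do choose to disable the CUDA compatibility hook, the NVIDIA Container Toolkit reads its behavior from /etc/nvidia-container-toolkit/config.toml. The sketch below assumes a feature key named disable-cuda-compat-lib-hook; verify the exact key name against NVIDIA's advisory for your toolkit version before deploying:

```toml
# /etc/nvidia-container-toolkit/config.toml (sketch; key name assumed,
# confirm against NVIDIA's advisory for your toolkit version)
[features]
disable-cuda-compat-lib-hook = true
```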

CVE-2025-53818: Command Injection in MCP Server github-kanban-mcp-server (15th July 2025)

Preface: Is it good when artificial intelligence uses open-source software? Yes, using open-source software is generally considered a positive for artificial intelligence development. It fosters collaboration, transparency, and faster innovation, while also potentially reducing costs and biases. However, it is crucial to acknowledge potential risks such as misuse and the need for responsible development practices.

Background: The Model Context Protocol (MCP) is an open standard, open-source framework designed to standardize how AI models, particularly large language models (LLMs), interact with external tools, systems, and data sources. Think of it as a universal adapter, similar to USB-C, for AI applications, allowing them to easily connect to and utilize various data and tools.

A Kanban MCP Server is a server component that manages Kanban boards using the Model Context Protocol (MCP). It allows AI assistants and other systems to interact with and manipulate Kanban boards programmatically, enabling automation and integration of workflows.

Vulnerability details: GitHub Kanban MCP Server is a Model Context Protocol (MCP) server for managing GitHub issues in Kanban board format and streamlining LLM task management. Versions 0.3.0 and 0.4.0 of the MCP server are written in a way that leaves some of its MCP tool definitions and implementations vulnerable to command injection. The server exposes the tool `add_comment`, which relies on the Node.js child-process API `exec` to run the GitHub CLI (`gh`) command; `exec` invokes a shell and is unsafe when concatenated with untrusted user input.

Workaround: As of the time of publication, no known patches are available.

However, you can securely rewrite the vulnerable handleAddComment function using execFile or the GitHub REST API to avoid command-injection risks.

Workaround 1: Using execFile (Safer Shell Execution)

execFile does not invoke a shell, so special characters in inputs (like ;, &&, etc.) are treated as literal arguments, not commands.

Workaround 2: Using GitHub REST API via @octokit/rest

– No shell involved.

– Fully typed and authenticated.

– GitHub officially supports and maintains this SDK.
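The @octokit/rest SDK wraps the REST endpoint shown below. As a dependency-free sketch of the same idea (function names are illustrative), Node 18+'s built-in fetch can call the GitHub REST API directly, keeping the comment body out of any shell:

```javascript
// Sketch of Workaround 2: the comment travels as JSON in an HTTP request,
// so no shell is involved. Function names here are illustrative.
function buildCommentRequest(owner, repo, issueNumber, body, token) {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/issues/${issueNumber}/comments`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        Accept: 'application/vnd.github+json',
      },
      body: JSON.stringify({ body }), // metacharacters stay inert data
    },
  };
}

async function addComment(owner, repo, issueNumber, body, token) {
  const { url, options } = buildCommentRequest(owner, repo, issueNumber, body, token);
  const res = await fetch(url, options); // Node 18+ global fetch
  if (!res.ok) throw new Error(`GitHub API error: ${res.status}`);
  return res.json();
}
```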

Official announcement: Please refer to url for details –

https://nvd.nist.gov/vuln/detail/CVE-2025-53818

AMD-based AI systems combining AMD rocBLAS and Intel MKL can rank among the fastest supercomputers in the world (14-07-2025)

Preface: Supercomputers rely on math libraries to efficiently handle the complex numerical computations required for scientific simulations and modeling. These libraries provide optimized routines for linear algebra, numerical analysis, and other mathematical operations, enabling supercomputers to perform these calculations much faster than with general-purpose code.

While math libraries are a crucial component, they are not the sole key to boosting overall AI performance on supercomputers. Supercomputers excel at AI due to their parallel processing capabilities, specialized hardware like GPUs and TPUs, and efficient memory management, not just the math libraries they use. Math libraries are essential for performing the calculations required by AI algorithms, but they rely on the underlying hardware architecture and software infrastructure of the supercomputer to deliver that performance.

Background: AMD rocBLAS 6.0.2 is a version of AMD’s library for Basic Linear Algebra Subprograms (BLAS) optimized for AMD GPUs within the ROCm platform. It provides high-performance, robust implementations of BLAS operations, similar to legacy BLAS but adapted for GPU execution using the HIP programming language. Specifically, version 6.0.2 is a point release that includes minor bug fixes to improve the stability of applications using AMD’s MI300 GPUs. It also introduces new driver features for system qualification on partner server offerings.

Using AMD rocBLAS and Intel MKL (2016 or later) together can be beneficial because MKL, while optimized for Intel CPUs, can sometimes perform suboptimally on AMD CPUs. rocBLAS, on the other hand, is specifically optimized for AMD GPUs, providing a performance boost on AMD hardware.

Why Mix rocBLAS and MKL?

  • rocBLAS: Optimized for AMD GPUs via the ROCm stack.
  • MKL: Optimized for Intel CPUs, but still useful for certain CPU-bound tasks.
  • Mixing: You can selectively use each library for the operations where it performs best.


CVE-2025-30403: A heap-buffer-overflow vulnerability is possible in mvfst via a specially crafted message during a QUIC session. (13th Jul 2025)

Preface: mvfst (pronounced "move fast") is a client and server implementation of the IETF QUIC protocol in C++ by Facebook. QUIC is a UDP-based, reliable, multiplexed transport protocol that is now an internet standard (RFC 9000).

Background: QUIC (Quick UDP Internet Connections), was designed with the primary goal of enhancing the speed and reliability of internet connections, particularly for latency-sensitive and bandwidth-intensive applications. It aims to reduce connection setup time, improve data transfer speeds, and enhance security compared to traditional TCP and TLS protocols.

The QUIC protocol is a key component in modern CDN (Content Delivery Network) strategies, particularly with the rise of HTTP/3. QUIC, developed by Google and standardized by the IETF, is a transport layer protocol that offers significant performance and security improvements over traditional TCP, especially in the context of CDNs.

Vulnerability details: A heap-buffer-overflow vulnerability is possible in mvfst via a specially crafted message during a QUIC session. This issue affects mvfst versions prior to v2025.07.07.00.

Does removing maxBatchSize affect performance?

Yes, potentially.

To offset any performance degradation from removing maxBatchSize, CDNs may:

- Optimize packet scheduling and batching elsewhere in the QUIC stack to maintain throughput.

- Use adaptive batching: dynamically adjust how many packets are processed based on system load and traffic patterns.

- Deploy hardware acceleration: offload QUIC processing to specialized hardware (e.g., SmartNICs or FPGAs).

- Leverage edge caching: reduce the need for frequent QUIC connections by serving more content directly from edge nodes.
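The adaptive-batching idea above can be sketched generically (this is illustrative logic, not mvfst code): grow the per-write batch while the send queue is backing up, shrink it as the queue drains, and clamp the result to a safe range.

```javascript
// Illustrative adaptive batching (not mvfst code): pick the next batch size
// from the current one and the observed send-queue depth.
function nextBatchSize(current, queueDepth, { min = 1, max = 64 } = {}) {
  // Under load (queue deeper than the batch), double the batch;
  // otherwise halve it, clamping to [min, max].
  const target = queueDepth > current ? current * 2 : Math.ceil(current / 2);
  return Math.min(max, Math.max(min, target));
}
```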

Official announcement: Please refer to the url for details – https://nvd.nist.gov/vuln/detail/CVE-2025-30403

Nvidia security focus – Rowhammer attack potential risk – July 2025 (11th July 2025)

Preface: The Rowhammer effect, a hardware vulnerability in DRAM chips, was first publicly presented and analyzed in June 2014 at the International Symposium on Computer Architecture (ISCA). This research, conducted by Yoongu Kim et al., demonstrated that repeatedly accessing a specific row in a DRAM chip can cause bit flips in nearby rows, potentially leading to security breaches.

Background: Nvidia has shifted from “copy on flip” to asynchronous copy mechanisms in their GPU architecture, particularly with the Ampere architecture and later. This change allows for more efficient handling of data transfers between memory and the GPU, reducing latency and improving overall performance, especially in scenarios with high frame rates or complex computations.

When System-Level ECC is enabled, it prevents attackers from successfully executing Rowhammer attacks by ensuring memory integrity. The memory controller detects and corrects bit flips, making it nearly impossible for an attacker to exploit them for privilege escalation or data corruption.

Technical details: Modern DRAMs, including the ones used by NVIDIA, are potentially susceptible to Rowhammer. The now decade-old Rowhammer problem has been well known for CPU memories (e.g., DDR, LPDDR). Recently, researchers at the University of Toronto demonstrated a successful Rowhammer exploitation on a NVIDIA A6000 GPU with GDDR6 memory where System-Level ECC was not enabled. In the same paper, the researchers showed that enabling System-Level ECC mitigates the Rowhammer problem. 

Official announcement: Technical details: see link – https://nvidia.custhelp.com/app/answers/detail/a_id/5671

CVE-2025-32462 – Local Privilege Escalation via the sudo host (-h) option (10th July 2025)

Preface: Using LDAP to manage sudoers rules is becoming a more common practice, particularly in larger organizations. It offers several advantages over traditional methods of storing sudoers in a local file, including simplified management, improved scalability, and enhanced security.

Background:

Best Practices for Using sudo.

  • Avoid Logging in as Root: Use sudo instead of su to minimize security risks.
  • Grant Minimal Permissions: Assign only the necessary privileges to prevent unauthorized access.
  • Monitor sudo Usage: Check logs for suspicious activity.

This helps to minimize security risks associated with elevated privileges.

  • Specific commands: Instead of ALL=(ALL:ALL), grant access to specific commands only. For example: jane ALL=(ALL:ALL) /usr/bin/apt update, /usr/bin/apt upgrade

Vulnerability details: Sudo before 1.9.17p1, when used with a sudoers file that specifies a host that is neither the current host nor ALL, allows listed users to execute commands on unintended machines.

This is a configuration-based logic flaw in sudo's host-matching rather than a memory-corruption bug. It does not involve the chroot feature directly: the -h/--host option, originally intended for listing privileges (sudo -l) for another host, was also honored when running commands, so rules scoped to a different machine could be executed locally.
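As an illustrative sketch (the host and user names are invented), consider a shared sudoers file containing a rule scoped to a single host:

```
# Shared sudoers fragment (illustrative names): the rule is meant to
# apply only on host "db01"
alice db01 = (root) /usr/bin/systemctl restart postgresql
```

On an unpatched sudo, alice could run `sudo -h db01 /usr/bin/systemctl restart postgresql` from a different machine that shares this sudoers file, and the host-scoped rule would still match. Sudo 1.9.17p1 restricts the -h option so it can no longer be used to run commands this way.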

Official announcement: Please see the link for details – https://nvd.nist.gov/vuln/detail/CVE-2025-32462

AMD releases details about Transient Scheduler Attack (TSA) – 9 Jul 2025

Preface: CPU transient instructions refer to instructions that are speculatively executed by a processor’s out-of-order execution engine, but which may ultimately be discarded and not reflected in the processor’s architectural state. These instructions are executed based on predictions about control flow or data dependencies, and if the prediction is incorrect, the results of these transient instructions are discarded.

Background: Transient Scheduler Attacks (TSA) are new speculative side channel attacks related to the execution timing of instructions under specific microarchitectural conditions. In some cases, an attacker may be able to use this timing information to infer data from other contexts, resulting in information leakage.

Vulnerability details:

CVE-2024-36350 – A transient execution vulnerability in some AMD processors may allow an attacker to infer data from previous stores, potentially resulting in the leakage of privileged information.

CVE-2024-36357 – A transient execution vulnerability in some AMD processors may allow an attacker to infer data in the L1D cache, potentially resulting in the leakage of sensitive information across privileged boundaries.

CVE-2024-36348 – A transient execution vulnerability in some AMD processors may allow a user process to infer the control registers speculatively even if the UMIP feature is enabled, potentially resulting in information leakage.

CVE-2024-36349 – A transient execution vulnerability in some AMD processors may allow a user process to infer TSC_AUX even when such a read is disabled, potentially resulting in information leakage.

Official announcement: Please see the link for details –

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7029.html

CVE-2025-21432: Double Free in SPS-HLOS (8th July 2025)

Preface: Concise Binary Object Representation (CBOR) is a binary data serialization format loosely based on JSON authored by Carsten Bormann. The use of Concise Binary Object Representation (CBOR) in SPS HLOS (and other constrained environments) is primarily due to its ability to provide a compact, efficient, and extensible binary data format. This makes it suitable for resource-constrained devices and networks, where bandwidth and processing power are limited. 
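To make the compactness claim concrete, here is a minimal sketch (not Qualcomm code) of CBOR encoding per RFC 8949 for unsigned integers and short text strings: a small integer fits in a single byte, and a short string costs its length plus one byte.

```javascript
// Minimal CBOR encoding sketch (RFC 8949) for unsigned integers below 2^16
// and text strings shorter than 24 bytes. Illustrative only.
function cborEncode(value) {
  if (Number.isInteger(value) && value >= 0) {
    if (value < 24) return Uint8Array.of(value);          // major type 0, inline
    if (value < 256) return Uint8Array.of(0x18, value);   // 1-byte argument
    if (value < 65536) return Uint8Array.of(0x19, value >> 8, value & 0xff);
    throw new RangeError('sketch handles integers < 2^16 only');
  }
  if (typeof value === 'string') {
    const utf8 = new TextEncoder().encode(value);
    if (utf8.length >= 24) throw new RangeError('sketch handles short strings only');
    return Uint8Array.of(0x60 + utf8.length, ...utf8);    // major type 3
  }
  throw new TypeError('unsupported type in this sketch');
}
```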

Background: In Adreno GPUs, SPS (Shader Processors) HLOS refers to a specific architecture or organization within the GPU where Shader Processors are grouped and managed. “HLOS” likely stands for “High Level Operating System”, indicating that these SPS are managed by the system’s main operating system (like Android) rather than being directly controlled by the GPU’s internal firmware. This means the CPU and operating system handle the overall workload and scheduling for these shader processors.

Ref: In Qualcomm's security bulletins, TA typically stands for Trusted Application: code running inside the device's trusted execution environment (TEE). CBOR data, from the perspective of a TA, refers to information exchanged with the high-level OS that is serialized using the Concise Binary Object Representation (CBOR) standard for compactness and efficiency.

Vulnerability details: Memory corruption while retrieving the CBOR data from the TA (Trusted Application).

Summary:

Component – Qualcomm Adreno GPU (graphics driver)

Vulnerability Type – Double Free / Memory Corruption

Trigger – Occurs during CBOR data retrieval from the Trusted Application (TA)

Affected Subsystem – SPS-HLOS (Shader Processor System managed by the high-level OS)

Impact:

Double free condition in shared memory buffers.

Potential for arbitrary code execution or privilege escalation.

Exploitable via crafted GPU workloads or malicious apps using OpenCL/Vulkan.

Official announcement: Please see the link for details – https://docs.qualcomm.com/product/publicresources/securitybulletin/july-2025-bulletin.html

CVE-2025-21450: Improper Authentication in GPS GNSS (7th July 2025)

Preface:

GNSS – This is a global term encompassing all satellite constellations that provide positioning, navigation, and timing (PNT) services. Besides GPS, other GNSS include GLONASS (Russia), Galileo (EU), and BeiDou (China).

GPS – The Global Positioning System, developed by the US Department of Defense, is the most widely recognized and used GNSS. It was the first global satellite navigation system and has become a household term.

Background: A GPS/GNSS receiver can be considered the client in a similar way to an IoT device or smartphone, particularly when used for location-based services. GPS/GNSS receivers require cryptographic downloads, specifically key material and potentially software updates, to enable authentication and anti-spoofing features. These features ensure the integrity and authenticity of the received signals, protecting against malicious attacks like spoofing where fake signals mimic legitimate satellites.

Ref: The GPS module in the Snapdragon 8 Gen 3 is integrated within the Snapdragon X75 5G Modem-RF System. The X75 is a comprehensive modem-RF solution that includes not only 5G capabilities but also other wireless technologies like Wi-Fi, Bluetooth, and location services like GPS. This integration allows for efficient and high-performance location tracking and navigation on devices powered by the Snapdragon 8 Gen 3.

Vulnerability details: A cryptographic issue occurs due to the use of an insecure connection method while downloading.

Vulnerability Type: CWE-287 Improper Authentication

Official announcement: Please see the link for details –

https://docs.qualcomm.com/product/publicresources/securitybulletin/july-2025-bulletin.html