Category Archives: AI and ML

CVE-2026-24237 and CVE-2026-24221: About NVIDIA NVTabular deserialization

(4th June 2026)

Preface: From a security engineering standpoint, there is no conceptual difference in the attack mechanism. Both the older vulnerabilities (CVE-2025-33214 / CVE-2025-33213) and the newer ones shown in the diagram share the exact same root weakness: Insecure Deserialization (CWE-502) via Python’s built-in pickle module.

Background:

NVIDIA Merlin & NVTabular (The Pipeline Base) –

NVIDIA Merlin is an end-to-end framework designed to accelerate deep learning recommender systems (RecSys). Within this ecosystem, NVTabular acts as the heavy-lifter for the ETL (Extract, Transform, Load) stage. It uses GPU-accelerated RAPIDS cuDF and Dask under the hood to handle multi-terabyte tabular datasets that exceed system CPU memory.

Integration with cuML and PyTorch –

To achieve maximum throughput, the pipeline passes these highly optimized, GPU-aligned data tensors directly into training frameworks (like PyTorch) or machine learning libraries (like cuML for clustering, classification, or collaborative filtering). The critical security boundary exists where these components save, transfer, or load their execution states across different nodes or microservices.

Vulnerability details: Both CVE-2026-24237 and CVE-2026-24221 are categorized under CWE-502: Deserialization of Untrusted Data.

  • Serialization is the process of converting an in-memory object (like an NVTabular transformer setup or a cuML model state) into a byte stream for storage or transmission.
  • Deserialization reverse-engineers that byte stream back into an active living object in memory.

Why Python’s pickle Module is Inherently Insecure?

The flaw stems from the pipeline’s reliance on Python’s native pickle module for saving and reloading model states or custom transformer pipelines.

pickle is not a safe serialization format because it does not just store raw data; it stores object reconstruction instructions. It utilizes a stack-based virtual machine (the Pickle VM) to execute these instructions sequentially when building the object back up.

Official announcement: Please refer to link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5851

In late March 2026, developers reverse-engineering Claude Code (Anthropic’s official CLI tool) discovered two major client-side cache bugs. Is it similar to the term “Quiet leak”? (2nd Jun 2026)

Preface: When a system has a design flaw without a assigned CVE identifier, standard signatures in a Web Application Firewall (WAF) will not detect or block the exploit.Why the WAF Fails?

No Signature: WAFs rely on signatures of known vulnerabilities (CVEs) to block attacks.

Valid Traffic: Exploits targeting design flaws use legitimate application features and look like normal user behavior.

Logic-Based: Design flaws are errors in how the application is built, not coding bugs.

Background: In late March 2026, developers reverse-engineered Claude Code (Anthropic’s official CLI tool) and discovered two critical client-side caching vulnerabilities, causing token consumption to surge by 10-20 times per interaction. However, no CVE numbers were released this time. Is this true? In late March 2026, members of the community reverse-engineered the Claude Code CLI tool and discovered significant client-side cache bugs that caused token consumption to increase by an estimated 10–20times per interaction.

This incident, which occurred around March 23–31, 2026, resulted in widespread reports of paid users exhausting their usage limits within minutes rather than hours, with some users seeing 5-hour session windows drain in under 70 minutes.

No Official CVE: While the bug was acknowledged by Anthropic as a “top priority” investigation on March 31, it was handled as a product bug rather than a security CVE, causing significant frustration among developers.

Vulnerability details: In late March 2026, developers reverse-engineering Claude Code (Anthropic’s official CLI tool) discovered two major client-side cache bugs that caused token consumption to explode by 10–20× per interaction.

Remedy: To explicitly safeguard your code against token-inflation regressions and guarantee a 90% cost reduction via prompt caching,  you must inject cache_control breakpoints directly into your tool array and message blocks. Please refer to diagram for details.

CVE-2026-24162 – About NVIDIA Merlin Transformers4Rec for Linux platform  (1st Jun 2026)

Preface: Data engineers perform seamless preprocessing, a foundational stage where they gather messy, raw data from diverse sources, clean it (handling missing values, outliers, inconsistencies), integrate disparate datasets, and transform it into a unified, structured format, making it ready and reliable for data scientists to perform advanced feature engineering (creating new, meaningful features) and ultimately build better machine learning models. This ensures a high-quality, consistent input, preventing “garbage in, garbage out” for the modeling phase.

Background: NVIDIA Merlin relies directly on RAPIDS cuDF to handle high-performance, GPU-accelerated dataframe operations for recommender systems. The specific ecosystem library used for this within Merlin is NVTabular. NVTabular and RAPIDS (cuDF/cuML) for preprocessing and feature engineering.

For example: interaction data in cuDF, feed it through a Merlin processing pipeline, and extract the resulting GPU data arrays to train a cuML machine learning model.

cuML is a suite of GPU-accelerated machine learning algorithms and mathematical primitives within the NVIDIA RAPIDS ecosystem, designed to act as a fast, drop-in replacement for Scikit-learn. It allows data scientists to achieve 10-50x faster training times on large datasets by leveraging GPU parallelism.

Where serialization risks actually happen in cuML?

An “improper deserialization of untrusted data” vulnerability (like those involving Python’s pickle module) only occurs if you later attempt to load a previously saved model or object from an unknown or unverified source.

To patch and avoid this vulnerability, NVIDIA and the broader ML ecosystem mandate moving away from arbitrary Python object pickling. Instead, systems should use:

•Safetensors: For saving native deep learning model weights safely (since it restricts execution entirely to pure tensor data and avoids code execution pathways).

•ONNX: For standardized, non-executable model formats

Vulnerability details: CVE-2026-24162 NVIDIA Transformers4Rec for Linux contains a vulnerability where an attacker could cause improper deserialization of untrusted data. A successful exploit of this vulnerability might lead to code execution, data tampering, and information disclosure.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2026-24162

CVE-2026-24212: NVIDIA Isaac Launchable contains a vulnerability (29th May 2026)

Preface: The primary purpose of Isaac Launchable is to provide a turn-key, web-browser-based cloud setup via NVIDIA Brev for developers who lack local hardware. Tesla operates its own multi-billion-dollar on-premise supercomputers (like the Tesla Dojo cluster and massive custom NVIDIA H100/H200 data centers). They do not need a standardized, plug-and-play browser template to rent individual cloud GPUs. Tesla utilizes NVIDIA Isaac Sim—a robotics simulation and synthetic data generation platform—for developing and training its AI-powered robots.

Background: The core design objective of the isaac-launchable project (commonly referred to as “Launchable”) is to democratize and simplify access to NVIDIA’s heavy-duty robotics simulation tools by removing local hardware barriers and complex installation configurations. In an Isaac Launchable cloud environment (running inside the NVIDIA Brev container ecosystem), control commands are sent to a robot within a script executed inside the cloud-hosted VS Code terminal. The command pipeline relies on Isaac Lab and the Omniverse Physics Engine (PhysX). The cloud python script computes the robot’s target state (e.g., target joint positions, velocities, or joint efforts) and writes them directly to the simulation’s articulation buffers.

Instead of fighting for “market share” against other companies, Isaac Launchable competes with traditional local setups.

•Traditional Method: Manual Docker and local container workflows (e.g., standard ROS 2 setups on native Linux machines).

•Launchable Method: Zero-friction cloud deployment. Its “market share” is growing rapidly among researchers, universities, and agile startups who do not have the capital to purchase dedicated $10,000+ RTX enterprise workstations but need immediate access to physics training environments.

Vulnerability details: According to the NVIDIA Security Advisory, CVE-2026-24212 is specifically classified as CWE-319 (Cleartext Transmission of Sensitive Information)within the NVIDIA Isaac Launchable component for Linux.

  • The vulnerable mechanism: The issue lies within the background communication channel or telemetry transit layer managed by the isaac-launchable utility itself. It transmits internal credentials, API keys, or security tokens in unencrypted plaintext over the network.

Official announcement: Please refer to the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5830

CVE-2026-24188: About NVIDIA TensorRT (26th May 2026)

Preface: TensorRT is NVIDIA’s general-purpose inference SDK that compiles and optimizes a wide variety of AI models (CNNs, computer vision, traditional neural networks) to run as fast as possible on NVIDIA GPUs.

TensorRT-LLM is a specialized, open-source library built on top of TensorRT specifically tailored to optimize and execute Large Language Models (LLMs).

Background: How the Diagram Corresponds to the Vulnerability?

The diagram maps out how improper memory management between the host (CPU) and device (GPU) exposes a system to this flaw:

  1. Static Buffer Allocation: Step #3 allocates a rigid GPU memory space using cuda.mem_alloc(input_data.nbytes). This sets up a buffer size based entirely on the initial shape of the input_data.
  2. Untrusted Runtime Input: As shown in text boxes 3 and 4, if a remote attacker sends a maliciously crafted input that modifies the shape or size at runtime, the application fails to recalculate the allocation bounds.
  3. Out-of-Bounds Copy: When Step #4 (cuda.memcpy_htod) executes, it forces the larger data stream into the pre-allocated smaller buffer. This overflows the boundary and writes data directly into adjacent GPU memory locations, causing a classic CWE-787 Out-of-bounds Write.

Remediations

  • Update the Software: NVIDIA released an advisory specifying that upgrading to TensorRT v10.16.1 or newer mitigates these risks.
  • Input Boundary Checks: Always strictly validate input dimensions before initiating data copies to device memory.
  • Leverage Native Profiles: If deploying models with varying input dimensions, use TensorRT’s built-in optimization profiles for dynamic shapes rather than manually overriding raw host-to-device pointers without size verification.

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5836

CVE-2025-33255: About NVIDIA TensorRT-LLM (22nd May 2026)

Preface: DeepSpeed MII, an open-source Python library developed by Microsoft, aims to make powerful model inference accessible, emphasizing high throughput, low latency, and cost efficiency. TensorRT LLM, an open-source framework from NVIDIA, is designed for optimizing and deploying large language models on NVIDIA GPUs.

Background: TensorRT-LLM is a library developed by NVIDIA to optimize and run large language models (LLMs) efficiently on NVIDIA GPUs. It provides a Python API to define and manage these models, ensuring high performance during inference.

The Python Executor within TensorRT-LLM is a component that orchestrates the execution of inference tasks. It manages the scheduling and execution of requests, ensuring that the GPU resources are utilized efficiently. The Python Executor handles various tasks such as batching requests, managing model states, and coordinating with other components like the model engine and the scheduler.

MPI (Message Passing Interface) helps distribute workloads across multiple GPUs by allowing independent CPU processes to manage different GPUs and coordinate their operations. Because GPUs cannot communicate directly across network nodes, MPI coordinates the sending and receiving of data between nodes while utilizing hardware-accelerated paths to shift workloads off the CPU.

Vulnerability details: CVE-2025-33255 NVIDIA TensorRT-LLM for any platform contains a vulnerability in MPI server, where an attacker could cause an unsafe deserialization. A successful exploit of this vulnerability might lead to code execution, denial of service, data tampering, or information disclosure.

Note: To completely mitigate the risk shown in attached diagram, ensure your deployment workflow includes these two final rules:

  1. Isolate MPI Traffic: Set up your cluster so that the network fabric connecting Nodes 1–4 sits on a private, isolated VLAN or subnet with no external internet ingress.
  2. Upgrade the Image: Verify that your docker pull command grabs a TensorRT-LLM container image version released after the May 2026 security patch advisory.

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5805

CVE-2026-24207: About NVIDIA Triton Inference Server (21st May 2026)

Preface: The NVIDIA Triton Inference Server natively supports gRPC as one of its primary communication protocols for the client API. Furthermore, gRPC can also be used for health checks, statistics, and model loading/unloading operations, not just inference requests. Inference requests arrive at the server via either HTTP/REST or GRPC or by the C API and are then routed to the appropriate per-model scheduler.

Background: NVIDIA’s security bulletin did not provide details. I speculate the cause of CVE-2026-24207 is as follows:

The Bypass Logic

A standard gRPC request path is canonical: /package.Service/Method. If an attacker crafts a raw HTTP/2 frame where the :path pseudo-header is package[.]Service/Method (missing the leading /), the following happens:

Step1 – Routing Success: The gRPC server sees the request and correctly identifies which handler to trigger, even without the leading slash.

Step2 – Match Failure: The authorization engine (like grpc/authz) checks the path against its rules. It looks for a literal match for /package[.]Service/Method. Since the incoming path is package[.]Service/Method, the Deny rule does not trigger.

Step3 – Fallback Triggered: Because the specific deny rule failed to match, the engine falls back to its next rule, which is typically a “catch-all” Allow rule.

My question is that gRPC has an authorization bypass vulnerability affecting all gRPC-Go (google[.]golang[.]org/grpc) versions prior to 1.79.3. However, Triton’s gRPC functionality is primarily implemented in src/grpc/grpc_server[.]cc. Can I say that the CVE-2026-24207 vulnerability occurs on the client side rather than the server side? Because for edge deployments, Triton Server is also provided as a shared library, and its API allows the full functionality of the server to be directly integrated into the application. What are your thoughts on this?

If you are using the standard Triton Inference Server binary (which is built in C++), it uses the C++ gRPC implementation, not the Go version. Therefore, it is not vulnerable to CVE-2026-24207 on the server side.

Vulnerability details: CVE-2026-24207 – NVIDIA Triton Inference Server contains a vulnerability where an attacker could cause an authentication bypass. A successful exploit of this vulnerability might lead to code execution, escalation of privileges, data tampering, denial of service, or information disclosure.

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5828

CVE-2026-46300 (Fragnesia) is a Linux kernel privilege escalation in the XFRM ESP-in-TCP subsystem. Does it affect GX-grade supercomputers? (18th May 2026)

Preface: If BlueField DPU supports configuring IPsec rules using strongSwan 5.9.0bf, does it use kernel IPsec in ARM?

Yes, when using strongSwan 5.9.0bf on the BlueField DPU, it utilizes the Linux kernel IPsec stack (xfrm) running on the ARM cores to manage and configure security associations, which can then be offloaded to the hardware acceleration engines.

Background: The only scenario where a GPU or advanced SoC interacts with the Linux kernel’s XFRM subsystem is during IPsec Network Offloading (SmartNICs / DPUs).

If an enterprise SoC or Data Processing Unit (like an NVIDIA BlueField DPU) handles high-speed network traffic, the Linux XFRM subsystem can act as a control plane. It passes the encryption policies (SAs and SPIs) down to the chip’s network engine so that standard internet IPsec traffic can be encrypted at wire speed directly on the network interface card (NIC) hardware rather than taxing the main host CPU.

Vulnerability details: Fragnesia is a Linux local privilege escalation vulnerability that is a member of the Dirty Frag vulnerability class.

Are there any remedies available for CVE-2026-46300?

Patch Your Kernel:

Update your Linux kernel immediately. Patches were released by major distributions (AlmaLinux, Ubuntu, Red Hat, Debian, Amazon Linux) around May 14-16, 2026.

Apply Temporary Mitigation (If Patching is Delayed): Disable the vulnerable modules (esp4, esp6, and rxrpc) to block the exploit.Run: sudo rmmod esp4 esp6 rxrpcCreate blacklist file: echo -e “install esp4 /bin/false\ninstall esp6 /bin/false\ninstall rxrpc /bin/false” | sudo tee /etc/modprobe[.]d/fragnesia[.]conf

Clear Page Cache: If you suspect a machine was targeted before patching, run sync; echo 3 | sudo tee /proc/sys/vm/drop_caches to evict potentially corrupted cached pages.

Official announcement: Please refer to the link for details – https://github.com/v12-security/pocs/tree/main/fragnesia

A more imaginative assumption on TDXRay: Microarchitectural Side-Channel Analysis of Intel TDX for Real-World Workloads (15th MAY 2026)

Preface: In these scenarios (see attached diagram), microarchitecture side-channel attacks targeting Intel TDX can directly impact and jeopardize the security of AMD accelerators.

Even though the AMD Instinct APU operates on a completely different silicon package, the two architectures are fundamentally tied together by a shared software stack, device driver interface, and physical interconnect fabric.

The specific risks regarding how TDXRay and cross-domain side-channel leakage bypass the hardware boundary in your diagram are detailed below:

Technical details:

1. Host-Side Driver Leakage (The Primary Target)

As illustrated in attached diagram, the ROCm Driver and HIP Runtime execute inside the Intel TDX Virtual Machine / Trust Domain.

•When primitives like those found in the TDXRay research paper (e.g., page-level or cache-line tracking) are utilized by an untrusted host hypervisor, they target the Intel CPU’s caches and memory controller.

•Because the Intel CPU must actively prepare, schedule, and feed data arrays (h_a, h_b) to the AMD accelerator, the memory access patterns of the ROCm driver itself are leaked.

•An attacker can infer exactly when the AMD kernel is being launched, what memory addresses are being mapped, and the size or stride of the datasets being transferred.

2. Interconnect Fabric Bottlenecks & Shared Cache Timing

The highlighted section in your diagram notes that memcpy can leak info via cache and memory controller interaction.

•During hipMemcpyHostToDevice or hipMemcpyDeviceToHost, data travels across the PCIe Gen 5 / CXL Interconnect Fabric.

•If a malicious actor on the host hypervisor induces resource contention on the shared Intel CPU core or memory bus, they can observe subtle latency shifts.

•By monitoring the timing delays of the Intel CPU waiting for the AMD APU to complete its tasks (hipDeviceSynchronize), the attacker can infer secret-dependent execution paths inside the AMD hardware without ever probing the AMD chip directly.

3. The Cross-Domain Threat Model (AMD SEV-SNP Parallel)

According to AMD’s Official Security Bulletin (AMD-SB-3044) published regarding the TDXRay findings, these types of microarchitectural host-side tracing methodologies fall within a category of behaviors that affect both Intel TDX and AMD SEV-SNP.

If an application leaks data structure layouts through its memory access patterns on the Intel host, the fact that the actual matrix operations happen on an AMD chip does not protect the workflow’s overall confidentiality.

Official announcement: Please refer to the link for details – https://www.amd.com/en/resources/product-security/bulletin/amd-sb-3044.html

CVE-2026-43284: Dirty Frag tricks the IPsec/TCP stack into doing the “dirty work”(13th May 2026)

Preface: The “Dirty Frag” attack chains two separate flaws in the Linux kernel’s networking stack: one in the ESP(Encapsulating Security Payload) protocol used by IPsec and another in the RxRPC protocol used for the AFS distributed file system. If you do not use IPsec, disabling its modules removes one of the major attack paths.

Background: The “Dirty Frag” vulnerability is deemed difficult to patch immediately due to its exploitation of a long-standing core Linux kernel optimization, which initially lacked official, widespread patches upon disclosure. While disabling ESP modules helps, effective mitigation requires blacklisting both ESP and RxRPC modules, or patching the kernel directly.

How to mitigate vulnerabilities:

Step 1:Block the ESP and RxRPC modules: Create a configuration file (e.g., /etc/modprobe.d/dirtyfrag.conf) to ensure the modules cannot be auto-loaded by an exploit:

bash

install esp4 /bin/false
install esp6 /bin/false
install rxrpc /bin/false

Step 2:Unload current modules: Remove the modules if they are currently active in memory:

bash

sudo modprobe -r esp4 esp6 rxrpc
 

Step 3:Clear the Page Cache: The exploit works by corrupting the page cache. After applying the blocks, clear the cache to ensure no malicious changes persist in RAM:

bash

sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
 

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2026-43284