Category Archives: Potential Risk of CVE

In late March 2026, developers reverse-engineering Claude Code (Anthropic’s official CLI tool) discovered two major client-side cache bugs. Is it similar to the term “Quiet leak”? (2nd Jun 2026)

Preface: When a system has a design flaw without a assigned CVE identifier, standard signatures in a Web Application Firewall (WAF) will not detect or block the exploit.Why the WAF Fails?

No Signature: WAFs rely on signatures of known vulnerabilities (CVEs) to block attacks.

Valid Traffic: Exploits targeting design flaws use legitimate application features and look like normal user behavior.

Logic-Based: Design flaws are errors in how the application is built, not coding bugs.

Background: In late March 2026, developers reverse-engineered Claude Code (Anthropic’s official CLI tool) and discovered two critical client-side caching vulnerabilities, causing token consumption to surge by 10-20 times per interaction. However, no CVE numbers were released this time. Is this true? In late March 2026, members of the community reverse-engineered the Claude Code CLI tool and discovered significant client-side cache bugs that caused token consumption to increase by an estimated 10–20times per interaction.

This incident, which occurred around March 23–31, 2026, resulted in widespread reports of paid users exhausting their usage limits within minutes rather than hours, with some users seeing 5-hour session windows drain in under 70 minutes.

No Official CVE: While the bug was acknowledged by Anthropic as a “top priority” investigation on March 31, it was handled as a product bug rather than a security CVE, causing significant frustration among developers.

Vulnerability details: In late March 2026, developers reverse-engineering Claude Code (Anthropic’s official CLI tool) discovered two major client-side cache bugs that caused token consumption to explode by 10–20× per interaction.

Remedy: To explicitly safeguard your code against token-inflation regressions and guarantee a 90% cost reduction via prompt caching,  you must inject cache_control breakpoints directly into your tool array and message blocks. Please refer to diagram for details.

CVE-2026-24162 – About NVIDIA Merlin Transformers4Rec for Linux platform  (1st Jun 2026)

Preface: Data engineers perform seamless preprocessing, a foundational stage where they gather messy, raw data from diverse sources, clean it (handling missing values, outliers, inconsistencies), integrate disparate datasets, and transform it into a unified, structured format, making it ready and reliable for data scientists to perform advanced feature engineering (creating new, meaningful features) and ultimately build better machine learning models. This ensures a high-quality, consistent input, preventing “garbage in, garbage out” for the modeling phase.

Background: NVIDIA Merlin relies directly on RAPIDS cuDF to handle high-performance, GPU-accelerated dataframe operations for recommender systems. The specific ecosystem library used for this within Merlin is NVTabular. NVTabular and RAPIDS (cuDF/cuML) for preprocessing and feature engineering.

For example: interaction data in cuDF, feed it through a Merlin processing pipeline, and extract the resulting GPU data arrays to train a cuML machine learning model.

cuML is a suite of GPU-accelerated machine learning algorithms and mathematical primitives within the NVIDIA RAPIDS ecosystem, designed to act as a fast, drop-in replacement for Scikit-learn. It allows data scientists to achieve 10-50x faster training times on large datasets by leveraging GPU parallelism.

Where serialization risks actually happen in cuML?

An “improper deserialization of untrusted data” vulnerability (like those involving Python’s pickle module) only occurs if you later attempt to load a previously saved model or object from an unknown or unverified source.

To patch and avoid this vulnerability, NVIDIA and the broader ML ecosystem mandate moving away from arbitrary Python object pickling. Instead, systems should use:

•Safetensors: For saving native deep learning model weights safely (since it restricts execution entirely to pure tensor data and avoids code execution pathways).

•ONNX: For standardized, non-executable model formats

Vulnerability details: CVE-2026-24162 NVIDIA Transformers4Rec for Linux contains a vulnerability where an attacker could cause improper deserialization of untrusted data. A successful exploit of this vulnerability might lead to code execution, data tampering, and information disclosure.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2026-24162

CVE-2026-24212: NVIDIA Isaac Launchable contains a vulnerability (29th May 2026)

Preface: The primary purpose of Isaac Launchable is to provide a turn-key, web-browser-based cloud setup via NVIDIA Brev for developers who lack local hardware. Tesla operates its own multi-billion-dollar on-premise supercomputers (like the Tesla Dojo cluster and massive custom NVIDIA H100/H200 data centers). They do not need a standardized, plug-and-play browser template to rent individual cloud GPUs. Tesla utilizes NVIDIA Isaac Sim—a robotics simulation and synthetic data generation platform—for developing and training its AI-powered robots.

Background: The core design objective of the isaac-launchable project (commonly referred to as “Launchable”) is to democratize and simplify access to NVIDIA’s heavy-duty robotics simulation tools by removing local hardware barriers and complex installation configurations. In an Isaac Launchable cloud environment (running inside the NVIDIA Brev container ecosystem), control commands are sent to a robot within a script executed inside the cloud-hosted VS Code terminal. The command pipeline relies on Isaac Lab and the Omniverse Physics Engine (PhysX). The cloud python script computes the robot’s target state (e.g., target joint positions, velocities, or joint efforts) and writes them directly to the simulation’s articulation buffers.

Instead of fighting for “market share” against other companies, Isaac Launchable competes with traditional local setups.

•Traditional Method: Manual Docker and local container workflows (e.g., standard ROS 2 setups on native Linux machines).

•Launchable Method: Zero-friction cloud deployment. Its “market share” is growing rapidly among researchers, universities, and agile startups who do not have the capital to purchase dedicated $10,000+ RTX enterprise workstations but need immediate access to physics training environments.

Vulnerability details: According to the NVIDIA Security Advisory, CVE-2026-24212 is specifically classified as CWE-319 (Cleartext Transmission of Sensitive Information)within the NVIDIA Isaac Launchable component for Linux.

  • The vulnerable mechanism: The issue lies within the background communication channel or telemetry transit layer managed by the isaac-launchable utility itself. It transmits internal credentials, API keys, or security tokens in unencrypted plaintext over the network.

Official announcement: Please refer to the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5830

CVE-2025-29951: AMD R2000, R1000, and Athlon 3000 series staying alert! (28-05-2026)

Preface: You can find Ryzen inside:

  • Industrial IoT Gateways: Factory machines that handle massive amounts of real-time data.
  • Digital Signage & Kiosks: Large public screens and interactive maps in malls or airports.
  • Smart Medical Devices: High-end medical imaging and hospital machines.
  • Automotive AI: Modern digital car cockpits and self-driving machine systems.

AMD Ryzen Embedded R2000 Series Processors are highly capable, power-efficient System-on-Chips (SoCs) frequently leveraged in autonomous driving, mobile robotics, and ADAS (Advanced Driver Assistance Systems). They function primarily as the central compute brains for vehicle sensor data processing and digital cockpit controls.

Background: Normally, a chip doesn’t need to be desoldered to be updated. An administrator (or an attacker) can use a tool like flashrom inside Linux to talk directly to the motherboard’s built-in SPI controller to read or write to the BIOS chip.

Under normal conditions, hardware security rules called System Management Mode (SMM) ROM protections lock down the SPI controller. Even if you have root access in Linux, the hardware will block flashrom from rewriting critical, protected areas of the BIOS.

CVE-2022-23829 is the exact flaw that breaks this safety net:

  • It allows an attacker who already has Ring 0 (kernel-mode / root) access in Linux to bypass that hardware lock.
  • Because of this bypass, tools like flashrom or a custom driver can write untrusted or malicious data directly onto the soldered Flash SPI ROM chip.

Once the attacker uses flashrom method to place the malicious data on the chip, the chain reaction on the left side of your image begins:

1.             The Flash SPI ROM Memory Chip now holds the malicious data.

2.             The AMD Secure Processor (ASP) boots up early and automatically reads this data.

3.             Because of a missing size check (insufficient bounds check), the malicious data overflows the processor’s tiny 256-byte buffer, corrupting the memory.

4.             By the time the Main Host x86 Cores wake up to run the standard boot sequence, the system has already been compromised.

Vulnerability details: The Root Cause of CVE-2025-29951 – Official security analysis from AMD Security Bulletin SB-4013 confirms that CVE-2025-29951 lives inside the early AMD Secure Processor (ASP) bootloader.

When the system boots up, the ASP parses external configuration tables and firmware parameters passed from the SPI flash chip. The bootloader copies an input block into a fixed-size local stack variable but fails to perform a boundary length check. An attacker with local access can pass a malicious, oversized table that spills out of the stack variable, allowing them to hijack the execution flow and escalate system privileges.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2025-29951

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-4013.html

CVE-2025-61972: The vulnerability resides in the NBIO subsystem of affected AMD processors. (27th May 2026)

Preface: Because this is a hardware-level configuration deficiency, software-level barriers inside the OS kernel cannot fully prevent it. The definitive mitigation requires applying AGESA firmware/microcode updates provided by your motherboard OEM or cloud vendor (Supermicro, Google Cloud, etc.) to correctly enforce register sealing at the hardware layer before platform boot transitions control to the hypervisor.

Background: Please refer to the illustration; point 3 of the illustration emphasizes the use of a Type 1 bare-metal hypervisor to eliminate host operating system overhead for high-throughput workloads such as large-scale video streaming. While a Type-1 hypervisor maximizes efficiency, it actually amplifies the blast radius of this flaw: if an attacker manages to compromise that highly-privileged bare-metal hypervisor layer, the lack of hardware lock bits grants them unhindered access to issue the writel() commands depicted in your code, compromising every independent tenant stream residing on that physical node.

Root Cause & Code Analysis Verification (Block 4 & 5)

  • Unprotected MMIO Routing: Your diagram accurately captures the essence of CWE-1233 (Security-Sensitive Hardware Controls with Missing Lock Bit Protection). In normal operating states, the Northbridge I/O (NBIO) registers that gate access to the System Management Network (SMN) must be permanently locked following BIOS/platform initialization.
  • The Index/Data Side Path: Your C code correctly models how a compromised hypervisor module or a local attacker with Ring-0 privileges uses an MMIO window (0xB8 for index, 0xBC for data) to execute arbitrary reads and writes across the internal SMN fabric. Because lock bits are missing or un-enforced, the host operating system retains full hardware manipulation rights post-boot.

Vulnerability details: CVE-2025-61972 Missing lock bit protection for NBIO registers could allow a local admin-privileged attacker to gain arbitrary System Management Network (SMN) access, potentially resulting in arbitrary code execution in AMD Secure Processor (ASP) and loss of the SEV-SNP guest’s confidentiality and integrity.

Official announcement: Please refer to the link for details – https://www.amd.com/en/resources/product-security/bulletin/amd-sb-3030.html

CVE-2026-24188: About NVIDIA TensorRT (26th May 2026)

Preface: TensorRT is NVIDIA’s general-purpose inference SDK that compiles and optimizes a wide variety of AI models (CNNs, computer vision, traditional neural networks) to run as fast as possible on NVIDIA GPUs.

TensorRT-LLM is a specialized, open-source library built on top of TensorRT specifically tailored to optimize and execute Large Language Models (LLMs).

Background: How the Diagram Corresponds to the Vulnerability?

The diagram maps out how improper memory management between the host (CPU) and device (GPU) exposes a system to this flaw:

  1. Static Buffer Allocation: Step #3 allocates a rigid GPU memory space using cuda.mem_alloc(input_data.nbytes). This sets up a buffer size based entirely on the initial shape of the input_data.
  2. Untrusted Runtime Input: As shown in text boxes 3 and 4, if a remote attacker sends a maliciously crafted input that modifies the shape or size at runtime, the application fails to recalculate the allocation bounds.
  3. Out-of-Bounds Copy: When Step #4 (cuda.memcpy_htod) executes, it forces the larger data stream into the pre-allocated smaller buffer. This overflows the boundary and writes data directly into adjacent GPU memory locations, causing a classic CWE-787 Out-of-bounds Write.

Remediations

  • Update the Software: NVIDIA released an advisory specifying that upgrading to TensorRT v10.16.1 or newer mitigates these risks.
  • Input Boundary Checks: Always strictly validate input dimensions before initiating data copies to device memory.
  • Leverage Native Profiles: If deploying models with varying input dimensions, use TensorRT’s built-in optimization profiles for dynamic shapes rather than manually overriding raw host-to-device pointers without size verification.

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5836

CVE-2026-28972: A true kernel dangling pointer or out-of-bounds write typically arises from logic flaws within the kernel’s own resource management subsystems. (25th May 2026)

Preface: In iOS, the microkernel component (Mach) does not communicate with user space from a separate address space. Instead, it communicates directly within a unified kernel space alongside monolithic components, bypassing the traditional performance costs of a pure microkernel.

While iOS runs on an ARM-based architecture (Apple Silicon), its operating system core, XNU (“X is Not Unix”), is a hybrid kernel. It integrates the Mach microkernel with a monolithic BSD layer and the I/O Kitdriver framework into a single, highly privileged address space. 

Background: If a specific, complex code path inside a kernel subsystem utilizes an object but contains a logic error that forgets to call the appropriate reference increment function (e.g., ipc_port_reference()), the reference count drops to zero prematurely when another thread requests a deletion.

The Result: The kernel safely deletes the object according to its counters, but the flawed subsystem still holds a raw C pointer to that memory address. When the subsystem eventually attempts to write data to that pointer, it performs an out-of-bounds write into memory that may now contain entirely different data.

Vulnerability details: An out-of-bounds write issue was addressed with improved input validation. This issue is fixed in iOS 18.7.9 and iPadOS 18.7.9, iOS 26.5 and iPadOS 26.5, macOS Sequoia 15.7.7, macOS Sonoma 14.8.7, macOS Tahoe 26.5, tvOS 26.5, visionOS 26.5, watchOS 26.5. An app may be able to cause unexpected system termination or write kernel memory.

Official announcement: Please refer to the link for details – https://www.cve.org/CVERecord?id=CVE-2026-28972

CVE-2025-33255: About NVIDIA TensorRT-LLM (22nd May 2026)

Preface: DeepSpeed MII, an open-source Python library developed by Microsoft, aims to make powerful model inference accessible, emphasizing high throughput, low latency, and cost efficiency. TensorRT LLM, an open-source framework from NVIDIA, is designed for optimizing and deploying large language models on NVIDIA GPUs.

Background: TensorRT-LLM is a library developed by NVIDIA to optimize and run large language models (LLMs) efficiently on NVIDIA GPUs. It provides a Python API to define and manage these models, ensuring high performance during inference.

The Python Executor within TensorRT-LLM is a component that orchestrates the execution of inference tasks. It manages the scheduling and execution of requests, ensuring that the GPU resources are utilized efficiently. The Python Executor handles various tasks such as batching requests, managing model states, and coordinating with other components like the model engine and the scheduler.

MPI (Message Passing Interface) helps distribute workloads across multiple GPUs by allowing independent CPU processes to manage different GPUs and coordinate their operations. Because GPUs cannot communicate directly across network nodes, MPI coordinates the sending and receiving of data between nodes while utilizing hardware-accelerated paths to shift workloads off the CPU.

Vulnerability details: CVE-2025-33255 NVIDIA TensorRT-LLM for any platform contains a vulnerability in MPI server, where an attacker could cause an unsafe deserialization. A successful exploit of this vulnerability might lead to code execution, denial of service, data tampering, or information disclosure.

Note: To completely mitigate the risk shown in attached diagram, ensure your deployment workflow includes these two final rules:

  1. Isolate MPI Traffic: Set up your cluster so that the network fabric connecting Nodes 1–4 sits on a private, isolated VLAN or subnet with no external internet ingress.
  2. Upgrade the Image: Verify that your docker pull command grabs a TensorRT-LLM container image version released after the May 2026 security patch advisory.

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5805

CVE-2026-24207: About NVIDIA Triton Inference Server (21st May 2026)

Preface: The NVIDIA Triton Inference Server natively supports gRPC as one of its primary communication protocols for the client API. Furthermore, gRPC can also be used for health checks, statistics, and model loading/unloading operations, not just inference requests. Inference requests arrive at the server via either HTTP/REST or GRPC or by the C API and are then routed to the appropriate per-model scheduler.

Background: NVIDIA’s security bulletin did not provide details. I speculate the cause of CVE-2026-24207 is as follows:

The Bypass Logic

A standard gRPC request path is canonical: /package.Service/Method. If an attacker crafts a raw HTTP/2 frame where the :path pseudo-header is package[.]Service/Method (missing the leading /), the following happens:

Step1 – Routing Success: The gRPC server sees the request and correctly identifies which handler to trigger, even without the leading slash.

Step2 – Match Failure: The authorization engine (like grpc/authz) checks the path against its rules. It looks for a literal match for /package[.]Service/Method. Since the incoming path is package[.]Service/Method, the Deny rule does not trigger.

Step3 – Fallback Triggered: Because the specific deny rule failed to match, the engine falls back to its next rule, which is typically a “catch-all” Allow rule.

My question is that gRPC has an authorization bypass vulnerability affecting all gRPC-Go (google[.]golang[.]org/grpc) versions prior to 1.79.3. However, Triton’s gRPC functionality is primarily implemented in src/grpc/grpc_server[.]cc. Can I say that the CVE-2026-24207 vulnerability occurs on the client side rather than the server side? Because for edge deployments, Triton Server is also provided as a shared library, and its API allows the full functionality of the server to be directly integrated into the application. What are your thoughts on this?

If you are using the standard Triton Inference Server binary (which is built in C++), it uses the C++ gRPC implementation, not the Go version. Therefore, it is not vulnerable to CVE-2026-24207 on the server side.

Vulnerability details: CVE-2026-24207 – NVIDIA Triton Inference Server contains a vulnerability where an attacker could cause an authentication bypass. A successful exploit of this vulnerability might lead to code execution, escalation of privileges, data tampering, denial of service, or information disclosure.

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5828

CVE-2026-8836: A vulnerability was found in lwIP up to 2.2.1. (20th May 2026)

Preface: IoT manufacturers are very willing to use lwIP (Lightweight IP) in firmware, and it is widely used in commercial IoT products. It is a dominant TCP/IP stack in the embedded space because it provides a full-featured networking stack (TCP, UDP, DHCP, DNS) while being highly optimized for resource-constrained, low-power devices.

Even though firmware allocate an lwIP pbuf to hold the payload in RAM

[// PBUF_TRANSPORT automatically reserves space for UDP/IP headers]

If your firmware explicitly uses SNMPv3 alongside your Wake-on-LAN feature, you must apply the patch.

Background: Inbound parsing and outbound allocation are two completely different memory directions (see below):

Outbound – When you call pbuf_alloc(PBUF_TRANSPORT, …), you are allocating memory for an outgoing packet. This works perfectly and securely for transmitting your Magic Packet.

Inbound – When an SNMPv3 management command comes into your device, lwIP allocates an incoming pbuf automatically to hold the raw network packet. The vulnerability happens after allocation, during the parsing phase inside snmp_msg[.]c.

Why CVE-2026-8836 Bypasses Pbuf Protection

The flaw is a stack-based buffer overflow, not a pbuf heap overflow.

i.When a remote user sends an SNMPv3 packet, the function snmp_parse_inbound_frame sets up a fixed-size array on the CPU stack called request->msg_authentication_parameters. This buffer is hardcoded to a maximum size of SNMP_V3_MAX_AUTH_PARAM_LENGTH (usually 32 bytes).

ii.The unpatched code uses the variable tlv.value_len (which comes directly from the untrusted incoming packet header) to decide how many bytes to decode into that stack array.

iii.An attacker can craft a malicious SNMPv3 packet stating that the authentication data is 100 bytes long. Because the check was commented out (/* IF_PARSE_ASSERT(…) */), lwIP blindly executes snmp_asn1_dec_raw and writes all 100 bytes into the 32-byte stack buffer, smashing the CPU stack, corrupting the return address, and crashing your chip or allowing remote code execution.

Vulnerability details: A vulnerability was found in lwIP up to 2.2.1. Affected is the function snmp_parse_inbound_frame of the file src/apps/snmp/snmp_msg.c of the component snmpv3 USM Handler. Performing a manipulation of the argument msgAuthenticationParameters results in stack-based buffer overflow. The attack may be initiated remotely.

Remedy: The patch is named 0c957ec03054eb6c8205e9c9d1d05d90ada3898c. It is suggested to install a patch to address this issue.

Official announcement: Please refer to link for details –

https://nvd.nist.gov/vuln/detail/CVE-2026-8836