Category Archives: AI and ML

CVE-2025-37992: About a NULL pointer dereference in net_sched (27-05-2025)

Preface: Linux powers large parts of the Internet, cloud infrastructure, and supercomputers, although it is difficult to determine the exact number of Linux systems in the world. The same trend now extends to AI system infrastructure, much of which also runs on Linux.

Background: In Linux, a “qdisc” stands for queueing discipline. It’s a core component of the Linux traffic control system, responsible for managing and scheduling network traffic on a per-interface basis. Essentially, a qdisc determines how the kernel handles packets before sending them to the network adapter.

Vulnerability details: Previously, when reducing a qdisc’s limit via the ->change() operation, only the main skb queue was trimmed, potentially leaving packets in the gso_skb list. This could result in a NULL pointer dereference when only sch->limit is checked against sch->q.qlen.

Remedy: This patch introduces a new helper, qdisc_dequeue_internal(), which ensures both the gso_skb list and the main queue are properly flushed when trimming excess packets. All relevant qdiscs (codel, fq, fq_codel, fq_pie, hhf, pie) are updated to use this helper in their ->change() routines.
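The effect of the fix can be pictured with a small model. The sketch below is plain Python with illustrative names (the real qdisc_dequeue_internal() is kernel C and also covers qdiscs with custom dequeue paths); it shows why trimming through a helper that drains the gso_skb list first avoids leaving stale packets behind:

```python
from collections import deque

class Qdisc:
    """Toy model of a qdisc with a main queue plus a gso_skb requeue
    list, mirroring the two structures the patch touches. Names are
    illustrative, not the kernel's actual API."""
    def __init__(self, limit):
        self.limit = limit
        self.q = deque()        # main skb queue
        self.gso_skb = deque()  # requeued packets live here too

    def qlen(self):
        # In the kernel, sch->q.qlen accounts for packets in both lists.
        return len(self.q) + len(self.gso_skb)

    def dequeue_internal(self):
        # Fixed helper: drain gso_skb before the main queue, analogous
        # to the qdisc_dequeue_internal() introduced by the patch.
        if self.gso_skb:
            return self.gso_skb.popleft()
        if self.q:
            return self.q.popleft()
        return None

    def change_limit(self, new_limit):
        # ->change(): trim excess packets down to the new limit.
        self.limit = new_limit
        dropped = []
        while self.qlen() > self.limit:
            dropped.append(self.dequeue_internal())
        return dropped

q = Qdisc(limit=4)
q.gso_skb.extend(["g1", "g2"])
q.q.extend(["p1", "p2"])
dropped = q.change_limit(1)   # must flush gso_skb as well as the main queue
print(dropped, q.qlen())
```

Trimming only `self.q` here would leave the two gso_skb packets unaccounted for, which is the inconsistency the patched qdiscs (codel, fq, fq_codel, fq_pie, hhf, pie) previously ran into.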

Official announcement: Please see the link for details –

https://nvd.nist.gov/vuln/detail/CVE-2025-37992

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fe88c7e4fc2c1cd75a278a15ffbf1689efad4e76

When artificial intelligence encounters a geomagnetic storm (26-05-2025)

Preface: About fifteen years ago, extreme climate sounded the alarm for humanity. But we haven’t woken up yet. As we enter 2025, extreme weather is raging. Are we awake now?

Background: A coronal mass ejection (CME) can induce a geomagnetic storm when it interacts with Earth’s magnetosphere. CMEs are large clouds of plasma and magnetic fields ejected from the Sun, and when they hit Earth, they can disrupt the Earth’s magnetosphere, leading to temporary disturbances and geomagnetic storms.

Geomagnetic storms also create geomagnetically induced currents (GICs). The rapid changes in Earth’s magnetic field during a storm generate electric fields at the surface, and those fields drive GICs through long conductive paths such as power grids, pipelines, and other infrastructure.
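The induction mechanism is just Faraday’s law: a changing magnetic flux through a conducting loop drives a voltage. The numbers below are purely illustrative (storm-scale field swings are on the order of hundreds of nanotesla), not measurements from any real grid:

```python
# Back-of-envelope Faraday's law estimate: a field change dB over time dt
# across an effective loop area A induces an EMF of roughly A * dB / dt.
def induced_emf(area_m2, delta_b_tesla, delta_t_s):
    return area_m2 * delta_b_tesla / delta_t_s

# Assumed values: a 100 km line with a 10 m effective loop height,
# and a 500 nT field swing over 60 seconds.
area = 100_000 * 10            # m^2
emf = induced_emf(area, 500e-9, 60)
print(round(emf, 4))           # volts across the loop
```

Even small per-loop voltages matter because real transmission networks span hundreds of kilometres of low-resistance conductor, so quasi-DC currents of tens of amperes can flow and saturate transformers.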

How much electricity does a supercomputer used for artificial intelligence consume?

A single modern AI GPU can consume up to 700 watts of power. A typical supercomputer, especially those used for AI training, can consume significantly more power, with some examples exceeding 4 megawatts (4,000,000 watts). This high power consumption is largely due to the large number of powerful GPUs and other specialized hardware needed for these complex computations.
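The rough arithmetic behind those figures (assumed values from the paragraph above, not measurements):

```python
# Upper bound on how many 700 W GPUs a 4 MW facility could feed if all
# of its power went to GPUs (real facilities also spend power on CPUs,
# networking, storage, and cooling, so the true count is lower).
gpu_watts = 700
facility_watts = 4_000_000

max_gpus = facility_watts // gpu_watts
print(max_gpus)  # → 5714
```

In other words, a 4 MW budget caps out below six thousand such GPUs even before cooling overhead is counted.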

How does artificial intelligence think about this problem?

Building a supercomputer facility underground with its own dedicated power supply does significantly reduce—but not entirely eliminate—the risks from geomagnetic storms. Here’s a breakdown:

1. Underground Location:

- Provides natural shielding from fluctuating magnetic fields.

- Reduces exposure to induced currents in long conductors.

2. Dedicated Power Supply:

- If it’s isolated from the main power grid (e.g., using local generators, batteries, or renewables), it avoids GICs that typically enter through long transmission lines.

- Shorter internal cabling means less potential for induced voltages.

3. Shielded Infrastructure:

- If the facility uses shielded transformers, GIC-blocking devices, and grounding systems, it can further mitigate risks.

End.

CVE-2025-47436: Heap-based Buffer Overflow vulnerability in Apache ORC. (15-5-2025)

Preface: Traditional row-based, highly normalized data formats have several limitations:

Complex Queries: Normalization often requires joining multiple tables to retrieve data, which can make queries more complex and slower.

Maintenance Challenges: Maintaining a highly normalized database can be more difficult, as changes to the schema may require updates to multiple tables.

Background: Apache ORC (Optimized Row Columnar) is a free and open-source, column-oriented data storage format designed for use in Hadoop and other big data processing systems. It was created to address the limitations of traditional row-based formats, providing a more efficient way to store and process large datasets. ORC is widely used by data processing frameworks like Apache Spark, Apache Hive, Apache Flink, and Apache Hadoop.

Vulnerability details: Heap-based Buffer Overflow vulnerability in Apache ORC. A flaw has been identified in the ORC C++ LZO decompression logic: a specially crafted, malformed ORC file can cause the decompressor to allocate a 250-byte buffer and then attempt to copy 295 bytes into it, corrupting adjacent heap memory.
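The underlying defect is a missing bounds check between the buffer size taken from the (attacker-controlled) file header and the actual decompressed length. The real fix lives in ORC’s C++ code; the Python sketch below only illustrates the generic pattern of refusing the copy rather than overflowing:

```python
class DecompressionError(Exception):
    pass

def copy_decompressed(dst: bytearray, src: bytes) -> None:
    """Illustrative bounds check: refuse to copy more bytes than the
    output buffer was sized for, instead of corrupting adjacent memory.
    (In Python a slice assignment cannot overflow; in C++ a memcpy can.)"""
    if len(src) > len(dst):
        raise DecompressionError(
            f"decompressed size {len(src)} exceeds buffer {len(dst)}")
    dst[: len(src)] = src

buf = bytearray(250)          # buffer sized from the file header
payload = b"A" * 295          # actual decompressed stream is larger
try:
    copy_decompressed(buf, payload)
except DecompressionError as e:
    print("rejected:", e)
```

Without the length comparison, the C++ equivalent writes 45 bytes past the allocation, which is exactly the heap corruption described above.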

Remedy: This issue affects the Apache ORC C++ library through 1.8.8, from 1.9.0 through 1.9.5, from 2.0.0 through 2.0.4, and from 2.1.0 through 2.1.1. Users are recommended to upgrade to version 1.8.9, 1.9.6, 2.0.5, or 2.1.2, which fix the issue.

Official announcement: Please see the link for details –

https://nvd.nist.gov/vuln/detail/CVE-2025-47436

Privilege Desynchronization: Cross-Privilege Spectre Attacks with Branch Privilege Injection – Part 2 (14-05-2025)

Preface: Before reading the detailed information, it is recommended to read Part 1 first.

Privilege Desynchronization: Cross-Privilege Spectre Attacks with Branch Privilege Injection (Part 1)  –

http://www.antihackingonline.com/under-our-observation/privilege-desynchronization-cross-privilege-spectre-attacks-with-branch-privilege-injection-14-05-2025/

Technical details: The LFENCE instruction exists to serialize memory loads. Internally it introduces a delay so that loads issued after the instruction do not begin until every load issued before it has completed (no overlap occurs).

LFENCE performs a serializing operation on all load-from-memory instructions that were issued prior to the LFENCE instruction. Specifically, LFENCE does not execute until all prior instructions have completed locally, and no later instruction begins execution until LFENCE completes.

AMD’s AutoIBRS (Automatic Indirect Branch Restricted Speculation) is designed to mitigate timing-based attacks, such as Spectre. AutoIBRS helps avoid the performance overhead associated with LFENCE by automatically restricting speculative execution of indirect branches. This mechanism reduces the need for frequent LFENCE instructions, thereby minimizing delays while still protecting against timing vulnerabilities.

Cyber security focus provided by ETH Zurich: Researchers from ETH Zurich have provided AMD with a paper titled “Privilege Desynchronization: Cross-Privilege Spectre Attacks with Branch Privilege Injection.”
AMD reviewed the paper and believes that this vulnerability does not impact AMD CPUs. 

If supported by the processor, operating systems enable eIBRS or AutoIBRS to mitigate cross-privilege BTI attacks. These mitigations need to keep track of the privilege domain of branch instructions to work correctly, which is non-trivial due to the highly complex and asynchronous nature of branch prediction. For example, previous work has shown that branch predictions are updated before branches retire, and in certain cases even before they are decoded. Our first challenge revolves around analyzing the behavior of restricted branch prediction under race conditions.

Official announcement: Please see the link for details – https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7030.html

Privilege Desynchronization: Cross-Privilege Spectre Attacks with Branch Privilege Injection (14-05-2025)

Preface: Enhanced IBRS (eIBRS) and Automatic IBRS (AutoIBRS) are features designed to mitigate the Spectre V2 vulnerability, which affects speculative execution in CPUs.

Background: AutoIBRS is a similar feature introduced by AMD in their Zen 4 processors. It automatically manages IBRS mitigation resources across privilege level transitions, offering better performance compared to Retpoline. This feature is particularly beneficial for AMD’s Ryzen 7000 and EPYC 9004 series processors.

AMD EPYC 9004 series processors are designed for data centers and high-performance computing (HPC) environments. They offer features like up to 96 “Zen 4” cores, 12 channels of DDR5 memory, and PCIe Gen5 support.

Cyber security focus provided by ETH Zurich: Researchers from ETH Zurich have provided AMD with a paper titled “Privilege Desynchronization: Cross-Privilege Spectre Attacks with Branch Privilege Injection.”
AMD reviewed the paper and believes that this vulnerability does not impact AMD CPUs. 

If supported by the processor, operating systems enable eIBRS or AutoIBRS to mitigate cross-privilege BTI attacks. These mitigations need to keep track of the privilege domain of branch instructions to work correctly, which is non-trivial due to the highly complex and asynchronous nature of branch prediction. For example, previous work has shown that branch predictions are updated before branches retire, and in certain cases even before they are decoded. Our first challenge revolves around analyzing the behavior of restricted branch prediction under race conditions.

Official announcement: Please see the link for details – https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7030.html

CVE-2025-37834: About Linux vmscan[.]c (8th May 2025)

Preface: All systems based on the Linux kernel utilize the vmscan[.]c file for memory management. This file is integral to the kernel’s memory reclamation process, ensuring efficient use of system memory across various Linux distributions.

Background: The vmscan[.]c file in the Linux kernel is responsible for managing memory reclamation. It contains functions that help the system reclaim memory by scanning and freeing up pages that are no longer in use. This process is crucial for maintaining system performance and preventing memory shortages.

Some key functions within vmscan[.]c include:

kswapd: A kernel thread that periodically scans and frees up memory pages.

shrink_node: This function attempts to reclaim memory from a specific node.

shrink_zone: It works on reclaiming memory from a specific zone within a node.

These functions work together to ensure that the system has enough free memory to operate efficiently.

Vulnerability details: In the Linux kernel, the following vulnerability has been resolved: mm/vmscan: don’t try to reclaim hwpoison folio.

The enhancement in the vmscan[.]c file, specifically the handling of hardware-poisoned pages, is indeed part of the broader memory management improvements. This enhancement is not limited to the shrink_node function alone. It applies to various parts of the memory reclamation process, including functions like shrink_zone and shrink_folio_list.
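The idea behind the fix can be sketched in a few lines. This is plain Python with illustrative names (the real change is in kernel C inside mm/vmscan[.]c, and the Folio class here is a stand-in, not the kernel structure): during a reclaim scan, folios that hardware has flagged as poisoned are skipped rather than written back or freed.

```python
from dataclasses import dataclass

@dataclass
class Folio:
    """Toy stand-in for a kernel folio: a page-frame number plus the
    hardware-poison flag that the fix checks."""
    pfn: int
    hwpoison: bool = False

def shrink_folio_list(folios):
    """Illustrative reclaim pass: never attempt to reclaim a folio
    marked hwpoison; leave it alone and reclaim the rest."""
    reclaimed, skipped = [], []
    for folio in folios:
        if folio.hwpoison:
            skipped.append(folio)   # touching it risks consuming bad memory
            continue
        reclaimed.append(folio)
    return reclaimed, skipped

folios = [Folio(1), Folio(2, hwpoison=True), Folio(3)]
reclaimed, skipped = shrink_folio_list(folios)
print([f.pfn for f in reclaimed], [f.pfn for f in skipped])
```

The same guard applies wherever the reclaim path iterates candidate folios, which is why the enhancement spans several functions rather than shrink_node alone.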

Official announcement: Please see the link for details – https://nvd.nist.gov/vuln/detail/CVE-2025-37834

CVE-2025-23254: NVIDIA TensorRT-LLM for any platform contains a vulnerability in python executor (30-4-2025)

Preface: DeepSpeed MII, an open-source Python library developed by Microsoft, aims to make powerful model inference accessible, emphasizing high throughput, low latency, and cost efficiency. TensorRT LLM, an open-source framework from NVIDIA, is designed for optimizing and deploying large language models on NVIDIA GPUs.

Background: TensorRT-LLM is a library developed by NVIDIA to optimize and run large language models (LLMs) efficiently on NVIDIA GPUs. It provides a Python API to define and manage these models, ensuring high performance during inference.

The Python Executor within TensorRT-LLM is a component that orchestrates the execution of inference tasks. It manages the scheduling and execution of requests, ensuring that the GPU resources are utilized efficiently. The Python Executor handles various tasks such as batching requests, managing model states, and coordinating with other components like the model engine and the scheduler.
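The orchestration role described above can be sketched as a simple batching loop. To be clear, this is not the TensorRT-LLM API; run_batches and its parameters are hypothetical names illustrating the pattern of draining a request queue into fixed-size batches for an engine:

```python
from queue import Queue, Empty

def run_batches(requests, max_batch_size):
    """Illustrative executor loop: pull queued inference requests and
    group them into batches of at most max_batch_size, the way an
    executor would before dispatching each batch to the model engine."""
    q = Queue()
    for r in requests:
        q.put(r)

    batches = []
    while True:
        batch = []
        try:
            while len(batch) < max_batch_size:
                batch.append(q.get_nowait())
        except Empty:
            pass                 # queue drained mid-batch; keep what we have
        if not batch:
            break                # nothing left to schedule
        batches.append(batch)    # a real executor hands this to the engine
    return batches

print(run_batches(["r1", "r2", "r3", "r4", "r5"], max_batch_size=2))
```

Batching like this is what keeps GPU utilization high: the engine runs one kernel launch per batch instead of one per request.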

Vulnerability details: NVIDIA TensorRT-LLM for any platform contains a vulnerability in python executor where an attacker may cause a data validation issue by local access to the TRTLLM server. A successful exploit of this vulnerability may lead to code execution, information disclosure and data tampering.

CWE-502: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.
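CWE-502 is easy to demonstrate in Python generally (this is not TensorRT-LLM’s actual code, just the class of bug): pickle will execute attacker-chosen callables during deserialization, which is why untrusted input must be parsed with a data-only format and validated instead.

```python
import json
import pickle

class Exploit:
    """A payload like this runs arbitrary code when unpickled."""
    def __reduce__(self):
        # On deserialization, pickle calls print(...) — any callable
        # could be substituted here.
        return (print, ("arbitrary code ran during deserialization",))

malicious = pickle.dumps(Exploit())
pickle.loads(malicious)   # side effect fires here, before any validation

# Safer pattern: a format that carries only data, plus explicit checks.
data = json.loads('{"prompt": "hello", "max_tokens": 16}')
assert isinstance(data.get("prompt"), str)
assert isinstance(data.get("max_tokens"), int)
print(data["prompt"])
```

The JSON branch can still receive malformed input, but the worst case is a parse or validation error rather than code execution.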

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5648

CVE‑2025‑23245 and CVE-2025-23246: About NVIDIA vGPU software Driver (24-04-2025)

Preface: To virtualize a single NVIDIA GPU into multiple virtual GPUs and allocate them to different virtual machines or users, you can use NVIDIA’s vGPU capability.

Background: Unified memory is disabled by default. If used, you must enable unified memory individually for each vGPU that requires it by setting a vGPU plugin parameter. NVIDIA CUDA Toolkit profilers are supported and can be enabled on a VM for which unified memory is enabled.

Enabling Unified Memory for Nvidia vGPU does indeed allow a guest virtual machine (VM) to access global resources. When Unified Memory is enabled, it allows the VM to dynamically share memory with the host and other VMs, providing more flexibility and potentially improving performance for certain workloads.

Enabling access to global resources through Unified Memory in Nvidia vGPU can potentially lead to denial of service (DoS) attacks due to several reasons:

  • When multiple VMs share the same physical GPU resources, there’s a risk of resource contention. If one VM consumes excessive resources, it can starve other VMs, leading to degraded performance or even service outages.
  • Allowing VMs to access global resources increases the attack surface. Malicious actors could exploit vulnerabilities to disrupt services or gain unauthorized access to sensitive data.

Vulnerability details:

CVE-2025-23245: NVIDIA vGPU software for Windows and Linux contains a vulnerability in the Virtual GPU Manager (vGPU plugin), where it allows a guest to access global resources. A successful exploit of this vulnerability might lead to denial of service.

CWE-732: Incorrect Permission Assignment for Critical Resource

CVE-2025-23246: NVIDIA vGPU software for Windows and Linux contains a vulnerability in the Virtual GPU Manager (vGPU plugin), where it allows a guest to consume uncontrolled resources. A successful exploit of this vulnerability might lead to denial of service.

CWE-400: Uncontrolled Resource Consumption

Official announcement: Please see the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5630

CVE‑2025‑23244: About NVIDIA GPU Display Driver (24-04-2025)

Preface: The NVIDIA Tesla R570 driver is used for various data center GPUs, including the NVIDIA A100 and NVIDIA V100. These GPUs are designed for high-performance computing, AI, and deep learning applications.

Background:

The CUDA software environment consists of three parts:

  • CUDA Toolkit (libraries, runtime and tools) – User-mode SDK used to build CUDA applications
  • CUDA driver – User-mode driver component used to run CUDA applications (for example, libcuda.so on Linux systems)
  • NVIDIA GPU device driver – Kernel-mode driver component for NVIDIA GPUs

On Linux systems, the CUDA driver and kernel mode components are delivered together in the NVIDIA display driver package.

DxgkDdiEscape is a function used in Windows drivers, specifically within the DirectX graphics kernel subsystem. In Linux, a similar function to DxgkDdiEscape is ioctl (Input/Output Control).

The ioctl system call can indeed be a potential vector for Incorrect Authorization vulnerabilities if not implemented correctly.

Vulnerability details: NVIDIA GPU Display Driver for Linux contains a vulnerability which could allow an unprivileged attacker to escalate permissions. A successful exploit of this vulnerability might lead to code execution, denial of service, escalation of privileges, information disclosure, and data tampering.

Impact: Code execution, denial of service, escalation of privileges, information disclosure, and data tampering

Official announcement: Please see the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5630

CVE-2025-23253: NVIDIA NvContainer service for Windows contains a vulnerability (24-4-2025)

Preface: The most common technique is DLL search-order hijacking: attackers place a malicious DLL in a directory that is searched before the legitimate system paths.

Because the application loading the DLL is trusted, security solutions may not flag the execution as suspicious.

Cybercriminals often rely on a handful of common techniques when creating malicious DLLs, for example DLL injection, registry manipulation, etc.

Evasion Techniques:

Obfuscation: Code within the DLL is often obfuscated to avoid detection by security tools.

Steganography: Hiding malicious code within seemingly benign files.

Background: The NVIDIA NvContainer service is part of the NVIDIA graphics driver package and is responsible for various tasks, including telemetry data gathering, overlay management, and high-performance GPU scheduling. It doesn’t imply that Windows OS runs on a container runtime like Docker or Kubernetes. Instead, it refers to the way NVIDIA organizes and manages its services and processes within the driver package.

The term “container” in this context is more about how NVIDIA encapsulates its services to ensure they run efficiently and independently, rather than using a full-fledged containerization technology.

Vulnerability details: NVIDIA NvContainer service for Windows contains a vulnerability in its usage of OpenSSL, where an attacker could exploit a hard-coded constant issue by copying a malicious DLL in a hard-coded path. A successful exploit of this vulnerability might lead to code execution, denial of service, escalation of privileges, information disclosure, or data tampering.

Official announcement: Please see the official link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5644