Category Archives: AI and ML

CVE-2025-23249, CVE-2025-23250 & CVE-2025-23251: NVIDIA Nemo Framework contains vulnerabilities (23rd Apr 2025)

Preface: The symbol ~/. by itself is not a relative path traversal; it simply refers to the home directory of the current user. However, when combined with ../, it can become part of a relative path traversal.

Relative path traversal uses sequences like ../ to navigate up the directory hierarchy. For example, ~/. refers to the home directory, and ../ moves up one directory level. Combining them as ~/../ therefore resolves to the parent directory of the home directory, which is a form of relative path traversal.
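To make the idea concrete, here is a minimal Python sketch of a containment check that expands `~` and collapses `..` components before deciding whether a path escapes a base directory. The function name and sample paths are illustrative, not taken from NeMo:

```python
import os.path

def resolves_inside(base: str, candidate: str) -> bool:
    """Return True if `candidate` stays inside `base` after
    expanding `~` and collapsing `..` components."""
    base = os.path.abspath(os.path.expanduser(base))
    target = os.path.abspath(os.path.expanduser(candidate))
    return target == base or target.startswith(base + os.sep)

# Illustrative home directory; expanduser() also handles real "~" input.
base = "/home/alice"
print(resolves_inside(base, "/home/alice/notes.txt"))        # True
print(resolves_inside(base, "/home/alice/../../etc/passwd")) # False - traversal
```

The key point is that the check runs on the *normalized* path, so `..` sequences cannot smuggle a target outside the base directory.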

Background: NVIDIA NeMo is an end-to-end platform designed for developing and deploying generative AI models. This includes large language models (LLMs), vision language models (VLMs), video models, and speech AI. NeMo offers tools for data curation, fine-tuning, retrieval-augmented generation (RAG), and inference, making it a comprehensive solution for creating enterprise-ready AI models. Here are some key capabilities of NeMo LLMs:

  1. Customization: NeMo allows you to fine-tune pre-trained models to suit specific enterprise needs. This includes adding domain-specific knowledge and skills, and continuously improving the model with reinforcement learning from human feedback (RLHF).
  2. Scalability: NeMo supports large-scale training and deployment across various environments, including cloud, data centers, and edge devices. This ensures high performance and flexibility for different use cases.
  3. Foundation Models: NeMo offers a range of pre-trained foundation models, such as GPT-8B, GPT-43B, and GPT-530B, which can be used for tasks like text classification, summarization, creative writing, and chatbots.
  4. Data Curation: The platform includes tools for processing and curating large datasets, which helps improve the accuracy and relevance of the models.
  5. Integration: NeMo can be integrated with other NVIDIA AI tools and services, providing a comprehensive ecosystem for AI development.

Vulnerability details:

CVE-2025-23249: NVIDIA NeMo Framework contains a vulnerability where a user could cause a deserialization of untrusted data by remote code execution. A successful exploit of this vulnerability might lead to code execution and data tampering.

CVE-2025-23250: NVIDIA NeMo Framework contains a vulnerability where an attacker could cause an improper limitation of a pathname to a restricted directory by an arbitrary file write. A successful exploit of this vulnerability might lead to code execution and data tampering.

CVE-2025-23251: NVIDIA NeMo Framework contains a vulnerability where a user could cause an improper control of generation of code by remote code execution. A successful exploit of this vulnerability might lead to code execution and data tampering.

Official announcement: Please see the official link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5641

About AXI Protocol Checker IP (22-04-2025)

When lightweight AI becomes your partner in the office, everyone's skills become equal. As a result, the inherent kindness in human nature may be hidden!

Preface: High Performance Computing (HPC) systems using AMD chips can utilize AXI crossbars. The AXI crossbar is used to route AXI4-Lite requests to corresponding sub-cores based on the address. This is particularly useful in complex SoC designs where efficient data routing and high throughput are essential.

However, it’s worth noting that AMD’s Versal adaptive SoCs feature a programmable Network-on-Chip (NoC), which replaces traditional AXI interconnects in the programmable logic. This NoC can achieve higher levels of design efficiency and performance compared to traditional AXI interconnects.

Background:

AXI Crossbar

  • In an AXI Crossbar, the master interfaces are the sources of transactions, and the slave interfaces are the destinations.
  • The crossbar routes transactions from multiple masters to multiple slaves based on address decoding and arbitration logic.
  • It ensures efficient communication and data transfer within a System-on-Chip (SoC) design.

AXI4-Lite and the Orchestrator serve distinct roles within an AXI Crossbar:

AXI4-Lite: AXI4-Lite is a simplified subset of the AXI4 protocol designed for low-complexity, low-throughput applications. It supports:

  • 32-bit address and data widths.
  • Single data transfer per transaction, making it ideal for control register access and configuration tasks.

The Orchestrator in an AXI Crossbar manages the routing and arbitration of transactions between multiple masters and slaves.

Vulnerability details: Researchers from ETH Zurich, UC San Diego, and RPTU Kaiserslautern-Landau shared two papers with AMD, titled “EXPECT: On the Security Implications of Violations in AXI Implementations” and “XRAY: Detecting and Exploiting Vulnerabilities in ARM AXI Interconnects”, which explore methods for exposing vulnerabilities related to the AXI interface when utilizing the AMD AXI Crossbar IP in Vivado™ designs. The AXI Protocol Checker IP was included in the design as a debug check but failed to catch all protocol violations in the design.

Official announcement: Please see the link for details – https://www.amd.com/en/resources/product-security/bulletin/amd-sb-8005.html

AI network congestion resembles ischemic stroke in humans (21-4-2025)

Preface: In ischemic stroke, every second counts. If TPA thrombolytic agent is used promptly in ischemic stroke, it can dissolve blood clots and reduce brain cell necrosis. But it must be used within three hours, so it is very important to grasp the golden three hours.

Although HPC systems do indeed function as a collective unit, similar to a single brain, network congestion remains a significant concern for several technical reasons, for instance: high data transfer rates, complex communication patterns, shared resources, and latency sensitivity.

Background: Although HPC systems function as a collective unit, similar to a single brain, network congestion remains a significant concern for several technical reasons:

-High Data Transfer Rates: HPC systems often involve massive data transfers between nodes. When multiple nodes simultaneously send and receive large amounts of data, it can overwhelm the network, leading to congestion.

-Complex Communication Patterns: HPC workloads typically involve complex communication patterns, such as all-to-all communication, which can create bottlenecks. Even if the network is designed to handle high traffic, certain patterns can still cause congestion.

-Shared Resources: HPC systems share network resources among many nodes. When demand for these resources exceeds capacity, it results in congestion. This can delay data transfer and impact overall system performance.

-Latency Sensitivity: Many HPC applications are sensitive to latency. Network congestion increases latency, which can significantly affect the performance of time-critical applications.

-Scalability Challenges: As HPC systems scale up, the complexity and volume of data traffic increase. Ensuring efficient communication across thousands or even millions of nodes becomes challenging, and congestion can arise if the network infrastructure isn’t robust enough.

Solution: Addressing network congestion involves implementing advanced technologies like adaptive routing, congestion control mechanisms, and scalable interconnects.
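One of the simplest congestion control mechanisms, and a useful mental model for the class, is additive-increase/multiplicative-decrease (AIMD), the rule behind TCP's congestion window. The sketch below is a generic illustration with made-up parameter defaults, not a description of any specific HPC interconnect:

```python
def aimd(window: float, congested: bool,
         increase: float = 1.0, decrease: float = 0.5,
         floor: float = 1.0) -> float:
    """Additive-increase / multiplicative-decrease: grow the send
    window steadily while the network is healthy, halve it on any
    sign of congestion. Defaults are illustrative."""
    if congested:
        return max(floor, window * decrease)
    return window + increase

w = 1.0
for congested in [False, False, False, True, False]:
    w = aimd(w, congested)
    print(f"window = {w}")  # 2.0, 3.0, 4.0, 2.0, 3.0
```

The asymmetry (slow growth, fast backoff) is what lets many senders share a congested link without collapsing it.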

System Management Mode (SMM) does not follow best practices. The impact extends beyond the desktop to HPC as well! (7th Apr 2025)

Preface: In the realm of High Performance Computing (HPC), processors that use the x86 architecture typically support System Management Mode (SMM). This includes:

-Intel Xeon Processors: Widely used in HPC systems, Intel Xeon processors support SMM for managing system-wide tasks such as power management and hardware control.

-AMD EPYC Processors: AMD EPYC processors, including the latest generations, also support SMM. These processors are known for their high core counts and robust performance in HPC environments.

Both Intel and AMD continue to leverage SMM in their x86-based processors to ensure efficient and secure system management.

Background: SMM operates transparently to the operating system and applications, allowing it to perform these tasks without interfering with the normal operation of the system.

Under an HPC architecture, a cluster of computers (each machine in the cluster being a node) essentially operates as a single entity that can accept tasks and computations as a collective.

The isolation is particularly beneficial in HPC environments where uninterrupted performance is crucial.

Technical details: System Management Mode (SMM) uses System Management RAM (SMRAM) to store and manage tasks. SMM is triggered through a System Management Interrupt (SMI), a signal sent from the chipset to the CPU. During platform initialization, the firmware configures the chipset to cause a System Management Interrupt for various events that the firmware developer would like the firmware to be made aware of.

  1. SwSmiHandler: This is the function that will handle the SMI.
  2. RegisterSmiHandler: This function registers the SMI handler with the SMM SW Dispatch protocol.
  3. UefiMain: This is the entry point of the UEFI application, which calls the registration function.

The key steps are locating the SMM SW Dispatch protocol, setting up the context for the SMI handler, and registering the handler.
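The real registration happens in UEFI firmware (in C, via EDK2's SMM SW Dispatch protocol). Purely to illustrate the register-then-dispatch flow of the three steps above, here is a toy Python model; the SMI values and handler names are invented:

```python
# Toy model of SMI dispatch: firmware registers handlers keyed by a
# software SMI value, and the dispatcher invokes the matching handler
# when that SMI fires. The real mechanism lives in UEFI firmware.
handlers = {}

def register_smi_handler(sw_smi_value: int, handler):
    """Step 2 above: bind a handler to a software SMI value."""
    if sw_smi_value in handlers:
        raise ValueError(f"SMI value {sw_smi_value:#x} already taken")
    handlers[sw_smi_value] = handler

def dispatch_smi(sw_smi_value: int):
    """What the SMM dispatcher does when an SMI fires."""
    handler = handlers.get(sw_smi_value)
    if handler is None:
        return "unclaimed"  # no registered handler for this SMI
    return handler()

register_smi_handler(0x42, lambda: "power-event handled")
print(dispatch_smi(0x42))  # power-event handled
print(dispatch_smi(0x99))  # unclaimed
```

The design-flaw class AMD describes arises when a registered handler trusts data it should not, since SMM code runs at a higher privilege than the OS.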

Reference: Design flaw in SMM published by AMD on Feb 2025. Please refer to the link for details – https://www.amd.com/en/resources/product-security/bulletin/amd-sb-4008.html

About AMD Ryzen™ AI Software: CVE-2025-0014, CVE-2024-36337, CVE-2024-36336 & CVE-2024-36328 (3rd Apr 2025)

Preface: The Ryzen 7000 desktop and laptop chips were introduced in 2023. Alongside the main x86 CPU, Ryzen 7000 has a new type of coprocessor, a Neural Processing Unit (NPU), based on the XDNA™ AI Engine architecture. This new NPU is called Ryzen AI.

Background:

Install NPU Drivers

  1. Download the NPU driver installation package (NPU Driver).
  2. Extract the downloaded “NPU_RAI1.2.zip” zip file.
  3. Open a terminal in administrator mode and execute the [.]\npu_sw_installer[.]exe file.
  4. Ensure that the NPU MCDM driver (Version: 32.0.201.204, Date: 7/26/2024) is correctly installed by opening Device Manager -> Neural processors -> NPU Compute Accelerator Device.

Vulnerability details:

CVE-2025-0014: Incorrect default permissions on the AMD Ryzen™ AI installation folder could allow an attacker to achieve privilege escalation, potentially resulting in arbitrary code execution.

CVE-2024-36337: Integer overflow within the AMD NPU Driver could allow a local attacker to write out of bounds, potentially leading to loss of confidentiality, integrity or availability.

CVE-2024-36328: Integer overflow within the AMD NPU Driver could allow a local attacker to write out of bounds, potentially leading to loss of integrity or availability.

CVE-2024-36336: Integer overflow within the AMD NPU Driver could allow a local attacker to write out of bounds, potentially leading to a loss of confidentiality, integrity, or availability.

Official announcement: Please refer to the official announcement for details – https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7037.html

CVE-2025-2953: Floating point exception in torch[.]mkldnn_max_pool2d (31st Mar 2025)

Preface: The torch[.]nn[.]MaxPool2d function in PyTorch is used to apply a 2D max pooling operation over an input signal, which is typically an image or a batch of images.

  • Torch[.]mkldnn_max_pool2d is optimized for Intel’s MKL-DNN (Math Kernel Library for Deep Neural Networks). It leverages specific optimizations for Intel CPUs, which can lead to better performance on those processors. It might have limitations in terms of supported features and is more specialized for performance optimization.
  • Torch[.]nn[.]MaxPool2d is a more general implementation that works across different hardware platforms without specific optimizations for Intel CPUs.  It provides more flexibility and is easier to use within the PyTorch ecosystem, supporting various features like padding, dilation, and return indices.

Background: A floating point exception crash when using torch[.]mkldnn_max_pool2d can occur due to several reasons, often related to invalid or extreme values for parameters like kernel size, stride, or padding. Here are some common causes:

  1. Invalid Kernel Size: If the kernel size is set to an extremely large value or zero, it can lead to division by zero or other invalid operations, causing a floating point exception.
  2. Stride and Padding Issues: Similar to kernel size, setting stride or padding to extreme values can result in invalid calculations. For example, a stride of zero can cause the pooling operation to repeatedly access the same elements, leading to a crash.
  3. Input Tensor Dimensions: If the dimensions of the input tensor are not compatible with the specified kernel size, stride, or padding, it can lead to invalid memory access or calculations.
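All three failure modes above reduce to the pooling output-size formula, floor((in + 2*pad - kernel) / stride) + 1, which divides by the stride. The sketch below validates the parameters before computing it; this is an illustration of the bug class, not PyTorch's internal code:

```python
def pool_output_size(in_size: int, kernel: int,
                     stride: int, padding: int = 0) -> int:
    """Output length of a 1-D max-pool sweep:
    floor((in + 2*pad - kernel) / stride) + 1.
    Validates the ranges that, unchecked, produce the divide-by-zero
    and invalid-size failures described above."""
    if kernel <= 0:
        raise ValueError("kernel size must be positive")
    if stride <= 0:
        raise ValueError("stride must be positive (0 divides by zero)")
    if padding < 0 or padding > kernel // 2:
        raise ValueError("padding must be in [0, kernel//2]")
    span = in_size + 2 * padding - kernel
    if span < 0:
        raise ValueError("kernel larger than padded input")
    return span // stride + 1

print(pool_output_size(32, kernel=2, stride=2))  # 16
```

A stride of zero reaching the division unchecked is precisely the floating point exception the CVE describes.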

Vulnerability details: A vulnerability, which was classified as problematic, has been found in PyTorch 2.6.0+cu124. Affected by this issue is the function torch[.]mkldnn_max_pool2d. The manipulation leads to denial of service. The attack has to be carried out locally. The exploit has been disclosed to the public and may be used.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2025-2953

Similar to previously disclosed side-channel attacks: manufacturer (AMD) response to researchers (30-03-2025)

Preface: On 24th Oct, 2024, Researchers from Azure® Research, Microsoft® have provided to AMD a paper titled “Principled Microarchitectural Isolation on Cloud CPUs.” In their paper, the researchers describe a potential side-channel vulnerability on AMD CPUs. AMD believes that existing mitigation recommendations for prime and probe side-channel attacks remain applicable to the presented vulnerability.

Background: A two-bit saturating up-down counter is a type of counter used in computer architecture, particularly in branch prediction mechanisms. Here’s a brief overview:

  • Two-bit: The counter uses two bits, allowing it to represent four states (00, 01, 10, 11).
  • Up-down: The counter can increment (count up) or decrement (count down) based on the input signal.
  • Saturating: The counter does not wrap around when it reaches its maximum (11) or minimum (00) value. Instead, it stays at these values if further increments or decrements are attempted.
How It Works:
  1. States: The counter has four states: 00, 01, 10, and 11.
  2. Incrementing: If the counter is at 11 and receives an increment signal, it remains at 11. Similarly, if it is at 00 and receives a decrement signal, it stays at 00.
  3. Usage: These counters are often used in branch prediction to keep track of the history of branch outcomes and make predictions based on this history.

Ref: The pattern history table (PHT) branch architecture is an example of an architecture using two-bit saturating up-down counters. It contains a table of two-bit counters used to predict the direction for conditional branches.
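The counter described above is small enough to simulate directly. This sketch models one PHT entry; the state names are the conventional ones (strongly/weakly not-taken and taken):

```python
# One two-bit saturating up-down counter, as used in a PHT entry.
STRONG_NT, WEAK_NT, WEAK_T, STRONG_T = 0, 1, 2, 3  # states 00..11

def update(counter: int, taken: bool) -> int:
    """Increment on a taken branch, decrement on not-taken,
    clamping at 00 and 11 (saturation, no wrap-around)."""
    if taken:
        return min(STRONG_T, counter + 1)
    return max(STRONG_NT, counter - 1)

def predict_taken(counter: int) -> bool:
    """States 10 and 11 predict taken; 00 and 01 predict not-taken."""
    return counter >= WEAK_T

# A branch taken repeatedly saturates at 11 and stays there:
c = STRONG_NT
for taken in [True, True, True, True]:
    c = update(c, taken)
print(c, predict_taken(c))  # 3 True
```

Saturation is what gives the predictor hysteresis: a single mispredicted branch does not immediately flip a strongly-biased prediction.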

About Branch History Leak:

Researchers from the Harbin Institute of Technology have shared with AMD a paper titled “Branch History LeakeR: Leveraging Branch History to Construct a New Side Channel-Theory and Practice” that demonstrates a side-channel attack using the Global History Register (GHR). The GHR is used to assist in conditional branch prediction. The researchers note that the GHR is shared between different security domains and may retain data after a security domain switch. After returning to user space, the researchers were able to infer the direction of recently executed conditional branches.

Official announcement: Please refer to the link for details – https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7026.html

CVE-2025-30219: About RabbitMQ (26th Mar 2025)

Preface: Message queues are quite important in machine learning for several reasons:

  1. Decoupling Components: They allow different parts of a machine learning system to operate independently. For example, data producers (like sensors or user inputs) can send data to a queue, and data consumers (like preprocessing units or models) can process this data at their own pace.
  2. Asynchronous Processing: Machine learning tasks, especially training and inference, can be resource-intensive and time-consuming. Message queues enable these tasks to be processed asynchronously, ensuring that the system remains responsive and efficient.
  3. Scalability: By using message queues, you can easily scale your machine learning system. Multiple consumers can process messages from the queue simultaneously, allowing the system to handle larger workloads without bottlenecks.
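The decoupling and asynchrony described above can be sketched with Python's stdlib queue standing in for a real broker; the sample names are invented:

```python
import queue
import threading

# Toy producer/consumer: the producer pushes work into a queue and
# the consumer drains it at its own pace - the decoupling and
# asynchronous processing described above.
work = queue.Queue()
results = []

def producer():
    for sample in ["img_001", "img_002", "img_003"]:
        work.put(sample)
    work.put(None)  # sentinel: no more work

def consumer():
    while True:
        item = work.get()
        if item is None:
            break
        results.append(f"processed:{item}")

t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer)
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(results)
```

A broker like RabbitMQ adds what an in-process queue cannot: durability, delivery across machines, and fan-out to multiple consumers.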

Background: RabbitMQ can be quite helpful in AI applications! RabbitMQ is an open-source message broker that facilitates communication between different parts of a system asynchronously. Here are some ways it can assist AI:

  1. Task Orchestration: RabbitMQ can manage and distribute tasks across various AI models and services, ensuring efficient processing and load balancing.
  2. Handling Asynchronous Requests: It can queue up inference requests and process them asynchronously, which is particularly useful for resource-intensive AI models.
  3. Data Queuing: RabbitMQ can queue data for processing, allowing AI systems to handle large volumes of data in a controlled manner.

If you enable the management plugin, you will be able to use not only the web UI but also the HTTP API. In fact, the web UI is a wrapper around the HTTP API.

Vulnerability details: RabbitMQ is a messaging and streaming broker. In versions prior to 4.0.3, a sophisticated attack that modifies the virtual host name on disk and then makes it unrecoverable (together with other on-disk file modifications) can lead to arbitrary JavaScript code execution in the browsers of management UI users.

When a virtual host on a RabbitMQ node fails to start, recent versions will display an error message (a notification) in the management UI. The error message includes the virtual host name, which was not escaped prior to open source RabbitMQ 4.0.3 and Tanzu RabbitMQ 4.0.3 and 3.13.8.
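The root cause is a classic one: interpolating an attacker-controlled name into HTML without escaping it. RabbitMQ's actual fix lives in its management UI (JavaScript); the Python sketch below only illustrates the principle with a hostile vhost name of my own invention:

```python
import html

# Interpolating an attacker-controlled name into HTML without
# escaping lets the name carry a script payload.
vhost_name = '<img src=x onerror=alert(1)>'  # hostile vhost name

unsafe = f"<p>Virtual host {vhost_name} failed to start</p>"
safe = f"<p>Virtual host {html.escape(vhost_name)} failed to start</p>"

print(unsafe)  # a browser would execute the onerror payload
print(safe)    # rendered as inert text: &lt;img ...&gt;
```

Escaping at the point of rendering, rather than trusting stored names, is the standard defense for this XSS class.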

Official announcement: Please refer to the link for details –

https://nvd.nist.gov/vuln/detail/CVE-2025-30219

CVE-2025-21424: Memory corruption while calling the NPU driver APIs concurrently (16th Mar 2025)

NVD Published Date: 03/03/2025
NVD Last Modified: 03/07/2025

Preface: Real-time processing of sensor data for tasks like obstacle detection and navigation is crucial, making NPUs ideal for these applications. NPUs help in real-time decision-making and control, which is essential for robotic applications. While NPUs are highly efficient for specific AI applications, they cannot replace GPUs due to their limited scope.

Background: Mutex Unlocking: The mutex is unlocked after the resource has been freed.

If another thread tries to access the resource after it has been freed but before the mutex is unlocked, it can lead to a use-after-free vulnerability. This is because the memory location might be reused for another purpose, leading to undefined behavior when the freed resource is accessed.
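Python cannot exhibit a true use-after-free, but the safe ordering described above can be modeled: free the resource and clear the reference while still holding the lock, so no thread can observe a freed-but-reachable resource. The class and field names below are invented to mirror the npu_dev/dev_lock pattern, not taken from the actual driver:

```python
import threading

class NpuResource:
    """Toy model of the pattern above: every access, and the free
    itself, happen under one lock, and the reference is cleared
    before the lock is released."""
    def __init__(self):
        self._lock = threading.Lock()  # plays the role of dev_lock
        self._buffer = bytearray(16)   # plays the role of the resource

    def use(self) -> bool:
        with self._lock:
            if self._buffer is None:
                return False           # already freed: refuse access
            self._buffer[0] = 0xFF
            return True

    def free(self):
        with self._lock:
            self._buffer = None        # clear the ref BEFORE unlock

r = NpuResource()
print(r.use())   # True
r.free()
print(r.use())   # False - safely rejected, not a dangling access
```

In C, the equivalent fix is to set the pointer to NULL inside the critical section and have every API entry point check it under the same mutex.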

Vulnerability details: Memory corruption while calling the NPU driver APIs concurrently.

Reference:

mutex_unlock: This function releases a mutex that was previously locked. Mutexes are used to ensure that only one thread can access a particular section of code or data at a time, preventing race conditions.

&npu_dev->dev_lock: This is the address of the mutex lock associated with the npu_dev device. The dev_lock is a member of the npu_dev structure, and the & operator gets its address.

When this command is executed, it releases the lock on dev_lock, allowing other threads that might be waiting to acquire the lock to proceed.

Official announcement: Please see the link for details –

https://nvd.nist.gov/vuln/detail/CVE-2025-21424

CVE-2025-23242 & CVE-2025-23243: NVIDIA Riva contains a vulnerability where a user could cause an improper access control issue (13th Mar 2025)

Preface: NeMo is an open source PyTorch-based toolkit for research in conversational AI that exposes more of the model and PyTorch internals. Riva supports the ability to import supported models trained in NeMo.

NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications, customized for your use case, and delivering real-time performance.

Background: NVIDIA Riva does not come with any default user accounts. Instead, it relies on secure access through NVIDIA NGC (NVIDIA GPU Cloud). Users need to log in to NGC to access and deploy Riva services. This ensures that only authorized users can set up and manage Riva deployments.

NVIDIA Riva’s default access control mechanisms are designed to ensure secure deployment and operation. By default, Riva employs:

Role-Based Access Control (RBAC): This allows administrators to define roles and assign permissions to users based on their roles.

There is authentication between NVIDIA NGC and Riva. When you pull Riva container images from NGC, you need to authenticate using your NGC API key. This involves:

  1. NGC CLI Configuration: You set up the NGC CLI with your API key, which acts as your authentication credential.
  2. OAuth Token: The username for authentication is $oauthtoken, and the password is your NGC_API_KEY.
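On the wire this is ordinary HTTP Basic authentication: the literal username $oauthtoken and the API key as the password, base64-encoded into an Authorization header. The sketch below shows the header shape with an obviously fake placeholder key:

```python
import base64

def ngc_basic_auth_header(api_key: str) -> str:
    """Build HTTP Basic auth credentials in the shape described
    above: literal username `$oauthtoken`, API key as password.
    The key used below is a placeholder, not a real credential."""
    token = base64.b64encode(f"$oauthtoken:{api_key}".encode()).decode()
    return f"Basic {token}"

print(ngc_basic_auth_header("example-api-key"))
```

Decoding the value after "Basic " recovers the `$oauthtoken:<key>` pair, which is why API keys must be treated as secrets even though the username is fixed.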

Vulnerability details:

CVE-2025-23242 – NVIDIA Riva contains a vulnerability where a user could cause an improper access control issue. A successful exploit of this vulnerability might lead to escalation of privileges, data tampering, denial of service, or information disclosure.

CVE-2025-23243 – NVIDIA Riva contains a vulnerability where a user could cause an improper access control issue. A successful exploit of this vulnerability might lead to data tampering or denial of service.

Official announcement: Please see the official link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5625