Arm CPU Security Update:  Training in Transient Execution Attacks (17th Mar 2025)

Initial release: August 8, 2023

Last updated: 14 Mar 2025

Preface: AMD’s Zen3 and Zen4 architectures are not directly related to ARM design, as they are based on AMD’s own x86-64 architecture. ARM is concerned about Training in Transient Execution (TTE) attacks because these attacks exploit vulnerabilities in speculative execution, which can affect ARM processors just as they do other architectures like x86.

Background: There are 2 phenomena that enable an unprivileged attacker to leak arbitrary information on AMD Zen3 and Zen4 CPU products.

  • Phantom speculation – Trigger misprediction without any branch at the source of the misprediction.
  • Training in Transient Execution – Potential manipulate future mispredictions through a previous misprediction that attacker trigger.

Here are some key reasons why ARM is worried about TTE attacks:

Microarchitectural Manipulation: TTE attacks involve manipulating microarchitectural buffers, such as the branch target buffer (BTB) and return stack buffer (RSB), during speculative execution. This manipulation can lead to mispredictions and create transient windows where sensitive data can be accessed.

Cross-Architecture Concerns: While ARM processors have different microarchitectural designs compared to x86 processors, the fundamental principles of speculative execution and transient execution attacks apply across architectures. This means ARM needs to address these vulnerabilities to ensure the security of their processors.

Security Implications: Successful TTE attacks can bypass existing security mitigations and leak sensitive information, posing a significant threat to the security of ARM-based systems.

Official announcement: For details, please refer to the link – https://developer.arm.com/documentation/110363/1-0/?lang=en

CVE-2025-21424: Memory corruption while calling the NPU driver APIs concurrently (16th Mar 2025)

NVD Published Date: 03/03/2025
NVD Last Modified: 03/07/2025

Preface: Real-time processing of sensor data for tasks like obstacle detection and navigation is crucial, making NPUs ideal for these applications. NPUs help in real-time decision-making and control, which is essential for robotic applications. While NPUs are highly efficient for specific AI applications, they cannot replace GPUs due to their limited scope.

Background: Mutex Unlocking: The mutex is unlocked after the resource has been freed.

If another thread tries to access the resource after it has been freed but before the mutex is unlocked, it can lead to a use-after-free vulnerability. This is because the memory location might be reused for another purpose, leading to undefined behavior when the freed resource is accessed.

Vulnerability details: Memory corruption while calling the NPU driver APIs concurrently.

Reference:

mutex_unlock: This function releases a mutex that was previously locked. Mutexes are used to ensure that only one thread can access a particular section of code or data at a time, preventing race conditions.

&npu_dev->dev_lock: This is the address of the mutex lock associated with the npu_dev device. The dev_lock is a member of the npu_dev structure, and the & operator gets its address.

When this command is executed, it releases the lock on dev_lock, allowing other threads that might be waiting to acquire the lock to proceed

Official announcement: Please see the link for details –

https://nvd.nist.gov/vuln/detail/CVE-2025-21424

CVE-2025-23242 & CVE-2025-23243:NVIDIA Riva contains a vulnerability where a user could cause an improper access control issue (13th Mar 2025)

Preface: NeMo is an open source PyTorch-based toolkit for research in conversational AI that exposes more of the model and PyTorch internals. Riva supports the ability to import supported models trained in NeMo.

NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications, customized for your use case, and delivering real-time performance.

Background: NVIDIA Riva does not come with any default user accounts. Instead, it relies on secure access through NVIDIA NGC (NVIDIA GPU Cloud). Users need to log in to NGC to access and deploy Riva services. This ensures that only authorized users can set up and manage Riva deployments.

NVIDIA Riva’s default access control mechanisms are designed to ensure secure deployment and operation. By default, Riva employs:

Role-Based Access Control (RBAC): This allows administrators to define roles and assign permissions to users based on their roles.

There is authentication between NVIDIA NGC and Riva. When you pull Riva container images from NGC, you need to authenticate using your NGC API key. This involves:

  1. NGC CLI Configuration: You set up the NGC CLI with your API key, which acts as your authentication credential1.
  2. OAuth Token: The username for authentication is $oauthtoken, and the password is your NGC_API_KEY

Vulnerability details:

CVE-2025-23242 – NVIDIA Riva contains a vulnerability where a user could cause an improper access control issue. A successful exploit of this vulnerability might lead to escalation of privileges, data tampering, denial of service, or information disclosure.

CVE-2025-23243 – NVIDIA Riva contains a vulnerability where a user could cause an improper access control issue. A successful exploit of this vulnerability might lead to data tampering or denial of service.

Official announcement: Please see the official link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5625

CVE‑2025‑23360 – NVIDIA Nemo Framework contains a vulnerability (12th Mar 2025)

Preface: The symbol ~/. by itself is not a relative path traversal; it simply refers to the home directory of the current user. However, when combined with ./.., it can be part of a relative path traversal.

Relative path traversal involves using sequences like ../ to navigate up the directory hierarchy. For example, ~/. refers to the home directory, and ./.. moves up one directory level from the current directory. So, ~/. ./.. would navigate to the parent directory of the home directory, which can be considered a form of relative path traversal

Background: NVIDIA NeMo is an end-to-end platform designed for developing and deploying generative AI models. This includes large language models (LLMs), vision language models (VLMs), video models, and speech AI. NeMo offers tools for data curation, fine-tuning, retrieval-augmented generation (RAG), and inference, making it a comprehensive solution for creating enterprise-ready AI models. Here are some key capabilities of NeMo LLMs:

  1. Customization: NeMo allows you to fine-tune pre-trained models to suit specific enterprise needs. This includes adding domain-specific knowledge and skills, and continuously improving the model with reinforcement learning from human feedback (RLHF).
  2. Scalability: NeMo supports large-scale training and deployment across various environments, including cloud, data centers, and edge devices. This ensures high performance and flexibility for different use cases.
  3. Foundation Models: NeMo offers a range of pre-trained foundation models, such as GPT-8, GPT-43, and GPT-530, which can be used for tasks like text classification, summarization, creative writing, and chatbots.
  4. Data Curation: The platform includes tools for processing and curating large datasets, which helps improve the accuracy and relevance of the models.
  5. Integration: NeMo can be integrated with other NVIDIA AI tools and services, providing a comprehensive ecosystem for AI development.

Vulnerability details: NVIDIA Nemo Framework contains a vulnerability where a user could cause a relative path traversal issue by arbitrary file write. A successful exploit of this vulnerability may lead to code execution and data tampering.

Official announcement: Please see the official link for details

https://nvidia.custhelp.com/app/answers/detail/a_id/5623

CVE-2024-36347: Improper signature verification in AMD CPU ROM microcode patch loader (11th Mar 2025)

Originally published on March 5, 2025

Preface: The microcode patch loader in the CPU’s ROM (Read-Only Memory) is responsible for loading these updates into the CPU during the boot process. This ensures that the CPU runs the latest microcode, which can include important security and functionality improvements

Background: The System Management Mode (SMM) execution environment is a special-purpose operating mode provided by x86 CPUs for handling system-wide functions like power management and hardware control. When the CPU receives a System Management Interrupt (SMI), it switches from normal execution mode to SMM. In this mode, the CPU executes code stored in a special portion of memory called System Management RAM (SMRAM). This environment is isolated from the operating system and applications, allowing it to manage critical system functions transparently. Some uses of SMM include handling system events, managing system safety functions, and controlling power management operations.

Vulnerability details: Improper signature verification in AMD CPU ROM microcode patch loader may allow an attacker with local administrator privilege to load malicious microcode, potentially resulting in loss of integrity of x86 instruction execution, loss of confidentiality and integrity of data in x86 CPU privileged context and compromise of SMM execution environment.

Official announcement: Please see the official link for detailshttps://www.amd.com/en/resources/product-security/bulletin/amd-sb-7033.html

CVE-2025-22412: Fix more memory-unsafe logging (10th Mar 2025)

Preface: In smartphones, the System on Chip (SoC), such as those made by Qualcomm, integrates various components including the CPU, GPU, and memory. The embedded OS and applications run on this SoC, utilizing its built-in memory (RAM) for processing tasks.
The flash storage (often referred to as flashdisk) in smartphones is primarily used for storing persistent data like images, documents, apps, and the operating system itself. This storage is separate from the RAM used by the CPU and GPU for active processing
 
Background: Logging in Android does consume memory and can affect the OS memory resources. When you create logs, they are stored in memory, which can lead to increased memory usage. This can impact the performance of your application and the overall system, especially if there are a lot of log entries being generated.
 
Vulnerability details: In various locations around the stack, log statements use structures that may, in exceptional cases, have been freed by preceding calls.  This can lead to use after free and potentially to security vulnerabilities.
 
Ref: p_buf is a pointer to a buffer structure. If a buffer overflow in p_buf can potentially lead to a use-after-free vulnerability.
 
Official announcement: Please refer to the link for details – https://android.googlesource.com/platform/packages/modules/Bluetooth/+/806774b1cf641e0c0e7df8024e327febf23d7d7c

CVE-2024-0141: NVIDIA Hopper HGX for 8-GPU contains a vulnerability in GPU vBIOS  (10th Mar2025)

Last official update on February 28, 2025 at 3:28 PM

Preface: Hopper PPCIe is limited to HGX 8-way systems, where the eight GPUs and four NVSwitches are passed through to one VM. Other topologies are not supported.

Background: The GPU vBIOS can communicate through IOCTL (Input/Output Control) calls. IOCTL is a system call for device-specific input/output operations and other operations which cannot be expressed by regular system calls. In the context of GPU drivers, IOCTLs are used to interact with the GPU hardware, including tasks like memory management, command submission, and mode setting.

CUDA Interprocess Communication (IPC) is not supported in PPCIe mode. Developer tools such as NVIDIA Nsight for profiling are not supported in PPCIe mode.

When an IOCTL contains privileged functionality and is exposed unnecessarily, attackers may be able to access this functionality by invoking the IOCTL.

Vulnerability details: NVIDIA Hopper HGX for 8-GPU contains a vulnerability in the GPU vBIOS that may allow a malicious actor with tenant level GPU access to write to an unsupported registry causing a bad state. A successful exploit of this vulnerability may lead to denial of service.

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5561

CVE-2024-0114: NVIDIA Hopper HGX for 8-GPU contains a vulnerability in the HGX Management Controller HMC (7 th March 2025)

Preface: NVIDIA collaborates with Supermicro for their server solutions, including the use of Supermicro’s BMC (Baseboard Management Controller) in certain systems. Supermicro provides a range of server solutions optimized for NVIDIA’s platforms.

Background: The NVIDIA Hopper HGX for 8 GPUs has several standout features:

High Performance: It hosts eight H100 Tensor Core GPUs, which are designed for AI and high-performance computing (HPC) workloads.

Advanced Connectivity: Each H100 GPU connects to four third-generation NVSwitches, enabling a fully connected topology. This setup allows any H100 GPU to communicate with any other H100 GPU concurrently at a bidirectional speed of 900 GB/s.

Enhanced Bandwidth: The NVLink ports provide more than 14 times the bandwidth of the current PCIe Gen4 x16 bus.

Vulnerability details: VIDIA Hopper HGX for 8-GPU contains a vulnerability in the HGX Management Controller (HMC) that may allow a malicious actor with administrative access on the BMC to access the HMC as an administrator. A successful exploit of this vulnerability may lead to code execution, denial of service, escalation of privileges, information disclosure, and data tampering.

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5561

2024-53022: Memory corruption may occur during communication between primary and guest VM (6th Mar 2025)

Preface: QNX hypervisors are available in two variants: QNX Hypervisor and QNX Hypervisor for Safety.

The QNX Hypervisor variant (QH), which includes QNX Hypervisor 8.0, is not a safety-certified product. It must not be used in a safety-related production system.

If you are building a safety-related system, you must use the QNX Hypervisor for Safety (QHS) variant that has been built and approved for use in the type of system you are building, and you must use it only as specified in its Safety Manual. The latest QHS release is QNX Hypervisor for Safety 2.2, which is based on QNX SDP 7.1.

Background:  Functions like mprotect() are not commonly used in QNX hypervisor memory resource management for reasons:

  1. Memory Isolation: The hypervisor ensures that each VM (both primary and guest) has its own isolated memory space. This prevents one VM from accessing the memory of another, enhancing security and stability.
  2. Dynamic Memory Allocation: The hypervisor can dynamically allocate memory to VMs based on their needs. This means that if a guest VM requires more memory, the hypervisor can allocate additional memory from the available pool.
  3. Memory Ballooning: This technique allows the hypervisor to reclaim unused memory from VMs and reallocate it where needed. The balloon driver within the VM inflates to consume memory, which is then returned to the hypervisor.
  4. Memory Hotplug: The hypervisor can add or remove memory from a VM while it is running. This allows for flexible memory management without needing to restart the VM.

Vulnerability details: Memory corruption may occur during communication between primary and guest VM.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2024-53022

CVE-2025-22413: (ANDROID (KVM (arm64)) Don’t run a protected VCPU if it isn’t runnable! (5 March 2025)

Preface: The protected Kernel-based Virtual Machine (pKVM) is an advanced virtualization technology built on top of the Linux Kernel-based Virtual Machine (KVM). It is designed to enhance security and isolation for virtual machines (VMs) running on Android devices.

Key points about pKVM:

Enhanced Security: pKVM restricts access to the payloads running in guest VMs marked as ‘protected’ at the time of creation. This ensures that even if the host Android system is compromised, the guest VMs remain secure.

Isolation: It provides strong confidentiality and integrity guarantees by isolating memory and devices into individual protected VMs (pVMs).

Compatibility: pKVM is compatible with existing operating systems and workloads that rely on KVM-based virtual machines.

Background: In the context of pKVM, a vCPU (virtual Central Processing Unit) represents a virtualized CPU core assigned to a virtual machine (VM). Each vCPU in a VM’s operating system corresponds to one physical CPU core.

In pKVM, vCPUs are used to manage and allocate processing power to protected virtual machines (pVMs), ensuring that each VM has the necessary resources to operate securely and efficiently.

Vulnerability details: Don’t run a protected VCPU in pKVM if it isn’t in a runnable PSCI state. For protected VMs, the PSCI state is the reference state for whether they are runnable or not.

Official announcement: Please refer to the link for details – https://android.googlesource.com/kernel/common/+/1a3366f0d3d9b94a8c025d9863edc3b427435c4c