CVE-2025-38553: Linux kernel’s net/sched subsystem (Fixed) – 21st Aug 2025

Preface: While Kubernetes doesn’t directly expose net/sched as a configurable API, its network management and QoS features often rely on or interact with net/sched at the underlying Linux kernel level to achieve desired network behavior for containerized applications.

Background: net/sched is the Linux kernel subsystem responsible for traffic control (tc). It manages how packets are queued and scheduled for transmission on network interfaces using qdiscs (queueing disciplines). The default qdisc is typically pfifo_fast or fq_codel depending on the kernel version and distribution.

Vulnerability details: The vulnerability CVE-2025-38553 affects the Linux kernel’s net/sched subsystem, specifically the netem qdisc. It arises when multiple netem instances are added to the same qdisc tree, which can lead to:

  • Soft lockups
  • Out-of-memory (OOM) errors
  • Infinite loops during packet dequeueing

The root cause is flawed duplication logic in netem_enqueue, especially when one netem is nested within another in the qdisc hierarchy. The fix rejects the addition of a duplicating netem if another netem already exists in the tree.
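For context, the nested-netem shape at issue can be built with the standard tc(8) tool. Below is a minimal sketch, assuming a device named eth0 and illustrative delay/duplicate values (run as root); on a kernel carrying the fix, the second command should be refused:

```python
import subprocess

def tc(*args: str) -> None:
    """Run a tc(8) command, echoing it first; requires root and iproute2."""
    cmd = ["tc", *args]
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Root netem qdisc that duplicates packets (values are illustrative).
tc("qdisc", "add", "dev", "eth0", "root", "handle", "1:",
   "netem", "delay", "10ms", "duplicate", "50%")

# Nesting a second duplicating netem under the first creates the
# netem-inside-netem tree described above; patched kernels reject it.
tc("qdisc", "add", "dev", "eth0", "parent", "1:1", "handle", "2:",
   "netem", "delay", "10ms", "duplicate", "50%")
```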

Official announcement: Please see the link for details –

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=09317dfb681ac5a96fc69bea0c54441cf91b8270

AMD responds to known potential side-channel attacks in SEV-SNP (20-08-2025)

Official Revision Date: 2025-08-12

Preface: AMD SEV-SNP is a confidential computing hardware technology present in AMD EPYC processors from generation 3 and newer. It is based on hardware virtualization extensions and achieves isolation through measures such as full memory encryption.

SEV-SNP is not implemented solely in firmware. While the firmware plays a crucial role in SEV-SNP (Secure Encrypted Virtualization – Secure Nested Paging), the security features come from hardware, firmware, and software working together. The firmware initializes the SEV-SNP context and performs attestation, but the core functionality also relies on the AMD processor’s hardware and the guest operating system’s software components. SEV-SNP is supported on AMD EPYC processors starting with the AMD EPYC 7003 series. It offers powerful and flexible isolation of a guest virtual machine from an untrusted host operating system, which is very useful in public clouds and any untrusted-host scenario.

Background: SEV-SNP is designed to prevent software-based integrity attacks and reduce risk associated with compromised memory integrity. The basic principle of SEV-SNP integrity is that if a VM is able to read a private (encrypted) page of memory, it must always read the value it last wrote.

AMD SEV-SNP allows the hypervisor to move encrypted guest pages, including swapping pages to disk, but this capability can also be exploited through ciphertext side-channel attacks, where the hypervisor monitors ciphertext changes to infer guest data.

The findings of the two research teams:

AMD has received reports from two research groups detailing methods by which a malicious hypervisor could potentially execute a side channel attack against a running secure encrypted virtualization – secure nested paging (SEV-SNP) guest.

The first report, titled “Relocate + Vote: Exploiting Ciphertext Side-Channels using Sparsity Information,” was submitted by researchers at the Toronto System Security Lab of the University of Toronto. 

A subsequent report from researchers at ETH Zurich titled “Chosen Plaintext Oracle against SEV-SNP,” outlines a similar exploitation technique that also leverages the ability to move or swap guest pages. 

Official announcement: For more information, please refer to the link:

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-3021.html

Overview of Transformer-based language models (19-08-2025)

Technical Highlights: The Megatron-LM codebase efficiently trains models from 2 billion to 462 billion parameters across thousands of GPUs; it has benchmarked training of a 462B-parameter model on 6144 H100 GPUs, achieving up to 47% Model FLOP Utilization (MFU).
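For readers unfamiliar with the metric, MFU is simply the model FLOPs actually computed per second divided by the hardware’s aggregate peak. A back-of-the-envelope sketch, assuming an H100 dense BF16 peak of roughly 989 TFLOP/s and the common ~6·N FLOPs-per-token training estimate (both are approximations, not figures from the Megatron-LM report):

```python
# MFU = achieved model FLOP/s divided by aggregate peak FLOP/s.
n_params = 462e9                 # 462B-parameter model
flops_per_token = 6 * n_params   # rough forward+backward estimate
num_gpus = 6144
peak_per_gpu = 989e12            # assumed H100 dense BF16 peak (FLOP/s)
mfu = 0.47

tokens_per_sec = mfu * num_gpus * peak_per_gpu / flops_per_token
print(f"Implied throughput at 47% MFU: ~{tokens_per_sec:,.0f} tokens/s")
```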

GPT-4, the latest iteration in OpenAI’s Generative Pre-trained Transformer series, significantly scales up the parameter count compared to its predecessors. While GPT-2 had 1.5 billion parameters and GPT-3 boasted 175 billion, GPT-4 is estimated to have a staggering 1.76 trillion parameters.

Parameters in Artificial Intelligence:

Parameters in AI are the variables that the model learns during training. They are the internal variables that the model uses to make predictions or decisions; in a neural network, the parameters include the weights and biases of the neurons. Parameters determine the output of the model for a given input. During training, the model adjusts its parameters to minimize the difference between its predictions and the actual values, typically using an optimization algorithm such as gradient descent.

Gradient descent is a fundamental optimization algorithm in artificial intelligence, particularly in machine learning and deep learning. It is used to minimize a function, often the cost function of a model, by iteratively adjusting the model’s parameters in the direction of the steepest descent.
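A minimal sketch of these ideas in plain Python: a one-parameter model whose weight is learned by gradient descent on a mean-squared-error loss (the data and learning rate are made up for illustration):

```python
# Minimal gradient descent on a one-parameter model y = w * x,
# minimizing mean squared error against data generated with w* = 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # (x, y) pairs
w = 0.0           # the "parameter" the model learns
lr = 0.05         # learning rate (step size)

for step in range(100):
    # dL/dw for L = mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # step in the direction of steepest descent

print(round(w, 4))  # converges toward 3.0
```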

About Megatron-LM: Megatron-LM is a GPU-optimized framework developed by NVIDIA for training transformer models at scale, particularly Large Language Models (LLMs). Its specialized features and optimizations for large-scale distributed training have made it widely used, and it supports models ranging from a few billion to hundreds of billions of parameters.

Core techniques include (see the tensor-parallelism sketch after this list):

  • Intra-layer model parallelism
  • Pipeline parallelism
  • Tensor parallelism
  • Efficient communication primitives using NCCL
  • Mixed precision training (FP16/BF16)
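A toy NumPy sketch of the tensor-parallel (intra-layer) idea referenced above: a linear layer’s weight matrix is split column-wise across two notional devices, each computes its shard, and the partial outputs are concatenated. NumPy arrays stand in for per-GPU shards here; real frameworks do this across GPUs with NCCL collectives.

```python
import numpy as np

# Toy column-parallel linear layer: split W column-wise across "devices",
# compute each shard locally, then gather the outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # batch of activations
W = rng.normal(size=(8, 16))           # full weight matrix

shards = np.split(W, 2, axis=1)        # two "devices", 8 columns each
partial = [x @ w for w in shards]      # each device's local matmul
y = np.concatenate(partial, axis=1)    # "all-gather" of the outputs

assert np.allclose(y, x @ W)           # matches the unsharded computation
```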

Ref: The NVIDIA Collective Communications Library (NCCL) is a library developed by NVIDIA that provides optimized routines for multi-GPU and multi-node communication. It’s designed to accelerate collective communication patterns, like all-reduce, broadcast, reduce, and all-gather, which are crucial for deep learning frameworks and other parallel computing applications using NVIDIA GPUs.
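A minimal PyTorch sketch of an NCCL all-reduce, assuming one process per GPU launched with torchrun (tensor values are illustrative):

```python
import os
import torch
import torch.distributed as dist

# Minimal NCCL all-reduce; launch with e.g.
#   torchrun --nproc_per_node=<num_gpus> this_script.py
def main():
    dist.init_process_group(backend="nccl")        # env:// rendezvous via torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    rank = dist.get_rank()

    t = torch.ones(4, device="cuda") * (rank + 1)  # each rank contributes
    dist.all_reduce(t, op=dist.ReduceOp.SUM)       # in-place sum across ranks
    print(f"rank {rank}: {t.tolist()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```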

Ref: FP16 and BF16 are both 16-bit floating-point formats used in AI training to improve performance and efficiency, but they differ in their dynamic range and precision. FP16, also known as half precision, offers higher precision for smaller values but has a limited range. BF16, or Brain Floating Point, has a wider dynamic range, making it more suitable for large-scale models where numerical stability is crucial, even at the cost of some precision.
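The trade-off is easy to check with torch.finfo; a quick sketch comparing the numeric envelopes of the two formats:

```python
import torch

# Compare the numeric envelopes of the two 16-bit formats.
for dtype in (torch.float16, torch.bfloat16):
    fi = torch.finfo(dtype)
    # float16: 5-bit exponent (max ~65504) but more mantissa bits;
    # bfloat16: float32's 8-bit exponent (max ~3.4e38) but fewer mantissa bits.
    print(dtype, "| max:", fi.max, "| smallest normal:", fi.tiny, "| eps:", fi.eps)
```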

Official details: For details, please refer to the link – https://github.com/NVIDIA/Megatron-LM

CVE-2025-23305 and CVE-2025-23306: About NVIDIA Megatron-LM (18-08-2025)

Official Updated 08/11/2025 06:16 AM

Preface: GPT-4 offers several key benefits, including improved accuracy, longer context handling, and the ability to process both text and image inputs. It also exhibits stronger guardrails, leading to more reliable and ethical outputs. Additionally, GPT-4 excels in various tasks like professional and academic benchmarks, creative writing, and adapting to user needs.

Background: The Megatron-LM codebase is a framework for training large, powerful transformer language models at scale, developed by NVIDIA. It focuses on efficient, model-parallel (tensor and pipeline) and multi-node pre-training of transformer-based models like GPT, BERT, and T5 using mixed precision.

The Megatron-LM codebase efficiently trains models from 2B to 462B parameters across thousands of GPUs; it has successfully benchmarked the training of a 462B-parameter model using 6144 H100 GPUs, achieving up to 47% Model FLOP Utilization (MFU).

GPT-4, the latest iteration in OpenAI’s Generative Pre-trained Transformer series, significantly scales up the parameter count compared to its predecessors. While GPT-2 had 1.5 billion parameters and GPT-3 boasted 175 billion, GPT-4 is estimated to have a staggering 1.76 trillion parameters.

While this demonstrates the capability of the Megatron-LM framework to train very large models on H100 clusters, the exact number of H100 GPUs used to train GPT-4 is not publicly disclosed. GPT-4 was developed by OpenAI, and they have not released the specific hardware configurations used for its training.

Vulnerability details:

CVE-2025-23305 – NVIDIA Megatron-LM for all platforms contains a vulnerability in the tools component, where an attacker may exploit a code injection issue. A successful exploit of this vulnerability may lead to code execution, escalation of privileges, information disclosure, and data tampering.

CVE-2025-23306 – NVIDIA Megatron-LM for all platforms contains a vulnerability in the megatron/training/arguments.py component, where an attacker could cause a code injection issue by providing a malicious input. A successful exploit of this vulnerability may lead to code execution, escalation of privileges, information disclosure, and data tampering.
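NVIDIA’s advisory does not publish the vulnerable code path, so the sketch below is only a generic illustration of the code-injection class involved: parsing untrusted input with eval() versus a literal-only parser (the function names are hypothetical, not Megatron-LM code):

```python
import ast

# Hypothetical illustration of the code-injection class: eval() on a
# command-line value executes arbitrary expressions.
def parse_unsafe(value: str):
    return eval(value)   # "__import__('os').system('id')" would run code

def parse_safe(value: str):
    # ast.literal_eval only accepts Python literals (numbers, strings,
    # tuples, lists, dicts, booleans, None) and raises on anything else.
    return ast.literal_eval(value)

print(parse_safe("[1, 2, 3]"))        # fine: a plain literal
try:
    parse_safe("__import__('os')")    # rejected instead of executed
except ValueError as e:
    print("rejected:", e)
```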

Official announcement: For more information, please refer to the link

https://nvidia.custhelp.com/app/answers/detail/a_id/5685

CVE-2025-23298: About NVIDIA Merlin Transformers4Rec (15th Aug 2025)

Official Updated 08/11/2025 06:15 AM

Preface: While the Bible doesn’t specifically mention artificial intelligence, it reminds us that human knowledge and capabilities will increase dramatically in the last days (Daniel 12:4). Building and training neural networks is a cornerstone of modern artificial intelligence, enabling breakthroughs in fields such as computer vision, natural language processing, and robotics.

Background: NVIDIA Merlin Transformers4Rec is a Python library designed for building sequential and session-based recommender systems, leveraging the power of Transformer architectures, particularly for use with PyTorch. It is part of the broader NVIDIA Merlin ecosystem, which provides end-to-end GPU-accelerated solutions for recommender systems.

Transformers4Rec is pre-installed in the merlin-pytorch container that is available from the NVIDIA GPU Cloud (NGC) catalog.

NVIDIA Merlin PyTorch container, available on NVIDIA NGC (NVIDIA GPU Cloud), includes the necessary components for GPU acceleration, including the CUDA Toolkit.

The Merlin PyTorch container allows users to perform preprocessing and feature engineering with NVTabular, train a deep-learning-based recommender system model with PyTorch, and serve the trained model on Triton Inference Server.

Ref: NVTabular and RAPIDS (cuDF/cuML) for preprocessing and feature engineering.

Vulnerability details: NVIDIA Merlin Transformers4Rec for all platforms contains a vulnerability in a Python dependency, where an attacker could cause a code injection issue. A successful exploit of this vulnerability might lead to code execution, escalation of privileges, information disclosure, and data tampering.

Official announcement: Please see the link for details

https://nvidia.custhelp.com/app/answers/detail/a_id/5683

CVE-2025-23294: NVIDIA WebDataset for all platforms contains a vulnerability (14-08-2025)

Official Updated 08/11/2025 06:15 AM

Preface: WebDataset is a PyTorch IterableDataset implementation designed for efficient access to large datasets stored in POSIX tar archives. It focuses on sequential/streaming data access, which offers substantial performance advantages in environments where local storage is limited or I/O bottlenecks are a concern. WebDataset is particularly well-suited for very large-scale training, as it minimizes the need for local storage and allows for efficient data loading from various sources, including cloud storage.

Background: NVIDIA WebDataset refers to the integration of WebDataset with NVIDIA technologies like DALI or NeMo, rather than a separate NVIDIA-specific installation. Installing WebDataset itself is straightforward, as it is a Python library.

  • DALI is a portable, open-source software library for decoding and augmenting images, videos, and speech to accelerate deep learning applications. DALI itself doesn’t extract .tar files directly; instead, it processes data streamed from tarballs via WebDataset or other loaders.

  • NVIDIA NeMo is a framework for building and deploying generative AI models, particularly those used in conversational AI like speech recognition and natural language processing. It may extract or stream data depending on the configuration, but tarball handling is abstracted behind the data pipeline.
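A minimal WebDataset pipeline sketch (the shard pattern and member key names below are assumptions for illustration): samples stream sequentially out of tar shards, are decoded, and arrive as tuples ready for a PyTorch DataLoader.

```python
import webdataset as wds

# Stream samples sequentially from tar shards and decode them on the fly.
url = "shards/train-{000000..000009}.tar"   # placeholder shard pattern
dataset = (
    wds.WebDataset(url)
    .decode("pil")                # decode image bytes to PIL images
    .to_tuple("jpg", "cls")       # pick the .jpg and .cls members
)

for image, label in dataset:
    print(image.size, label)
    break
```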

Vulnerability details: CVE-2025-23294 – NVIDIA WebDataset for all platforms contains a vulnerability where an attacker could execute arbitrary code with elevated permissions. A successful exploit of this vulnerability might lead to escalation of privileges, data tampering, information disclosure, and denial of service.

Official announcement: Please see the link for details

https://nvidia.custhelp.com/app/answers/detail/a_id/5658

CVE-2025-8747: A safe mode bypass vulnerability in Keras versions 3.0.0 through 3.10.0 (13th Aug 2025)

Preface: Deep learning in AI generally learns much faster than humans in specific, narrow tasks, especially those involving large datasets and complex computations. However, humans still excel at general intelligence, creative problem-solving, and learning with limited data.

Perhaps AI does not have this advantage yet!

Background: Keras 3.0 is a major rewrite of the Keras deep learning API, designed to provide a unified and flexible platform for building and deploying deep learning models. Its most significant feature is its multi-backend architecture, allowing users to run Keras workflows on top of various popular deep learning frameworks.

TensorFlow is a comprehensive, low-level machine learning framework capable of building and training models directly. However, Keras plays a crucial role as its official high-level API, providing several benefits that make deep learning development significantly easier and more efficient within the TensorFlow ecosystem.
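As a small illustration of the multi-backend design, Keras 3 selects its backend via the KERAS_BACKEND environment variable, which must be set before the first import ("torch" below is one of the supported choices, alongside "tensorflow" and "jax"):

```python
import os
os.environ["KERAS_BACKEND"] = "torch"   # or "tensorflow", "jax"

import keras  # backend is fixed at first import

print(keras.backend.backend())          # -> "torch"
```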

Does the Lambda layer still work in Keras 3.0? Yes, the Lambda layer continues to be available and functional in Keras 3.0. In machine learning, specifically within deep learning frameworks like Keras or TensorFlow, a Lambda layer is a type of layer that allows you to wrap arbitrary expressions or functions as a layer in your neural network model.
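A minimal Lambda-layer sketch (the doubling function is arbitrary, which is precisely the point: the layer can carry arbitrary code):

```python
import numpy as np
import keras
from keras import layers

# A Lambda layer wraps an arbitrary expression as a model layer.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Lambda(lambda x: x * 2.0),   # arbitrary function as a layer
])
print(model.predict(np.ones((1, 4))))   # -> [[2. 2. 2. 2.]]
```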

Vulnerability details: A safe mode bypass vulnerability in the `Model.load_model` method in Keras versions 3.0.0 through 3.10.0 allows an attacker to achieve arbitrary code execution by convincing a user to load a specially crafted `.keras` model archive.
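For reference, keras.models.load_model exposes a safe_mode flag (default True) that is supposed to refuse deserializing Lambda-layer code from a model archive; CVE-2025-8747 is a bypass of that check, so upgrading beyond 3.10.0 is the actual fix. A usage sketch ("model.keras" is a placeholder path):

```python
import keras

# safe_mode=True (the default) is meant to block arbitrary-code
# deserialization; the CVE describes a bypass of this protection.
model = keras.models.load_model("model.keras", safe_mode=True)

# safe_mode=False explicitly permits arbitrary-code deserialization and
# should only ever be used on files from a fully trusted source.
```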

Official announcement: Please see the link for details

https://www.tenable.com/cve/CVE-2025-8747

CVE-2025-6573: About Imagination’s PowerVR DDK (12th Aug 2025)

Preface: PowerVR is a brand of graphics processing unit (GPU) IP (intellectual property) developed by Imagination Technologies. In the context of Android, PowerVR GPUs are integrated into mobile System-on-Chips (SoCs) by various manufacturers, providing the graphics processing capabilities for Android devices. It’s a key competitor to Adreno (Qualcomm) and Mali (Arm) GPUs in the Android market.

Background: The Android SDK and Imagination’s PowerVR DDK are both software development kits, but they serve different purposes. The Android SDK is a comprehensive set of tools for developing Android applications, while the PowerVR DDK is a specialized kit for optimizing and integrating graphics rendering with Imagination Technologies’ PowerVR GPUs.

A DDK is a set of tools and libraries provided by an operating system vendor to facilitate the development of device drivers and kernel modules. Kernel modules are pieces of code that can be loaded into the operating system kernel at runtime, extending its functionality without requiring a full system reboot. This is common in Linux and Android kernel development.

The PowerVR DDK (Driver Development Kit) Native Lib C Framework refers to the foundational libraries and tools provided by Imagination Technologies to facilitate the development of graphics applications and drivers for systems utilizing PowerVR GPUs.

Vulnerability details: Kernel software installed and running inside an untrusted/rich execution environment (REE) could leak information from the trusted execution environment (TEE).

  • The scratch buffer (pui8FWScratchBuf) is used by the GPU firmware for temporary data.
  • If this buffer is mapped or accessible from REE, malicious or compromised kernel software could read or overwrite data that should be protected within the TEE.

Official announcement: Please refer to the link for details

https://nvd.nist.gov/vuln/detail/CVE-2025-6573

AMD responds to ETH Zurich researchers’ technical findings (11th Aug 2025)

Preface: AMD’s K10 architecture, first launched in 2007, is no longer adequate for modern computing needs. While it was a significant step in AMD’s processor development, it has been superseded by newer architectures like Zen, which offer significant performance and efficiency improvements.

Background: The “AMD Zen stack engine” generally refers to the AMD Zen microarchitecture and its various generations used in AMD processors. Zen utilizes a modular structure, with the basic building block being the CPU Complex (CCX). Each CCX contains multiple cores (e.g., four cores in early Zen generations) that share a large L3 cache.

Technical details: The stack engine is a feature that has a speculative stack address delta register in the front-end that is updated directly with push/pop instructions, and that delta is dispatched with the stack memory uop to be added to the original stack address register when doing address generation in the load/store units.

The stack engine is not predictive in nature and as such does not open up new transient execution windows. However, it might still leak information under speculation. The following two main scenarios were analyzed:

First, researchers from ETH Zurich checked whether the stack engine offset is reset when the CPU corrects a branch misprediction. They found that the offset is reset to zero on Zen 3-4, while Zen 5 appears to retain an offset. They were not able to conclusively determine the effect on the other architectures due to excessive noise introduced by the misspeculation.

Second, researchers from ETH Zurich aimed to detect stack engine sync operations that occur only on the speculative path and are later squashed. Using performance monitoring counters (PMCs), they confirmed that sync operations are indeed observable under transient execution on Zen 3-5. An attacker might theoretically combine this behavior with classical indirect branch target injection to build a call-depth disclosure gadget in a cross-thread attack. However, they note that such an attack would only slightly expand the capabilities of a cross-thread attacker.

Workaround: AMD continues to recommend that software developers employ existing best practices, including constant-time algorithms, and avoid secret-dependent data access or control flows to help mitigate the potential vulnerability.

Official announcement: Please refer to the link for details –

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7045.html

CVE-2025-0932: Arm fixes userspace vulnerability in Mali GPU driver (8th Aug 2025)

Preface: The Valhall family of Mali GPUs uses the same top-level architecture as the previous generation Bifrost GPUs. The Valhall family uses a unified shader core architecture.

The Arm 5th generation GPU architecture, including the Immortalis and Mali GPUs, represents a modern design for mobile and other client devices.

Background: ioctl (Input/Output Control) is the primary syscall used by userspace GPU drivers to communicate with the kernel-space driver. It allows sending custom commands and structured data to the driver.

Typical ioctl operations in Mali drivers include:

  • MALI_IOCTL_ALLOC_MEM: Allocate GPU-accessible memory
  • MALI_IOCTL_FREE_MEM: Free previously allocated memory
  • MALI_IOCTL_SUBMIT_JOB: Submit a GPU job (e.g., shader execution)
  • MALI_IOCTL_WAIT_JOB: Wait for job completion
  • MALI_IOCTL_MAP_MEM: Map memory to userspace

The path bifrost-drivers/driver/product/kernel/drivers/gpu/arm indicates that the code within this directory is part of the kernel-space drivers for Arm Mali Bifrost GPUs.
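To make the userspace-to-kernel boundary concrete, here is a generic Python sketch of an ioctl round-trip. The device node is the typical Mali path, but the request code and argument layout are placeholders, not real Mali definitions (those live in the kernel driver headers):

```python
import fcntl
import os
import struct

# Generic shape of userspace-to-kernel-driver communication via ioctl.
DEVICE = "/dev/mali0"            # typical Mali device node
FAKE_ALLOC_REQUEST = 0xC0104D00  # hypothetical request code, for illustration

fd = os.open(DEVICE, os.O_RDWR)
arg = struct.pack("QQ", 4096, 0)             # e.g. a (size, flags) struct
fcntl.ioctl(fd, FAKE_ALLOC_REQUEST, arg)     # dispatches to the kernel driver
os.close(fd)
```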

Vulnerability details: Use After Free vulnerability in Arm Ltd Bifrost GPU Userspace Driver, Arm Ltd Valhall GPU Userspace Driver, Arm Ltd Arm 5th Gen GPU Architecture Userspace Driver allows a non-privileged user process to perform valid GPU processing operations, including via WebGL or WebGPU, to gain access to already freed memory.

Scope of impact: This issue affects Bifrost GPU Userspace Driver: from r48p0 through r49p3, from r50p0 through r51p0; Valhall GPU Userspace Driver: from r48p0 through r49p3, from r50p0 through r54p0; Arm 5th Gen GPU Architecture Userspace Driver: from r48p0 through r49p3, from r50p0 through r54p0.

Official announcement: Please see the link for details –

https://nvd.nist.gov/vuln/detail/CVE-2025-0932

https://developer.arm.com/documentation/110626/latest

Ref: Note – the code attached to the advisory, which frees the memory after use, is part of the remedy; the use-after-free itself is not shown.

antihackingonline.com