Category Archives: AI and ML

CVE-2024-0140 : NVIDIA RAPIDS contains a vulnerability in cuDF and cuML, where a user could cause a deserialization of untrusted data issue (24th Jan 2025)

Preface: RAPIDS™, part of NVIDIA CUDA-X, is an open-source suite of GPU-accelerated data science and AI libraries with APIs that match the most popular open-source data tools. It accelerates performance by orders of magnitude, at scale, across data pipelines.

Background: RAPIDS is an open-source suite of software libraries and frameworks developed by NVIDIA to accelerate and streamline data science and analytics workflows. One of its key components is cuDF, a GPU-accelerated DataFrame library that mirrors the functionality of Pandas but operates at much higher speeds. This allows for rapid data loading, filtering, and transformation with reduced memory usage.

cuDF: Python bindings for libcudf (a pandas-like API for DataFrame manipulation)

cuML: C++/CUDA machine learning algorithms
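
A minimal usage sketch of the two libraries (assuming a machine with an NVIDIA GPU and the RAPIDS cudf/cuml packages installed; the data and column names below are made up for illustration):

# Minimal RAPIDS sketch: cuDF for DataFrame work, cuML for machine learning.
# Assumes an NVIDIA GPU with the RAPIDS "cudf" and "cuml" packages installed;
# the data below is hypothetical.
import cudf
from cuml.cluster import KMeans

# pandas-like DataFrame manipulation, executed on the GPU
df = cudf.DataFrame({"x": [1.0, 2.0, 8.0, 9.0], "y": [1.0, 1.5, 8.5, 9.0]})
subset = df[df["x"] > 1.5]              # filtering, just as in pandas

# GPU-accelerated machine learning with a scikit-learn-like API
model = KMeans(n_clusters=2).fit(df)
print(model.labels_)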

Vulnerability details: NVIDIA RAPIDS contains a vulnerability in cuDF and cuML, where a user could cause a deserialization of untrusted data issue. A successful exploit of this vulnerability might lead to code execution, data tampering, denial of service, and information disclosure.
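
As a generic illustration of the deserialization-of-untrusted-data bug class (a plain Python pickle sketch, not the actual cuDF/cuML code path), a crafted serialized object can execute attacker-chosen code the moment it is loaded:

# Generic illustration of the "deserialization of untrusted data" bug class.
# This is NOT the cuDF/cuML code path, just the classic Python pickle example.
import pickle

class Malicious:
    def __reduce__(self):
        # On unpickling, pickle is told to call print(...);
        # a real attacker would substitute something far more harmful.
        return (print, ("attacker-controlled code just ran",))

untrusted_bytes = pickle.dumps(Malicious())

# The victim merely "loads data", yet code executes during deserialization.
pickle.loads(untrusted_bytes)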

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5597

It is speculated that CVE-2025-0612 and CVE-2025-0611 are related to the rendering engine! (23-01-2025)

Preface: Humans have integrated smartphones (IoT devices) into their daily lives through habit formation. Suppose that one day the browsers of more than 20% of the people on the planet stopped working for half a day. Maybe you'll see long queues outside the hospital!

It is a kind of intangible control over you. As we move into the AI age, the smartphone is AI's great partner.

Background: Edge was initially built with Microsoft’s own proprietary browser engine, EdgeHTML, and their Chakra JavaScript engine. In late 2018, it was announced that Edge would be completely rebuilt as a Chromium-based browser with Blink and V8 engines.

Chrome originally used only WebCore (WebKit's rendering component), together with its own JavaScript engine, V8, and a multiprocess architecture. Chrome for iOS continues to use WebKit because Apple requires that web browsers on that platform do so.

Remark: Edge was originally based on Chakra but has more recently been rebuilt using Chromium and the V8 engine. V8 is written in C++, and it’s continuously improved.

Vulnerability details:

CVE-2025-0612 Out of bounds memory access in V8 in Google Chrome prior to 132.0.6834.110 allowed a remote attacker to potentially exploit heap corruption via a crafted HTML page. (Chromium security severity: High)

CVE-2025-0611 Object corruption in V8 in Google Chrome prior to 132.0.6834.110 allowed a remote attacker to potentially exploit heap corruption via a crafted HTML page. (Chromium security severity: High)

Official announcement: Please refer to the link for details

https://nvd.nist.gov/vuln/detail/CVE-2025-0611

https://nvd.nist.gov/vuln/detail/CVE-2025-0612

CVE-2024-0146: A design weakness in the Virtual GPU Manager, where a malicious guest could cause memory corruption. (20-1-2025)

Preface: In Kernel mode, the executing code has complete and unrestricted access to the underlying hardware. It can execute any CPU instruction and reference any memory address. Kernel mode is generally reserved for the lowest-level, most trusted functions of the operating system.

If the destination buffer is not large enough, the function will write null characters to the destination buffer to ensure that the string is null-terminated, but this can lead to a buffer overflow if the null characters overwrite adjacent memory locations.

Background: NVIDIA vGPU software enables multiple virtual machines (VMs) to have simultaneous, direct access to a single physical GPU, using the same NVIDIA graphics drivers that are deployed on non-virtualized operating systems. By doing this, NVIDIA vGPU provides VMs with unparalleled graphics performance, compute performance, and application compatibility, together with the cost-effectiveness and scalability brought about by sharing a GPU among multiple workloads.

Vulnerability details: NVIDIA vGPU software contains a vulnerability in the Virtual GPU Manager, where a malicious guest could cause memory corruption. A successful exploit of this vulnerability might lead to code execution, denial of service, information disclosure, or data tampering.

Affected software products:

Citrix Hypervisor, VMware vSphere, Red Hat Enterprise Linux KVM, Ubuntu

Azure Local

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5614/~/security-bulletin%3A-nvidia-gpu-display-driver—january-2025

CVE-2024-53869: NVIDIA Unified Memory driver for Linux design weakness. A successful exploit of this vulnerability might lead to information disclosure. (16th Jan 2025)

Preface: RAM and unified memory are essentially the same thing: unified memory is simply RAM built into the same package as the CPU and shared by the whole chip. So 128 GB of unified memory is equivalent to 128 GB of RAM.

Background: Nvidia designs graphics processing units (GPUs) for the gaming and professional markets, as well as system on a chip units (SoCs) for the mobile computing and automotive market. This page tracks Nvidia drivers, which provide support for their various GPU lineups and are available for Windows, Linux, Solaris, and FreeBSD.

Information leaks are not rare! In the Linux kernel, information-leak vulnerabilities are among the most prevalent types; the Kernel Memory Sanitizer (KMSAN) has discovered more than a hundred uninitialized-data-use bugs.

Vulnerability details:  NVIDIA Unified Memory driver for Linux contains a vulnerability where an attacker could leak uninitialized memory. A successful exploit of this vulnerability might lead to information disclosure.

Official announcement: Please refer to the link for details –

https://nvidia.custhelp.com/app/answers/detail/a_id/5614

About CVE-2024-0135, CVE-2024-0136 & CVE-2024-0137 – NVIDIA Container Toolkit and NVIDIA GPU Operator contain an improper isolation vulnerability (13th Jan 2025)

Preface: In software development, time-of-check to time-of-use (TOCTOU, TOCTTOU or TOC/TOU) is a class of software bugs caused by a race condition involving the checking of the state of a part of a system (such as a security credential) and the use of the results of that check.
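
A short Python sketch of the TOCTOU pattern (a generic illustration, not NVIDIA's code): because the check and the use are two separate steps, the file can be swapped, for example replaced by a symlink, in between:

# Generic time-of-check to time-of-use (TOCTOU) race, illustrative only.
import os

path = "/tmp/report.txt"   # hypothetical path an attacker can also write to

# Time of check: verify the current user may read the file.
if os.access(path, os.R_OK):
    # ...window of opportunity: an attacker can replace "path" with a
    # symlink to a sensitive file right here...
    # Time of use: open() may now act on a different file than the one checked.
    with open(path) as fh:
        data = fh.read()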

Background: The NVIDIA container stack is architected so that it can be targeted to support any container runtime in the ecosystem. The components of the stack include:

-The NVIDIA Container Runtime (nvidia-container-runtime)

-The NVIDIA Container Runtime Hook (nvidia-container-toolkit / nvidia-container-runtime-hook)

-The NVIDIA Container Library and CLI (libnvidia-container1, nvidia-container-cli)

The components of the NVIDIA container stack are packaged as the NVIDIA Container Toolkit.

The NVIDIA Container Toolkit is a key component in enabling Docker containers to leverage the raw power of NVIDIA GPUs. This toolkit allows for the integration of GPU resources into your Docker containers.
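
For example, once the toolkit is installed and the Docker daemon is configured with the NVIDIA runtime, a container can request the host's GPUs. The sketch below uses the Docker SDK for Python; the image tag is only an example:

# Request all host GPUs for a container via the Docker SDK for Python.
# Assumes the NVIDIA Container Toolkit is installed and Docker is configured
# with the NVIDIA runtime; the image tag is an example.
import docker

client = docker.from_env()
output = client.containers.run(
    "nvidia/cuda:12.2.0-base-ubuntu22.04",
    "nvidia-smi",
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    remove=True,
)
print(output.decode())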

Remark: The Podman command can be used with remote services via the --remote flag. Connections can be made over local UNIX domain sockets or SSH.

Vulnerability details:

CVE-2024-0135 – NVIDIA Container Toolkit contains an improper isolation vulnerability where a specially crafted container image could lead to modification of a host binary. A successful exploit of this vulnerability may lead to code execution, denial of service, escalation of privileges, information disclosure, and data tampering.

CVE-2024-0136 – NVIDIA Container Toolkit contains an improper isolation vulnerability where a specially crafted container image could lead to untrusted code obtaining read and write access to host devices. This vulnerability is present only when the NVIDIA Container Toolkit is configured in a nondefault way. A successful exploit of this vulnerability may lead to code execution, denial of service, escalation of privileges, information disclosure, and data tampering.

CVE-2024-0137 – NVIDIA Container Toolkit contains an improper isolation vulnerability where a specially crafted container image could lead to untrusted code running in the host’s network namespace. This vulnerability is present only when the NVIDIA Container Toolkit is configured in a nondefault way. A successful exploit of this vulnerability may lead to denial of service and escalation of privileges.

Official announcement: Please refer to the vendor announcement for detail – https://nvidia.custhelp.com/app/answers/detail/a_id/5599

Machine learning: From basics to GPU-related INT8 (3rd Jan 2025)

Preface: If a living thing wants to survive, its life involves competition, for example hunting and defence. During that process it starts learning; that is the nature of it.

Remember that this is the basic principle. When non-human beings on Earth can enter the learning process, they will become humanity's rivals. In fact, who rules the Earth may depend entirely on the wisdom of the opponent.

Integer Arithmetic for machine learning: INT8 uses 8 bits, which allows for 256 possible values, while INT4 uses 4 bits, which allows for 16 possible values. In comparison, floating-point precision, such as FP32, uses 32 bits to represent a wide range of values.

The advantage of int over float is computational speed. Integers are represented in memory directly as a fixed-width binary value. Floats, on the other hand, are stored as a mathematical construct, a sign, mantissa, and exponent, so there is computation involved just in assessing the value.

Integers are the simplest numeric data type. Because of this, they take much less storage space and are processed much faster than floating-point types.

An integer (known also as int) is a whole number without a decimal part. It can be positive, negative, or zero. Examples of integers are -3, 0, 5, 100, and so on. The integer data type is used to represent values such as counting, indexing, or storing quantities that can only be whole numbers.

Float (floating-point number) is a number that includes a decimal part. Examples of floating-point numbers are -3.14, 2.71828, 0.5, 1.0, and so on. The float data type is used to represent values that can have a decimal part or require high precision, such as measurements, calculations involving decimal values, or scientific computations.

Summary: An integer represents whole numbers without a decimal part, while a float represents numbers with a decimal part. Integers have exact precision; floats trade exact precision for the ability to represent fractional values and a much wider dynamic range.
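
A minimal NumPy sketch of the idea, symmetric INT8 quantization of a small weight and activation vector (the numbers and scales are chosen purely for illustration):

# Symmetric INT8 quantization sketch: map float32 values into [-127, 127],
# do the multiply-accumulate in integers, then rescale back to float.
import numpy as np

weights = np.array([0.9, -0.43, 0.12, -0.77], dtype=np.float32)
activations = np.array([1.5, 0.25, -0.6, 0.8], dtype=np.float32)

w_scale = np.abs(weights).max() / 127.0          # one scale per tensor
a_scale = np.abs(activations).max() / 127.0

w_int8 = np.round(weights / w_scale).astype(np.int8)
a_int8 = np.round(activations / a_scale).astype(np.int8)

# Integer dot product (accumulate in int32 to avoid overflow), then dequantize.
int_dot = np.dot(w_int8.astype(np.int32), a_int8.astype(np.int32))
approx = float(int_dot) * w_scale * a_scale

print("float32 dot product:", float(np.dot(weights, activations)))
print("INT8 approximation :", approx)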

Technical article: Is Integer Arithmetic Enough for Deep Learning Training? Please refer to link –  https://proceedings.neurips.cc/paper_files/paper/2022/file/af835bd1b5b689c3f9d075ae5a15bf3e-Paper-Conference.pdf

People focus on Apple's proprietary M4 design, but Apple seems to prefer Arm's SME over its own AMX (2nd Jan 2025)

Preface: Matrices help break down large, complex datasets into digestible chunks. Matrix multiplication allows machine learning models to identify complex patterns. By updating these matrices during training, the AI system continually improves and becomes more accurate.
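
A tiny NumPy sketch of that idea (a one-layer model with made-up numbers): the model's behaviour is literally a matrix multiplication, and "learning" is nothing more than repeatedly updating the matrix:

# A tiny linear model: the "knowledge" lives in the weight matrix W,
# and training simply keeps updating W. All numbers are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))          # weight matrix (2 outputs, 3 inputs)
x = np.array([0.5, -1.0, 2.0])       # one input sample
target = np.array([1.0, 0.0])        # desired output

for _ in range(100):
    y = W @ x                        # matrix multiplication = forward pass
    grad = np.outer(y - target, x)   # gradient of the squared error w.r.t. W
    W -= 0.05 * grad                 # update the matrix -> the model improves

print("prediction after training:", W @ x)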

Background: The New Armv9 architecture feature offers significant performance uplifts for AI and ML-based applications, including generative AI. SME (Scalable Matrix Extension) is an Instruction Set Architecture (ISA) extension introduced in the Armv9-A architecture, which accelerates AI and ML workloads and enables improved performance, power efficiency, and flexibility for AI and ML-based applications running on the Arm CPU.

Technology focus: AMX was Apple's proprietary design. It basically takes over CPU-side work for ML whenever something has not been programmed for, or cannot be accelerated by, the Neural Engine itself, that is, bleeding-edge experimental ML that has not yet been "baked in" to the hardware. It also makes the CPU less bad at sparse matrices.

Ref: Sparse matrices are widely used in many fields, particularly in machine learning and data science. Recommendation systems are a typical example: in collaborative filtering, user-item interaction matrices are often sparse because users typically interact with only a small subset of items.
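
A short SciPy sketch of such a user-item matrix (the interaction data is invented for illustration); note how few cells are actually filled:

# Sparse user-item interaction matrix, as used in collaborative filtering.
# Most entries are zero because each user rates only a few items.
import numpy as np
from scipy.sparse import csr_matrix

# (user, item, rating) triples -- hypothetical data
users = np.array([0, 0, 1, 2, 2])
items = np.array([1, 3, 0, 2, 3])
ratings = np.array([5.0, 3.0, 4.0, 2.0, 5.0])

interactions = csr_matrix((ratings, (users, items)), shape=(3, 4))
print(interactions.toarray())                          # dense view, small demo only
print(f"density: {interactions.nnz / (3 * 4):.0%}")    # most cells are empty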

SME is ARM’s version which is now industry standard which can be addressed by standard ARMv9 toolchains. The new feature on M4 shown that apple targeted this industry standard.

Official announcement: Apple introduces M4 chip – https://www.apple.com/hk/en/newsroom/2024/05/apple-introduces-m4-chip/

CVE-2024-56756: nvme-pci: fix freeing of the HMB descriptor table (30th Dec 2024)

Preface: The Large Hadron Collider (LHC) at CERN works with amazing quantities of data, and CERN has publicly stated that it gets much higher I/O and memory bandwidth, more than a terabit per second of data, with its AMD-based system. If they get that kind of performance, other end users will be in great shape. Plus, more PCIe lanes mean more NVMe drives running at native speed, versus storage interfaces running at switched speeds (which adds latency and bottleneck points). Full utilization will make a huge difference in stored-data access and processing.

Background: The impact of the fast PCIe technology available today is spread over several areas.

– The ability to use more x16 devices (such as graphics processing units (GPUs) and network cards) at full speed – which means data can be transferred at a faster rate

– The ability to use higher bandwidth network cards – which means more quantities of data can be transferred per second

– Non-volatile memory express (NVMe) storage was already incredibly fast and with PCIe Gen 4 it is even faster. In some cases, there is twice the performance in speed and throughput.

Vulnerability details: The HMB descriptor table is sized to the maximum number of descriptors that could be used for a given device, but __nvme_alloc_host_mem could break out of the loop earlier on memory allocation failure and end up using less descriptors than planned for, which leads to an incorrect size passed to dma_free_coherent.

In practice this was not showing up because the number of descriptors tends to be low and the dma coherent allocator always allocates and frees at least a page.

Ref: In the Linux kernel, the following vulnerability has been resolved: nvme-pci: fix freeing of the HMB descriptor table

Official announcement: Please refer to the link for details

https://nvd.nist.gov/vuln/detail/CVE-2024-56756

Pushing open source development concept into space (27th Dec 2024)

Preface: We live in a three-dimensional world. We move through space: left or right, forward or backward, up or down. Furthermore, living things do not live forever, and hardware and software also have life cycles. Human beings seem destined to live on Earth; none of the other planets in the solar system is suitable for human survival. Rockets travel through the atmosphere to explore space. The time required is unknown, and there is no absolute answer as to whether the target will ever be found. In space, distances are measured in light years, and travelling from one planet to another can demand at least a lifetime of human dedication. Suppose an AI were to analyse all the data SpaceX has collected: if it still could not open the secret door of the Einstein-Rosen bridge (for time travel), perhaps it would simply stay on Earth.

Technical focus: For computers to survive in space, they must be hardened — made of resilient materials and designed to withstand high doses of radiation. But to make a computer fit for space takes years. Satellite manufacturers therefore often have to make do with rather obsolete processors.

About software development: Java has become one of the most widely used programming languages across various industries, including space exploration. At NASA, Java is used for developing highly interactive systems, mission-critical software, and user interfaces that support space operations.

Ref: Java Pathfinder (JPF) is a model checker for Java. The technology takes a Java program and “executes” it in a way that explores all possible executions/interleavings of the threads in the program. This allows JPF to detect certain bugs (e.g., deadlocks and assertion violations) that may be missed during testing.

About the topic: Antmicro and Aethero have launched Zephyr-based IoT hardware into space aboard a SpaceX mission. Aethero recently announced a groundbreaking collaboration with Antmicro, a technology company specializing in open-source tools, to develop cutting-edge edge AI hardware tailored for space applications.

Antmicro played a crucial role in providing the software foundation for the NxN Edge Computing Module, contributing both Linux and Zephyr RTOS software for controlling the payload. Additionally, Antmicro implemented their open source RDFM framework, enabling modular, configurable, multi-OS device OTA updates and fleet management through Aethero’s user portal.

For details about Antmicro, please refer to link below: https://hardwarebee.com/electronic-breaking-news/aethero-and-antmicro-collaborate-on-open-source-space-edge-ai-design/

Are you still a fan of Nvidia? Or do you support AMD now? (23rd Dec 2024)

Preface: In the field of artificial intelligence (AI), NVIDIA and AMD are leading the way, pushing the limits of computing power. Both companies have launched powerful AI chips, but the comparison between the H100 and the MI250X raises the question of which is superior.

Background: What is AMD Instinct MI250X? AMD Instinct™ MI250X Series accelerators are uniquely suited to power even the most demanding AI and HPC workloads, delivering exceptional compute performance, massive memory density, high-bandwidth memory, and support for specialised data formats.

AMD now has more computing power than Nvidia in the Top500. Five systems use AMD processors (El Capitan, Frontier, HPC6, LUMI, and Tuolumne) while three systems use Intel (Aurora, Eagle, Leonardo).

Software Stack: ROCm offers a suite of optimizations for AI workloads from large language models (LLMs) to image and video detection and recognition, life sciences and drug discovery, autonomous driving, robotics, and more. ROCm supports the broader AI software ecosystem, including open frameworks, models, and tools.

HIP is a thin API with little or no performance impact over coding directly in NVIDIA CUDA or AMD ROCm.

HIP enables coding in a single-source C++ programming language including features such as templates, C++11 lambdas, classes, namespaces, and more.

Developers can specialize for the platform (CUDA or ROCm) to tune for performance or handle tricky cases.

Ref: What is the difference between ROCm and HIP?

ROCm™ is AMD’s open source software platform for GPU-accelerated high performance computing and machine learning. HIP is ROCm’s C++ dialect designed to ease conversion of CUDA applications to portable C++ code.
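
On the Python side this portability is largely transparent: a ROCm build of PyTorch, for example, reuses the familiar CUDA-named device API, so the same script runs on either vendor's GPU. A small sketch, assuming a CUDA or ROCm build of PyTorch is installed:

# The same PyTorch code runs on NVIDIA (CUDA) or AMD (ROCm/HIP) GPUs;
# ROCm builds of PyTorch expose the GPU through the torch.cuda namespace.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
name = torch.cuda.get_device_name(0) if device == "cuda" else "CPU fallback"
print("running on:", name)

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b                      # matrix multiply on whichever accelerator is present
print(c.sum().item())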

Official article: Please refer to the link for details

https://www.amd.com/en/products/accelerators/instinct/mi200/mi250x.html