Category Archives: AI and ML

A critical step in exploiting a buffer overflow is determining the offset where important program control information is overwritten. In the Linux kernel, the (CVE-2024-41011) vulnerability has been resolved. (18-07-2024)

Preface: The PAGE_SIZE macro defined in the Linux kernel source determines the page size. Its definition is in the kernel header file /usr/src/kernels/5.14[.] 0-22. el9[.] x86_64/include/asm-generic/page.

Background: MMIO stands for Memory-Mapped Input/Output. In Linux, MMIO is a mechanism used by devices to interface with the CPU that involves mapping their control registers and buffers directly into the processor’s memory address space.

This enables the CPU to access device registers and exchange data with devices using load and store instructions, just as if they were conventional memory locations. Graphics cards, network interfaces, and storage controllers all employ MMIO to effectively conduct input and output tasks.

Vulnerability details: drm/amdkfd: don’t allow mapping the MMIO HDP page with large pages We don’t get the right offset in that case. The GPU has an unused 4K area of the register BAR space into which you can remap registers. We remap the HDP flush registers into this space to allow userspace (CPU or GPU) to flush the HDP when it updates VRAM. However, on systems with >4K pages, we end up exposing PAGE_SIZE of MMIO space.

Official announcement: Please refer to the official announcement for details – https://nvd.nist.gov/vuln/detail/CVE-2024-41011

CVE-2024-41009: bpf – Fix overrunning reservations in ringbuf (17th July 2024)

Preface: Consumer and producer counters are put into separate pages to allow each position to be mapped with different permissions. This prevents a user-space application from modifying the position and ruining in-kernel tracking. The permissions of the pages depend on who is producing samples: user-space or the kernel. Starting from Linux 5.8, BPF provides a new BPF data structure (BPF map): BPF ring buffer (ringbuf). It is a multi-producer, single-consumer (MPSC) queue and can be safely shared across multiple CPUs simultaneously.

Background: The first core skill point is “BPF Hooks”, that is, where in the kernel can BPF programs be loaded. There are nearly 10 types of hooks in the current Linux kernel, as shown below:

kernel functions (kprobes)

userspace functions (uprobes)

system calls

fentry/fexit

Tracepoints

network devices (tc/xdp)

network routes

TCP congestion algorithms

sockets (data level)

Vulnerability details: For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with size of 0x4000. Next, the consumer_pos is modified to 0x3000 /before/ a call to bpf_ringbuf_reserve() is made. This will allocate a chunk A, which is in [0x0,0x3008], and the BPF program is able to edit [0x8,0x3008]. Now, lets allocate a chunk B with size 0x3000. This will succeed because consumer_pos was edited ahead of time to pass the `new_prod_pos – cons_pos > rb->mask` check. Chunk B will be in range [0x3008,0x6010], and the BPF program is able to edit [0x3010,0x6010]. Due to the ring buffer memory layout mentioned earlier, the ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data pages. This means that chunk B at [0x4000,0x4008] is chunk A’s header. bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header’s pg_off to then locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once chunk B modified chunk A’s header, then bpf_ringbuf_commit() refers to the wrong page and could cause a crash.  

Official announcement: Please refer to the official announcement for details – https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=47416c852f2a04d348ea66ee451cbdcf8119f225

About CVE-2024-41008: When a design weakness is discovered in a GPU, it is now not limited to affecting graphics cards! Machine learning should be on alert! (16th July 2024)

Preface: In computer science, reference counting is a programming technique of storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others. In garbage collection algorithms, reference counts may be used to deallocate objects that are no longer needed.

Background: The drm/amdgpu driver supports all AMD Radeon GPUs based on the Graphics Core Next (GCN) architecture.

AI and Machine Learning Development on a Local Desktop with AMD Radeon™ Graphics Cards

AMD now supports RDNA™ 3 architecture-based GPUs for desktop based AI and ML workflows using AMD ROCm™ software. Developers can work with ROCm 6.1 software for Radeon on Linux® systems using PyTorch®, TensorFlow and ONNX Runtime. Added support for WSL 2 (Windows® Subsystem for Linux) now also enables users to develop with AMD ROCm™ software on a Windows® system, eliminating the need for dual boot set ups.

Vulnerability details: The CVE does not describe the vulnerability enumeration. Additionally, AMD only provides patch change details. Perhaps the design weakness in CVE-2024-41008 is related to garbage collection.

This patch changes the handling and lifecycle of vm->task_info object.

The major changes are:

  • vm->task_info is a dynamically allocated ptr now, and its uasge is reference counted.
  • introducing two new helper funcs for task_info lifecycle management
    • amdgpu_vm_get_task_info: reference counts up task_info before returning this info
    • amdgpu_vm_put_task_info: reference counts down task_info
  • – last put to task_info() frees task_info from the vm.

Official announcement: Please refer to the vendor announcement for details – https://nvd.nist.gov/vuln/detail/CVE-2024-41008

AMD released the CVE-2023-20587 security update on July 13, 2024.Don’t underestimate this related SPI flash design weakness! (15th Jul 2024)

Preface: SMM is the privileged mode of the processor. Like BIOS and UEFI, SMM code operates underneath the operating system. SMM has full access to physical memory, SMM-specific memory called SMRAM, MSR-specific scratchpad, the SPI flash region to read and write BIOS variables, and I/O operations. Additionally, SMM is designed to be invisible to lower privileged layers such as the operating system kernel or hypervisor.

Background: Attackers typically escalate privileges to the SMM by exploiting vulnerabilities in the SMM code. The OS calls SMM code through system management interrupts, or SMI, and passes parameters to SMI handlers using a shared memory area called the SMM Communication Buffer.

Vulnerability details: CVE-2023-20587: Improper Access Control in System Management Mode (SMM) may allow an attacker access to the SPI flash potentially leading to arbitrary code execution.

The relevant vulnerabilities are as follows:

CVE-2023-20579: Improper Access Control in the AMD SPI protection feature may allow a user with Ring0 (kernel mode) privileged access to bypass protections potentially resulting in loss of integrity and availability.

CVE-2023-20576: Insufficient Verification of Data Authenticity in AGESA™ may allow an attacker to update SPI ROM data potentially resulting in denial of service or privilege escalation.

CVE-2023-20577: A heap overflow in SMM module may allow an attacker with access to a second vulnerability that enables writing to SPI flash, potentially resulting in arbitrary code execution.

Official announcement: Please refer to the vendor announcement for details – https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7009.html

CVE-2024-0102:  About NVIDIA® CUDA® Toolkit. If you remember, a similar incident happened in April of this year. Believe this is a weakness of similar designs. (11 July 2024)

Preface: OpenAI revealed that the project cost $100 million, took 100 days, and used 25,000 NVIDIA A100 GPUs. Each server equipped with these GPUs uses approximately 6.5 kW, so an estimated 50 GWh of energy is consumed during training.

Background: Parallel processing is a method in computing of running two or more processors (CPUs) to handle separate parts of an overall task. Breaking up different parts of a task among multiple processors will help reduce the amount of time to run a program. GPUs render images more quickly than a CPU because of its parallel processing architecture, which allows it to perform multiple calculations across streams of data simultaneously. The CPU is the brain of the operation, responsible for giving instructions to the rest of the system, including the GPU(s).

NVIDIA CUDA provides a simple C/C++ based interface. The CUDA compiler leverages parallelism built into the CUDA programming model as it compiles your program into code.
CUDA is a parallel computing platform and programming interface model created by Nvidia for the development of software which is used by parallel processors. It serves as an alternative to running simulations on traditional CPUs.

Vulnerability details: NVIDIA CUDA Toolkit for all platforms contains a vulnerability in nvdisasm, where an attacker can cause an out-of-bounds read issue by deceiving a user into reading a malformed ELF file. A successful exploit of this vulnerability might lead to denial of service.

Official announcement: Please refer to the vendor announcement for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5548

CVE-2024-39489: Linux kernel enhance memory management on IPv6 feature (11 July 2024)

Preface: The Linux kernel implements most of its IPv6 parts from USAGI. USAGI project was founded to improve and develop Linux IPv6 stack. The integrated USAGI version/release is unknown. Implemented into the kernel are the core functions of USAGI; the “standard” user-level programs provide basic IPv6 functionality.

Background: IPv6 converting to using crypto_pool has the following advantages.

– now SR uses asynchronous API which may potentially free CPU cycles and improve performance for of CPU crypto algorithm providers;

– hash descriptors now don’t have to be allocated on boot, but only at the moment SR starts using HMAC and until the last HMAC secret is deleted;

– potentially reuse ahash_request(s) for different users

– allocate only one per-CPU scratch buffer rather than a new one for

  each user

– have a common API for net/ users that need ahash on RX/TX fast path

Vulnerability details: In the Linux kernel, the following vulnerability has been resolved: ipv6: sr: fix memleak in seg6_hmac_init_algo seg6_hmac_init_algo returns without cleaning up the previous allocations if one fails, so it’s going to leak all that memory and the crypto tfms. Update seg6_hmac_exit to only free the memory when allocated, so we can reuse the code directly.

Official announcement: For detail, please refer to link – https://nvd.nist.gov/vuln/detail/CVE-2024-39489

Get closer look CVE-2024-39920: About “SnailLoad” issue (5-Jul-2024)

NVD Published Date: 07/03/2024

Preface: How is RTT measured in TCP? Measures the time from sending a packet to getting an acknowledgment packet from the target host.

Background: A new technology standard called “RFC 9293” was released on August 18, 2022.

Highlight:

-Acknowledgment Number:  32 bits – If the ACK control bit is set, this field contains the value of the next sequence number the sender of the segment is expecting to receive.  Once a connection is established, this is always sent.

-There are also methods of “fingerprinting” that can be used to infer the host TCP implementation (operating system) version or platform
information. These collect observations of several aspects, such as
the options present in segments, the ordering of options, the
specific behaviors in the case of various conditions, packet timing,
packet sizing, and other aspects of the protocol that are left to be
determined by an implementer, and can use those observations to
identify information about the host and implementation.

Vulnerability details: The TCP protocol in RFC 9293 has a timing side channel that makes it easier for remote attackers to infer the content of one TCP connection from a client system (to any server), when that client system is concurrently obtaining TCP data at a slow rate from an attacker-controlled server, aka the “SnailLoad” issue. For example, the attack can begin by measuring RTTs via the TCP segments whose role is to provide an ACK control bit and an Acknowledgment Number.

Official announcement: For detail, please refer to link – https://nvd.nist.gov/vuln/detail/CVE-2024-39920

CVE-2024-20081: Out-of-bounds write in gnss, response by Mediatek security advisory. (2nd July 2024)

Preface: GPS traditionally refers to the North American Global Positioning System, or satellite positioning system. GNSS is the term for the international multi-constellation satellite system. Therefore, GNSS typically includes GPS, GLONASS, Baidu, Galileo, and any other constellation system.

Background: GNSS positioning modules or chips, as the core component of In-vehicle Infotainment systems, provide position, speed, and time information. GNSS position and speed measurements are integral, especially with respect to moving map navigation.

GNSS are used in all forms of transportation: space stations, aviation, maritime, rail, road and mass transit. Positioning, navigation and timing (PNT) play a critical role in telecommunications, land surveying, law enforcement, emergency response, precision agriculture, mining, finance, scientific research…etc.

Vulnerability details: In gnss service, there is a possible escalation of privilege due to improper certificate validation. This could lead to remote escalation of privilege with no additional execution privileges needed. User interaction is not needed for exploitation. Patch ID: ALPS08720039; Issue ID: MSV-1424.

Official announcement: For detail, please refer to link –

https://corp.mediatek.com/product-security-bulletin/July-2024

About LoLLMS WebUI: CVE-2024-5443 design flaw related to CVE-2024-4320 (NVD Last Modified: 06/24/2024)

Preface: Large language models (LLM) are very large deep learning models that are pre-trained on vast amounts of data. The underlying transformer is a set of neural networks that consist of an encoder and a decoder with self-attention capabilities.

The key feature of a multimodal model is its ability to integrate and interpret information from these different data sources, often simultaneously. These can be understood as more advanced versions of large language models (LLMs) that can work not only on text but diverse data types.

Background:

1.Activate the environment

conda activate lollms

2.Install cudatoolkit

conda install -c anaconda cudatoolkit

3.Install lollms

pip install –upgrade lollms

4.Lord of Large Language Models (LoLLMs) are ready

Vulnerability details: CVE-2024-5443: CVE-2024-4320 describes a vulnerability in the parisneo/lollms software, specifically within the ExtensionBuilder().build_extension() function.
The vulnerability arises from the /mount_extension endpoint, where a path traversal issue allows attackers to navigate beyond the intended directory structure. This is facilitated by the data.category and data.folder parameters accepting empty strings (“”), which, due to inadequate input sanitization, can lead to the construction of a package_path that points to the root directory.
Consequently, if an attacker can create a config.yaml file in a controllable path, this path can be appended to the extensions list and trigger the execution of init.py in the current directory, leading to remote code execution. The vulnerability affects versions from 5.9.0, and has been addressed in version 9.5.1.

Official announcement: For detail, please refer to link – https://nvd.nist.gov/vuln/detail/CVE-2024-5443

CVE-2024-36532: Insecure permissions in kruise v1.6.2 (21 June 2024)

Preface: CNCF (Cloud Native Computing Foundation) is the open source, vendor-neutral hub of cloud native computing, hosting projects like Kubernetes and Prometheus to make cloud native universal and sustainable.

Background: OpenKruise is a suite of extension components for Kubernetes that focuses on automated management of large-scale applications, such as deployment, upgrades, maintenance, and availability protection. Most of the functionality provided by OpenKruise is primarily built on CRD extensions.

Vulnerability details: Insecure permissions in kruise v1.6.2 allows attackers to access sensitive data and escalate privileges by obtaining the service account’s token.

  1. the attacker stole the token.
    Here is an example of stealing a token:in cncf, there is a project named hwameistor, and the DaemonSet hwameistor-local-disk-manager for that project has a cluster role named hwameistor-admin, which has the update/patch verb of nodes resource.If a malicious user takes control of a worker node, by default the “hwameistor-local-disk-manager” pod will run on that node and he/she can use that pod to patch/update other nodes and force kruise’s pod to run on the malicious worker node. Then, he/she can stole the token.
  2. Use the obtained token information to authenticate with the API Server. By including the token in the request, attacker can be recognized as a legitimate user with the ServiceAccount and gain all privileges associated with the ServiceAccount.
  3. Use the privileges to access all Secrets in the cluster.
  4. Use the sensitive information in the Secrets to elevate privileges and explore other sensitive resources, and eventually take over the entire cluster.

Official announcement: For detail, please refer to link – https://www.tenable.com/cve/CVE-2024-36532