CVE-2024-0110: Supercomputer and AI development Interlude (II) (30th Aug 2024)

Preface: OpenAI revealed that the project cost $100 million, took 100 days, and used 25,000 NVIDIA A100 GPUs. Each server equipped with these GPUs draws approximately 6.5 kW, so an estimated 50 GWh of energy was consumed during training.
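The 50 GWh estimate can be reproduced with simple arithmetic. The sketch below assumes eight GPUs per server (as in an NVIDIA DGX A100, rated at about 6.5 kW); that per-server GPU count is an assumption, not something stated above.

```python
# Back-of-the-envelope check of the ~50 GWh training-energy figure.
GPUS = 25_000          # from the text
GPUS_PER_SERVER = 8    # ASSUMPTION: 8 GPUs per server (DGX A100-style)
KW_PER_SERVER = 6.5    # from the text
DAYS = 100             # from the text

def training_energy_gwh(gpus=GPUS, per_server=GPUS_PER_SERVER,
                        kw=KW_PER_SERVER, days=DAYS):
    servers = gpus / per_server          # 3,125 servers
    power_kw = servers * kw              # about 20.3 MW drawn continuously
    energy_kwh = power_kw * days * 24    # kW x hours = kWh
    return energy_kwh / 1e6              # 1 GWh = 1,000,000 kWh

print(f"{training_energy_gwh():.2f} GWh")  # 48.75 GWh
```

Under these assumptions the result is about 48.75 GWh, which rounds to the quoted 50 GWh.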

Background: Parallel processing is a computing method that runs two or more processors (CPUs) to handle separate parts of an overall task. Dividing a task among multiple processors reduces the time needed to run a program. GPUs render images more quickly than CPUs because their parallel processing architecture lets them perform many calculations across streams of data simultaneously. The CPU is the brain of the operation, responsible for giving instructions to the rest of the system, including the GPU(s).
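As a rough illustration of splitting one task into parts, here is a minimal Python sketch that divides a list into slices, processes each slice in a separate worker, and combines the partial results. It uses a thread pool for simplicity; in CPython the global interpreter lock means threads will not actually speed up this CPU-bound work (a process pool would), so treat it as a picture of the decomposition pattern rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(data, n_parts):
    """Split data into n_parts contiguous slices of nearly equal length."""
    k, r = divmod(len(data), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + k + (1 if i < r else 0)  # first r slices get one extra item
        parts.append(data[start:end])
        start = end
    return parts

def parallel_sum_of_squares(data, workers=4):
    # Each worker handles one slice; the partial sums are combined at the end.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda part: sum(x * x for x in part),
                            chunked(data, workers))
        return sum(partials)

print(parallel_sum_of_squares(list(range(10))))  # 285
```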

NVIDIA CUDA provides a simple C/C++-based interface. The CUDA compiler exploits the parallelism built into the CUDA programming model when it compiles your program.
CUDA is a parallel computing platform and programming model created by NVIDIA for developing software that runs on parallel processors such as GPUs. It serves as an alternative to running workloads such as simulations on traditional CPUs.
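To make the CUDA programming model a little more concrete: a kernel is launched over a grid of thread blocks, and each thread typically computes its element index as `blockIdx.x * blockDim.x + threadIdx.x`. The pure-Python sketch below only emulates that index mapping for a vector addition, running sequentially what a GPU would run in parallel; the function name `launch_vector_add` and the block size of 256 are illustrative choices.

```python
def launch_vector_add(a, b, threads_per_block=256):
    """Sequential emulation of a 1-D CUDA vector-add launch (illustrative only)."""
    n = len(a)
    c = [0.0] * n
    # Ceiling division so that every element is covered by some thread.
    blocks = (n + threads_per_block - 1) // threads_per_block
    for block_idx in range(blocks):               # on a GPU, blocks run in parallel
        for thread_idx in range(threads_per_block):  # ...and so do threads in a block
            # Same mapping as blockIdx.x * blockDim.x + threadIdx.x in a kernel.
            i = block_idx * threads_per_block + thread_idx
            if i < n:                              # the last block may be partially used
                c[i] = a[i] + b[i]
    return c, blocks
```

For 1,000 elements and 256 threads per block this launches 4 blocks, and the `i < n` guard keeps the last 24 threads idle.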

Vulnerability details:

CVE-2024-0110: The NVIDIA CUDA Toolkit contains a vulnerability in the `cuobjdump` command, where a user may cause an out-of-bounds write by passing in a malformed ELF file. A successful exploit of this vulnerability may lead to code execution or denial of service.

CWE-787 (Out-of-bounds Write) – Code execution, denial of service (Severity – Medium)

Ref: The integer overflow may result in an out-of-bounds write.
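A hypothetical sketch of how an integer overflow turns into an out-of-bounds write: if a parser multiplies an entry count by an entry size read from a malformed header, and the product is stored in a 32-bit unsigned variable, it wraps around and a tiny buffer gets allocated for a huge table. This is not `cuobjdump`'s code (which is not public); the function names and the 32-bit width are assumptions for illustration.

```python
U32 = 0xFFFF_FFFF  # emulate a C uint32_t

def alloc_size_unchecked(count, entsize):
    # Emulates C: uint32_t total = count * entsize;  (wraps modulo 2**32)
    # A later loop writing `count` entries then runs past the small buffer.
    return (count * entsize) & U32

def alloc_size_checked(count, entsize, file_size):
    # Reject products that would not fit in 32 bits...
    if entsize != 0 and count > U32 // entsize:
        raise ValueError("size overflow")
    total = count * entsize
    # ...and tables that claim to be larger than the file itself.
    if total > file_size:
        raise ValueError("table larger than file")
    return total
```

With `count = 0x1000_0001` and `entsize = 0x10`, the unchecked version yields only 16 bytes while the table claims over 4 GiB of entries; the checked version rejects it.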

Official announcement: Please refer to the vendor announcement for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5564

CVE-2024-34731: last week's CVE, today's story. (29th Aug 2024)

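Preface: A race condition occurs when the outcome of a program depends on the timing or ordering of concurrent operations. A race condition vulnerability is a software bug that allows the resulting unexpected behavior to be exploited by malicious actors.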

In a UNIX-like system, a race condition can enable privilege escalation by manipulating the window between the moment a security control is enforced and the moment the protected resource is used (time of check to time of use). It arises from interference between multiple concurrently running threads that share the same resources.
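A minimal, generic illustration of a check-then-act race (not the Android code): two threads both pass a balance check before either one updates the balance. A `threading.Barrier` passed in as a hypothetical `pause` hook forces that interleaving deterministically; the lock-protected variant makes the check and the update atomic.

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw_racy(self, amount, pause=None):
        # Check-then-act without a lock: another thread can run between
        # the check and the update (the time-of-check/time-of-use window).
        if self.balance >= amount:
            if pause:
                pause()  # hook used only to widen the window for the demo
            self.balance -= amount
            return True
        return False

    def withdraw_safe(self, amount):
        # Check and update happen atomically with respect to other callers.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False

def run_two(withdraw):
    """Run `withdraw` in two threads and return how many were approved."""
    approved = []
    threads = [threading.Thread(target=lambda: approved.append(withdraw()))
               for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return approved.count(True)

racy = Account(100)
barrier = threading.Barrier(2)  # both threads must pass the check first
print(run_two(lambda: racy.withdraw_racy(100, pause=barrier.wait)))  # 2
safe = Account(100)
print(run_two(lambda: safe.withdraw_safe(100)))  # 1
```

Both racy withdrawals are approved although the account only covered one; with the lock, exactly one succeeds.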

Background: TranscodingResourcePolicy is a component of the Android platform/frameworks/av package that manages resource policies for transcoding operations. Transcoding is the process of converting media files from one format to another.

Vulnerability details: In multiple functions of TranscodingResourcePolicy.cpp, there is a possible memory corruption due to a race condition. This could lead to local escalation of privilege with no additional execution privileges needed. User interaction is not needed for exploitation.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2024-34731

CVE-2024-8105: a reminder that the computer industry is not secure! (2024-08-26)

Preface: The key leaked in 2023. According to a security firm, it was one of the test (dummy) keys that AMI had been shipping since as early as May 2012.

Background: The so-called Platform Key (PK) from American Megatrends International (AMI) serves as the root of trust during the Secure Boot PC startup chain, and verifies the authenticity and integrity of a device’s firmware and boot software.

Vulnerability details: A vulnerability related to the use of an insecure Platform Key (PK) has been discovered. An attacker with the compromised PK private key can create malicious UEFI software that is signed with a trusted, but compromised, key.
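One quick triage idea, as a heuristic sketch only: the leaked AMI test Platform Keys reportedly carry strings such as "DO NOT TRUST" or "DO NOT SHIP" in their certificate subject. The function below assumes you have already extracted the PK certificate's subject as text (how to do that depends on your platform and tools); a negative result does not prove the key is safe, so the vendor advisories remain the authoritative source.

```python
# Marker strings reportedly present in leaked AMI test PK certificates (assumption).
TEST_KEY_MARKERS = ("DO NOT TRUST", "DO NOT SHIP")

def looks_like_test_pk(subject: str) -> bool:
    """Heuristic: flag a PK certificate subject that carries a test-key marker."""
    s = subject.upper()  # case-insensitive match
    return any(marker in s for marker in TEST_KEY_MARKERS)
```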

Official announcement: Please refer to the link for details – https://www.tenable.com/cve/CVE-2024-8105

https://www.supermicro.com/en/support/security_PKFAIL_Jul_2024

CVE-2024-44932: idpf: fix UAFs when destroying the queues (26th Aug 2024)

Preface: XDP, or eXpress Data Path, is a Linux networking feature that lets you create high-performance packet-processing programs that run in the kernel.

Background: The idpf Linux base driver supports XDP (eXpress Data Path) and AF_XDP zero-copy. Note that XDP is blocked for frame sizes larger than 3 KB. The idpf driver serves as both the Physical Function (PF) and Virtual Function (VF) driver for the Infrastructure Data-Plane Function.
This driver is only supported as a loadable module at this time. Intel is not supplying patches against the kernel source to allow for static linking of the drivers.

Vulnerability details: After the second tagged commit, the kernel sometimes (very rarely, but possibly) threw WARNs from net/core/page_pool.c:page_pool_disable_direct_recycling(). It turned out that idpf frees the interrupt vectors, which embed the NAPI structures, before freeing the queues, so the page_pools' NAPI pointers point to freed memory before libeth destroys these pools. It's not clear whether there are other accesses to the freed vectors when the queues are destroyed, but in any case, queue/interrupt vectors are usually freed only once the queues are destroyed and the NAPIs are guaranteed not to be referenced anywhere.
The fix inverts the allocation and freeing logic so that queue/interrupt vectors are allocated first and freed last. Vectors don't require queues to be present, so this is safe. This change also allows removing the useless queue->q_vector pointer cleanup, since the vectors are still valid when the queues are freed (and both are freed within one function, so there is no clear reason to nullify the pointers at all).

Official announcement: Please refer to the link for details –
https://nvd.nist.gov/vuln/detail/CVE-2024-44932

CVE-2024-42340 –> CWE-602: Client-Side Enforcement of Server-Side Security (26th Aug 2024)

Preface: CyberArk Identity creates a set of JavaScript objects, global variables, and global methods for each SAML user session. These objects provide information that a user map script or a custom SAML script can read and act on.


Background: Application access policies with JavaScript – If you want more specific control over when users can access your application or when they are required to provide additional authentication credentials, you can use JavaScript.
If you use a policy script, authentication rules configured in the UI will be ignored.


Vulnerability details: CyberArk – CWE-602: Client-Side Enforcement of Server-Side Security
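The NVD entry gives no further detail, so the following is only a generic illustration of CWE-602, not CyberArk's code: the vulnerable check trusts a decision the client reports, while the fixed check re-derives the decision from server-side state. `Request`, `SERVER_MFA_STATE`, and the MFA flag are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Request:
    user: str
    mfa_passed_claim: bool  # value the client says its own script computed

# Hypothetical server-side record of which users actually completed MFA.
SERVER_MFA_STATE = {"alice": True, "mallory": False}

def allow_vulnerable(req: Request) -> bool:
    # CWE-602: the server trusts a decision made on the client;
    # a tampered client simply sends True.
    return req.mfa_passed_claim

def allow_fixed(req: Request) -> bool:
    # The decision is re-derived from server-side state; the claim is ignored.
    return SERVER_MFA_STATE.get(req.user, False)
```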


Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2024-42340

CVE-2024-45163: The first CVE I have seen for a vulnerability in malware itself. (22nd Aug 2024)

Preface: Mirai is malware that turns networked devices running Linux into remotely controlled bots that can be used as part of a botnet in large-scale network attacks. It primarily targets online consumer devices such as IP cameras and home routers.

Background: Mirai bots and operators connect to the CNC (command and control) server over TCP, and the server handles many simultaneous connections. An unauthenticated session remains open, allowing an attacker, for example, to send a recognizable username (such as root) or arbitrary data.

Vulnerability details: The Mirai botnet through 2024-08-19 mishandles simultaneous TCP connections to the CNC (command and control) server. Unauthenticated sessions remain open, consuming resources, and the CNC server cannot adequately manage these connections, leading to resource exhaustion and server crashes.
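The general defence against this kind of resource exhaustion is to bound how long an unauthenticated session may stay open and how many sessions may exist at once. The sketch below is a generic, hypothetical session table (not the Mirai source) with an injectable clock so the timeout logic can be exercised without waiting.

```python
import time

class SessionTable:
    """Tracks connections; unauthenticated ones are dropped after a deadline."""

    def __init__(self, auth_timeout=5.0, max_sessions=1000, clock=time.monotonic):
        self.auth_timeout = auth_timeout
        self.max_sessions = max_sessions
        self.clock = clock
        self.sessions = {}  # conn_id -> (opened_at, authenticated)

    def open(self, conn_id):
        self.reap()
        if len(self.sessions) >= self.max_sessions:
            return False  # refuse the connection instead of exhausting resources
        self.sessions[conn_id] = (self.clock(), False)
        return True

    def authenticate(self, conn_id):
        opened, _ = self.sessions[conn_id]
        self.sessions[conn_id] = (opened, True)

    def reap(self):
        """Drop sessions that stayed unauthenticated past the timeout."""
        now = self.clock()
        stale = [cid for cid, (opened, authed) in self.sessions.items()
                 if not authed and now - opened > self.auth_timeout]
        for cid in stale:
            del self.sessions[cid]
        return len(stale)
```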

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2024-45163

CVE-2023-31315: AMD SMM Lock Bypass (21-Aug-2024)

Preface: AMD EPYC™ Processors power the highest-performing x86 servers for the modern data center, on prem and in cloud environments, across industries.

Background: Model-specific registers (MSR) are control registers provided by the processor implementation so that system software can interact with a variety of features, including performance monitoring, checking processor status, debugging, program tracing or toggling specific CPU features.

Intel and AMD may use the same MSR for the same feature, such as the IA32_LSTAR MSR register.

With the Intel Pentium processor, Intel officially introduced two instructions, RDMSR and WRMSR, for reading and writing model-specific registers; this is when MSRs were formally introduced. The CPUID instruction, which appeared around the same time, indicates which features are available on a specific CPU and whether the MSRs corresponding to those features exist. Software can use CPUID to query whether a given feature is supported on the current CPU.
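RDMSR and WRMSR are privileged (ring 0) instructions, so on Linux user space normally reads MSRs through the kernel's `msr` driver, where each `/dev/cpu/N/msr` file is read at an offset equal to the MSR index. The sketch below assumes that driver is loaded (`modprobe msr`) and that it runs as root; the optional `dev_path` parameter is a hypothetical addition so the function can be tested against an ordinary file. It only reads a register and has nothing to do with the SMM lock bypass itself.

```python
import os
import struct

IA32_LSTAR = 0xC000_0082  # the MSR mentioned above (64-bit SYSCALL entry point)

def read_msr(msr, cpu=0, dev_path=None):
    """Read a 64-bit MSR through Linux's msr driver (root and `modprobe msr`)."""
    path = dev_path or f"/dev/cpu/{cpu}/msr"
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, msr)  # file offset selects the MSR index
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read of MSR {msr:#x}")
    return struct.unpack("<Q", data)[0]  # x86 is little-endian

# Example (as root on Linux): print(hex(read_msr(IA32_LSTAR)))
```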

Vulnerability details: Improper validation in a model specific register (MSR) could allow a malicious program with ring0 access to modify SMM configuration while SMI lock is enabled, potentially leading to arbitrary code execution.

Ref: Researchers from IOActive have reported that it may be possible for an attacker with ring 0 access to modify the configuration of System Management Mode (SMM) even when SMM Lock is enabled.

Official announcement: Please refer to the link for details – https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7014.html

CVE-2023-52910 – iommu/iova: Fix alloc iova overflows issue (21-08-2024)

Preface: Modern hardware provides an I/O memory management unit (IOMMU) that mediates direct memory accesses (DMAs) by I/O devices in the same way that a processor’s MMU mediates memory accesses by instructions.

Background: With an IOMMU, the address the system hands to the device driver for DMA is no longer a physical address but a virtual address, generally called an IOVA. When the device accesses memory, the IOMMU translates this IOVA into a physical address. When IOMMU bypass is used, however, the device can use physical addresses directly for DMA.

Vulnerability details: This issue occurs in the following two situations:

- The size of the first IOVA allocation exceeds the domain size. When the IOVA domain is initialized, iovad->cached_node is set to iovad->anchor. For example, the IOVA domain size is 10M, start_pfn is 0x1_F000_0000, and the IOVA size allocated the first time is 11M.

- The node with the largest iova->pfn_lo value in the IOVA domain is deleted, so iovad->cached_node is updated to iovad->anchor, and then the requested IOVA size exceeds the maximum IOVA size that can be allocated in the domain.
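A simplified model of why these cases go wrong. According to the upstream commit description, the anchor node's `pfn_hi` is `~0UL`, so computing a retry lower bound as `pfn_hi + 1` wraps to 0 in unsigned arithmetic, and a lower-bound check can no longer reject an allocation that falls below the domain. The Python below emulates 64-bit unsigned arithmetic; it is not the kernel's allocator, and it assumes 4 KiB pages with 0x1_F000_0000 treated as a byte address (start pfn 0x1F0000), so 10M and 11M become 2560 and 2816 pages.

```python
U64 = (1 << 64) - 1   # emulate C's 64-bit unsigned long
IOVA_ANCHOR = U64     # anchor node's pfn_hi (~0UL)

def retry_low_pfn(pfn_hi):
    # Unsigned "pfn_hi + 1": wraps to 0 when pfn_hi is the anchor value.
    return (pfn_hi + 1) & U64

def alloc_iova(limit_pfn, size, low_pfn):
    """Simplified top-down allocation: returns the new pfn_lo or None."""
    new_pfn = (limit_pfn - size + 1) & U64
    if new_pfn < low_pfn:
        return None
    return new_pfn

def alloc_iova_checked(start_pfn, limit_pfn, size, low_pfn):
    # Defensive variant (hypothetical): reject requests larger than the
    # domain and never let the lower bound drop below start_pfn.
    if size == 0 or size > limit_pfn - start_pfn + 1:
        return None
    return alloc_iova(limit_pfn, size, max(low_pfn, start_pfn))

start = 0x1F0000                # 0x1_F000_0000 >> 12
limit = start + 2560 - 1        # 10M domain in 4 KiB pages
print(hex(alloc_iova(limit, 2816, retry_low_pfn(IOVA_ANCHOR))))  # 0x1eff00, below start
```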

Official announcement: Please refer to the url for details – https://nvd.nist.gov/vuln/detail/CVE-2023-52910

CVE-2024-44070: FRRouting (FRR) – bgpd – ensure the hash works  (18th Aug 2024)

Preface: As time goes by, OSS (open source software) has become a common choice for cost-conscious commercial companies, and it is especially popular in the cloud.

Background: FRRouting (FRR) is a free and open source Internet routing protocol suite for Linux and Unix platforms. It implements BGP, OSPF, RIP, IS-IS, PIM, LDP, BFD, Babel, PBR, OpenFabric and VRRP, with alpha support for EIGRP and NHRP.

FRR’s seamless integration with native Linux/Unix IP networking stacks makes it a general purpose routing stack applicable to a wide variety of use cases including connecting hosts/VMs/containers to the network, advertising network services, LAN switching and routing, Internet access routers, and Internet peering.

Vulnerability details: An issue was discovered in FRRouting (FRR) through 10.1. bgp_attr_encap in bgpd/bgp_attr.c does not check the actual remaining stream length before taking the TLV value.
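The missing check is easy to show in isolation. The parser below uses a hypothetical 1-byte type / 1-byte length layout (the real tunnel-encapsulation sub-TLV format handled by `bgp_attr_encap` differs) and refuses any TLV whose declared length exceeds the bytes actually left in the buffer. Python slicing would silently truncate instead of overrunning, which is why the check has to be explicit.

```python
def parse_tlvs(buf: bytes):
    """Parse hypothetical type(1)/length(1)/value TLVs with bounds checks."""
    out, pos = [], 0
    while pos < len(buf):
        if len(buf) - pos < 2:
            raise ValueError("truncated TLV header")
        t, length = buf[pos], buf[pos + 1]
        pos += 2
        # The check described as missing: are `length` bytes really left?
        remaining = len(buf) - pos
        if length > remaining:
            raise ValueError(f"TLV type {t} claims {length} bytes, only {remaining} left")
        out.append((t, buf[pos:pos + length]))
        pos += length
    return out
```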

Official announcement: For details, please refer to link – https://www.tenable.com/cve/CVE-2024-44070

CVE-2024-43855: md/raid5 – recheck if reshape has finished with device_lock held. From a technical point of view, it also impacts Red Hat clusters. (18 Aug 2024)

Preface: LVM version 2, or LVM2, is the default for Red Hat Enterprise Linux, which uses the device mapper driver contained in the 2.6 kernel. LVM2, which is almost completely compatible with the earlier LVM1 version, can be upgraded from versions of Red Hat Enterprise Linux running the 2.4 kernel.

The Clustered Logical Volume Manager (CLVM) is a set of clustering extensions to LVM. These extensions allow a cluster of computers to manage shared storage (for example, on a SAN) using LVM.

Background: With a mutex, a thread that tries to acquire a mutex that is not available goes to sleep until the mutex becomes available. A spinlock is different: it is a very simple single-holder lock. If a process attempts to acquire a spinlock and it is unavailable, the process keeps trying (spinning) until it can acquire the lock. This simplicity makes for a small and fast lock.
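A toy contrast between the two lock types: `threading.Lock` blocks (the waiting thread sleeps), while the hypothetical `SpinLock` below keeps retrying a non-blocking acquire until it succeeds. In CPython, spinning mostly burns the interpreter's time slice, so this only illustrates the behavior; real spinlocks belong in kernels and very short critical sections.

```python
import threading

class SpinLock:
    """Toy spinlock: retries a non-blocking acquire instead of sleeping."""

    def __init__(self):
        self._flag = threading.Lock()  # used only for its atomic test-and-set

    def acquire(self):
        while not self._flag.acquire(blocking=False):
            pass  # busy-wait ("spin") until the holder releases

    def release(self):
        self._flag.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()
        return False

def count_with(lock, threads=4, per_thread=1000):
    """Increment a shared counter from several threads under `lock`."""
    total = [0]

    def work():
        for _ in range(per_thread):
            with lock:
                total[0] += 1

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return total[0]
```

Both `count_with(SpinLock())` and `count_with(threading.Lock())` return 4000; only the waiting strategy differs.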

Vulnerability details: A deadlock occurs when mddev is being suspended while a flush bio is in progress. It is a complex issue.

T1. The first flush is at its final stage: it clears ‘mddev->flush_bio’ and tries to submit data, but is blocked because mddev is suspended by T4.

T2. The second flush sets ‘mddev->flush_bio’ and attempts to queue md_submit_flush_data(), which is already running (T1) and won't execute again if it is on the same CPU as T1.

T3. The third flush increments active_io and tries to flush, but is blocked because ‘mddev->flush_bio’ is not NULL (set by T2).

T4. mddev_suspend() is called and waits for active_io to drop to 0, but T3's increment keeps it above 0.

The root issue is the non-atomic increment/decrement of active_io during the flush process.
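The four steps form a circular wait, which is what makes this a deadlock rather than a slowdown. The sketch below encodes a simplified reading of T1 to T4 as a wait-for graph (each task waits on exactly one other) and finds the cycle; the edge from T2 to T1 is an interpretation of "won't execute again" above, not something taken from the kernel source.

```python
# Simplified wait-for graph paraphrasing T1-T4 above.
WAITS_FOR = {
    "T1": "T4",  # T1 cannot submit data: mddev is suspended by T4
    "T2": "T1",  # T2's queued md_submit_flush_data() cannot run while T1's is stuck
    "T3": "T2",  # T3 is blocked because mddev->flush_bio was set by T2
    "T4": "T3",  # T4 waits for active_io to reach 0, but T3 incremented it
}

def find_cycle(waits_for):
    """Return one wait-for cycle as a list of tasks, or None if there is none."""
    for start in waits_for:
        path, seen = [], {}
        node = start
        # Follow "waits for" edges until we leave the graph or revisit a node.
        while node in waits_for and node not in seen:
            seen[node] = len(path)
            path.append(node)
            node = waits_for[node]
        if node in seen:
            return path[seen[node]:]
    return None

print(find_cycle(WAITS_FOR))  # ['T1', 'T4', 'T3', 'T2']
```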

Official announcement: For details, please refer to link –

https://nvd.nist.gov/vuln/detail/CVE-2024-43855