Will the development of artificial intelligence technology trigger a battle for compiler hegemony? (29th Dec 2023)

Preface: LLVM competes with compilers such as GCC, Microsoft Visual C++, and the Intel C++ Compiler. NVIDIA’s CUDA Compiler (NVCC) is based on the widely used LLVM open source compiler infrastructure. Furthermore, Tesla engineers wrote their own LLVM-backed JIT neural compiler for Dojo.

Background: Rather than relying on a few fast cores, GPUs rely on a large number of cores to pull data from memory, perform parallel calculations on it, and push the results back out for use. If you compile code with a regular compiler that does not target GPU execution, the code will execute on the CPU only. The GPU driver and compiler work together to ensure that the program executes correctly on the GPU. For example, you can compile CUDA code for one GPU architecture even when your node hosts a GPU of a different architecture.

A full build of LLVM and Clang will need around 15-20 GB of disk space. The exact space requirements will vary by system.

Developers can create or extend programming languages with support for GPU acceleration using the NVIDIA Compiler SDK.

Technical details: LLVM originally stood for Low Level Virtual Machine, and its intermediate representation resembles the instruction set of a low-level, register-based virtual machine. It is designed to abstract the underlying hardware and draw a clean line between a compiler front-end (parsing, etc.) and back-end (machine code generation). LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture.

Ref: The LLVM pass framework is an important component of the LLVM infrastructure; it performs code transformations and optimizations at the LLVM IR level.

LLVM IR is the language used by the LLVM compiler for program analysis and transformation. It’s an intermediate step between the source code and machine code, serving as a kind of lingua franca that allows different languages to utilize the same optimization and code generation stages of the LLVM compiler.
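To make the front-end / IR / pass / back-end split more concrete, here is a minimal sketch in Python. It is not LLVM and does not use any LLVM API; the `Instr` class, the `constant_fold` pass and the `emit_pseudo_asm` back-end are hypothetical names invented for illustration. It only shows the idea that an optimization pass rewrites the intermediate form without knowing which source language produced it or which machine will run it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction of a toy IR (an assumption, not real LLVM IR)."""
    dest: str    # name of the result
    op: str      # "const", "add" or "mul"
    args: tuple  # for "const": (int,); otherwise names of earlier results


def constant_fold(program):
    """A toy 'pass': fold add/mul when every operand is a known constant."""
    known = {}
    out = []
    for ins in program:
        if ins.op == "const":
            known[ins.dest] = ins.args[0]
            out.append(ins)
        elif ins.op in ("add", "mul") and all(a in known for a in ins.args):
            x, y = (known[a] for a in ins.args)
            value = x + y if ins.op == "add" else x * y
            known[ins.dest] = value
            out.append(Instr(ins.dest, "const", (value,)))
        else:
            out.append(ins)  # operands only known at run time: leave as-is
    return out


def emit_pseudo_asm(program):
    """A toy 'back-end': one pseudo-assembly line per IR instruction."""
    return [
        f"{ins.op.upper()} {ins.dest}, " + ", ".join(str(a) for a in ins.args)
        for ins in program
    ]


# What a front-end might produce for: t3 = (2 + 3) * x
program = [
    Instr("t0", "const", (2,)),
    Instr("t1", "const", (3,)),
    Instr("t2", "add", ("t0", "t1")),
    Instr("t3", "mul", ("t2", "x")),  # "x" is a run-time input, so not folded
]
folded = constant_fold(program)
for line in emit_pseudo_asm(folded):
    print(line)
```

Any front-end that emits this toy IR gets the folding for free, and any back-end that reads it benefits from the simplified program, which is the "lingua franca" role described above.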

Looking Ahead: With cyber security concerns on the horizon, perhaps new compilers will join this battle in the future.

Processor technology perspective: Unified Memory with shared page tables (28th Dec 2023)

Preface: NVIDIA Ada Lovelace architecture GPUs are designed to deliver performance for professional graphics, video, AI and computing. Ada Lovelace is a different architecture from Hopper, which is used in the H100 GPU.

As of October 2022, NVLink is being phased out in NVIDIA’s new Ada Lovelace architecture. Neither the GeForce RTX 4090 nor the RTX 6000 Ada supports NVLink.

Background: The NVIDIA Grace Hopper Superchip pairs a power-efficient, high-bandwidth NVIDIA Grace CPU with a powerful NVIDIA H100 Hopper GPU using NVLink-C2C to maximize the capabilities for strong-scaling high-performance computing (HPC) and giant AI workloads.

NVLink-C2C is the enabler for NVIDIA’s Grace-Hopper and Grace Superchip systems, with a 900 GB/s link between Grace and Hopper, or between two Grace chips.

Technical details: One of the major differences in many-core versus multicore architectures is the presence of two different memory spaces: a host space and a device space. In the case of NVIDIA GPUs, the device is supplied with data from the host via one of the multiple memory management API calls provided by the CUDA framework, such as cudaMallocManaged and cudaMemcpy. Modern systems, such as the Summit supercomputer, have the capability to avoid the use of CUDA calls for memory management and access the same data on GPU and CPU. This is done via the Address Translation Services (ATS) technology, which gives a unified virtual address space for data allocated with malloc and new if there is an NVLink connection between the two memory spaces.

My comment: CUDA is a proprietary parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). Under normal circumstances, dynamic memory is allocated and released while the program is running, which may cause memory fragmentation. Over time, this fragmentation can leave too few contiguous memory blocks for new allocations, resulting in memory allocation failures or unexpected behaviour. So, it’s hard to say that design limitations won’t arise in the future!
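The fragmentation concern can be illustrated with a small simulation. This is a sketch of a generic first-fit allocator written for this post, not a model of the CUDA allocator or of any real driver; `FirstFitHeap` and its methods are hypothetical names. It shows how a heap can have plenty of free memory in total and still fail a request because no free run is long enough.

```python
class FirstFitHeap:
    """Toy heap of `size` cells managed with a first-fit policy."""

    def __init__(self, size):
        self.cells = [None] * size  # None = free, otherwise the owner id

    def alloc(self, owner, n):
        """Claim the first run of n contiguous free cells.

        Returns the start index, or None when no such run exists.
        """
        run = 0
        for i, cell in enumerate(self.cells):
            run = run + 1 if cell is None else 0
            if run == n:
                start = i - n + 1
                self.cells[start:i + 1] = [owner] * n
                return start
        return None

    def free(self, owner):
        """Release every cell held by `owner`."""
        self.cells = [None if c == owner else c for c in self.cells]

    def free_cells(self):
        return self.cells.count(None)


heap = FirstFitHeap(16)
for owner in "ABCDEFGH":
    heap.alloc(owner, 2)          # fill the heap with eight 2-cell blocks
for owner in "ACEG":
    heap.free(owner)              # release every other block

total_free = heap.free_cells()    # half of the heap is free again
big = heap.alloc("X", 4)          # but no 4 contiguous cells are available
print(total_free, big)
```

Real allocators reduce this effect with techniques such as coalescing, size classes or compaction, so the sketch only demonstrates the failure mode, not how likely it is in practice.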

Reference: In CUDA, kernel code is written using the [code]__global__[/code] qualifier and is called from the host code to be executed on the GPU. In summary, [code]cudaMalloc[/code] is called from host code to allocate memory on the GPU, while [code]malloc[/code] called from host code allocates ordinary CPU memory. When [code]malloc[/code] is called inside a kernel, it allocates from the device heap instead.

CVE-2023-37188: The artificial intelligence world versus tiny software components. Do not dismiss a non-critical vulnerability! (27th December 2023)

Preface: Data science is an interdisciplinary field that combines statistical analysis, programming, and domain knowledge to extract valuable insights and make data-driven decisions.

Background: 2020 was a year in which the Blosc project received significant donations, totalling $55,000 to date. Most of the important tasks carried out between January 2020 and August 2020 relate to the fastest-moving projects under development: C-Blosc2 and Caterva (including its cat4py wrapper).

C-Blosc2 is the new major version of C-Blosc, and it provides backward compatibility with both the C-Blosc1 API and its in-memory format.

C-Blosc2 adds new data containers, called superchunks, that are essentially a set of compressed chunks in memory that can be accessed randomly and enlarged during their lifetime.

Vulnerability details: C-blosc2 before 2.9.3 was discovered to contain a NULL pointer dereference via the function zfp_rate_decompress at zfp/blosc2-zfp[.]c.

My observation: On many platforms, dereferencing a null pointer results in abnormal program termination.

If a pointer such as chunkdata is NULL and is later used as the destination argument in a call to memcpy(), user-defined data overwrites memory starting at address 0. This is an example of how a null pointer dereference can potentially lead to a code execution exploit rather than just a crash.
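The general shape of this bug class, a lookup or allocation that can fail followed by an unchecked use of the result, can be sketched in Python. This is not the C-Blosc2 code; `load_chunk`, `copy_unchecked` and `copy_checked` are hypothetical names. Note the important difference: Python raises an exception where C would read or write through address 0.

```python
def load_chunk(store, key):
    """Return the chunk bytes, or None when the key is missing.

    None plays the role that a NULL return value plays in C.
    """
    return store.get(key)


def copy_unchecked(dest, store, key):
    """Use the result without checking it, as the vulnerable pattern does."""
    chunk = load_chunk(store, key)
    dest[:len(chunk)] = chunk      # raises TypeError when chunk is None


def copy_checked(dest, store, key):
    """Same copy, but refuse to continue when the lookup failed."""
    chunk = load_chunk(store, key)
    if chunk is None:              # the check the unchecked version lacks
        return False
    dest[:len(chunk)] = chunk
    return True


store = {"good": b"abcd"}
buf = bytearray(8)

ok = copy_checked(buf, store, "good")       # chunk exists, data is copied
missing = copy_checked(buf, store, "bad")   # failure is reported, no crash

try:
    copy_unchecked(buf, store, "bad")
    crashed = False
except TypeError:
    crashed = True                           # Python's analogue of the fault
print(ok, missing, crashed)
```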

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2023-37188

Processor vendor Arm responds to a research paper published in December 2023. (21st Dec 2023)

Preface: The use of previously freed memory can have any number of adverse consequences – ranging from the corruption of valid data to the execution of arbitrary code, depending on the instantiation and timing of the flaw. The simplest way data corruption may occur involves the system’s reuse of the freed memory. Such use-after-free errors are common coding problems that can lead to vulnerabilities and affect stability.

Background: Why MTE? Memory safety bugs, which are errors in handling memory in native programming languages, are common code issues. They lead to security vulnerabilities as well as stability problems. The Arm Memory Tagging Extension (MTE), first specified in Armv8.5-A and available on Armv9 processors, is a hardware extension that allows you to catch use-after-free and buffer-overflow bugs in your native code.
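The idea behind memory tagging can be sketched with a small simulation. MTE uses 4-bit tags on 16-byte granules; everything else here is a simplification written for this post, and `TaggedMemory`, `malloc`, `free` and `load` are hypothetical names rather than any Arm or Android API. Each allocation gets a tag, the pointer carries the same tag, and every access compares the two.

```python
import random

GRANULE = 16   # bytes covered by one tag
TAG_BITS = 4   # tag width, so 16 possible tag values


class TaggedMemory:
    """Toy model: one tag per granule, a pointer is a (tag, address) pair."""

    def __init__(self, granules, seed=0):
        self.tags = [0] * granules          # tag 0 = never allocated here
        self.rng = random.Random(seed)

    def _new_tag(self, exclude):
        """Pick a non-zero tag that differs from `exclude`."""
        choices = [t for t in range(1, 2 ** TAG_BITS) if t != exclude]
        return self.rng.choice(choices)

    def malloc(self, first, count):
        """Tag `count` granules starting at granule `first`."""
        tag = self._new_tag(self.tags[first])
        for g in range(first, first + count):
            self.tags[g] = tag
        return (tag, first * GRANULE)

    def free(self, ptr, count):
        """Retag the granules so stale pointers no longer match."""
        tag, addr = ptr
        first = addr // GRANULE
        new_tag = self._new_tag(tag)
        for g in range(first, first + count):
            self.tags[g] = new_tag

    def load(self, ptr, offset=0):
        """Check the pointer tag against the memory tag before the access."""
        tag, addr = ptr
        if self.tags[(addr + offset) // GRANULE] != tag:
            raise MemoryError("tag check fault")
        return True


def faults(memory, ptr, offset=0):
    """Return True when the access is rejected by the tag check."""
    try:
        memory.load(ptr, offset)
        return False
    except MemoryError:
        return True


mem = TaggedMemory(granules=4)
ptr = mem.malloc(first=0, count=2)      # a 32-byte allocation
print(faults(mem, ptr, 8))              # access inside the allocation
print(faults(mem, ptr, 32))             # access past the end of it
mem.free(ptr, count=2)
print(faults(mem, ptr, 8))              # access through the stale pointer
```

In this toy model the retagging on free always picks a different tag, so the stale access is always caught. With only 16 tag values, real deployments detect such reuse probabilistically unless extra measures are taken, which is part of what the research discussed here is about.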

Technical details: In December 2023, a research paper called ‘Sticky Tags: Efficient and Deterministic Spatial Memory Error Mitigation using Persistent Memory Tags’ was published by academics from VUSec Group, Vrije Universiteit Amsterdam. The paper demonstrates how speculative probing can potentially be used to determine Arm Memory Tagging Extension (MTE) allocation tags and explores alternative solutions to Arm MTE.

Official announcement: Please refer to the link for details – https://developer.arm.com/Arm%20Security%20Center/Arm%20Memory%20Tagging%20Extension

CVE-2023-5869 postgresql: Buffer overrun from integer overflow in array modification (20th Dec 2023)

Preface: PostgreSQL uses up to work_mem of memory for each sort or hash operation in a query. If an operation needs more than that, PostgreSQL spills to disk. temp_buffers controls the amount of memory each session may use for temporary tables.

Does Postgres write to disk? To guard against unforeseen failures, PostgreSQL writes a full page image to the write-ahead log (WAL) the first time a page is modified after a checkpoint, before the modified page itself is written to disk. By doing this, during crash recovery PostgreSQL can restore partially-written pages.

Background: Declaring an array in PostgreSQL is straightforward. An array data type is defined by appending square brackets [] to any valid data type. This could be an array of integers, text, boolean values, or even more complex data types like composite types or other arrays.
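Many databases support array fields of a scalar type, and SQL allows ARRAY column types. In PostgreSQL, INTEGER[5] declares an array of integers; the declared size of 5 serves as documentation only, because the current implementation does not enforce array size limits.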

Vulnerability details: A flaw was found in PostgreSQL that allows authenticated database users to execute arbitrary code through missing overflow checks during SQL array value modification. This issue exists due to an integer overflow during array modification where a remote user can trigger the overflow by providing specially crafted data. This enables the execution of arbitrary code on the target system, allowing users to write arbitrary bytes to memory and extensively read the server’s memory.
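How a missing overflow check turns into a buffer overrun can be sketched as follows. This is an illustration of the general mechanism, not the PostgreSQL source; `size_unchecked` and `size_checked` are hypothetical names, and the 32-bit wrap-around is simulated with `ctypes.c_int32`, which truncates without any overflow checking.

```python
import ctypes


def size_unchecked(nitems, item_size):
    """Compute nitems * item_size as a 32-bit signed value with no check.

    Simulates arithmetic that silently wraps around in a 32-bit integer.
    """
    return ctypes.c_int32(nitems * item_size).value


def size_checked(nitems, item_size, limit=2 ** 31 - 1):
    """Same computation, but reject results that do not fit."""
    total = nitems * item_size
    if nitems < 0 or item_size < 0 or total > limit:
        raise OverflowError("array size exceeds the maximum allowed size")
    return total


nitems = 2 ** 30 + 1      # attacker-chosen element count
item_size = 4             # bytes per element

wrapped = size_unchecked(nitems, item_size)
print(wrapped)            # the size that would be used for the allocation

try:
    size_checked(nitems, item_size)
    rejected = False
except OverflowError:
    rejected = True
print(rejected)
```

If a buffer is allocated with the wrapped size while later code still writes all `nitems` elements, the writes land far beyond the end of the buffer, which is the kind of memory corruption the advisory describes.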

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2023-5869

CVE-2023-28546: Buffer Copy Without Checking Size of Input in SPS Applications (19th Dec 2023)

Preface: What is the significance of the SPS keyword? Qualcomm did not say. Let’s see whether we can trace the weak points of the design.

Background: The Qualcomm Secure Processing Unit is an isolated hardware security core implemented in the Snapdragon 8cx Gen 3 Mobile Compute Platform SoC. As such, this security core incorporates standalone ROM, RAM, CPU, cryptographic acceleration units, countermeasure sensors, one-time programmable memory, etc. It supports key generation, signing and verification using RSA and ECC cryptosystems across a range of modes.

Ref: SPS can be a term related to encryption capabilities. It can be applied to UDSF. For example: Samsung SDS UDSF is a 3GPP standard based network function for 5G core network mainly to store call processing and session related unstructured information of network functions such as AMF, SMF, etc.

SPS encryption functions: Methods in this class help an admin encrypt files output from SPS. For now they are only used to encrypt and decrypt snapshots. This class requires the SPS database. It inherits all functions from the spsDb class, so there is no need to initiate the spsDb container. This class is required to run an SPS app and needs to be initialized at the global level.

Vulnerability details: Memory Corruption in SPS Application while exporting public key in sorter TA.
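Qualcomm has not published the affected code, so the following is only a sketch of the weakness class named in the title, a buffer copy without checking the size of the input. The function `export_public_key` and the constant `PUBKEY_BUF_SIZE` are hypothetical and do not come from any Qualcomm interface. The point is the length check before the copy.

```python
PUBKEY_BUF_SIZE = 64   # hypothetical size of the caller's output buffer


def export_public_key(key_bytes, out_buf):
    """Copy a key into a fixed-size buffer, refusing oversized input.

    Returns the number of bytes written.
    """
    if len(key_bytes) > len(out_buf):
        # In C, skipping this check lets memcpy write past the buffer.
        raise ValueError("key does not fit in the output buffer")
    out_buf[:len(key_bytes)] = key_bytes
    return len(key_bytes)


out = bytearray(PUBKEY_BUF_SIZE)
written = export_public_key(b"\x01" * 32, out)   # fits: 32 of 64 bytes used

try:
    export_public_key(b"\x02" * 65, out)          # one byte too long
    refused = False
except ValueError:
    refused = True
print(written, refused)
```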

Official announcement: Please refer to the link for details –

https://nvd.nist.gov/vuln/detail/CVE-2023-28546

https://docs.qualcomm.com/product/publicresources/securitybulletin/december-2023-bulletin.html

Don’t underestimate the impact of today’s open-source software development! (18th Dec 2023)

Preface: Ten years ago, if you told people that your product was developed with open-source software, cyber security experts would most likely have questioned your decision. But the trend in open-source usage seems to have changed. In fact, many open-source projects are now allied with enterprise computing vendors, so patches are delivered quickly when a vulnerability is found. No software in the world can avoid vulnerabilities entirely. Furthermore, open source is less constrained by business decisions, so it acts like a booster that drives technology forward faster.

Background: In essence, a neural network accepts inputs, does some processing and produces outputs. This input-process-output mechanism is called neural network feed-forward. Understanding the feed-forward mechanism is required to create a neural network that solves difficult practical problems such as facial recognition or voice identification.

PyTorch provides elegantly designed modules and classes, including torch[.]nn, to help you create and train neural networks. An nn[.]Module contains layers, and a method forward(input) that returns the output.
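The feed-forward mechanism itself is small enough to write out by hand. The sketch below uses plain Python rather than PyTorch so that it runs without any installation; the layer sizes and the weight values are arbitrary assumptions chosen for illustration, and `dense` and `forward` are names made up here that only mirror the role of an nn[.]Module's layers and its forward(input) method.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Arbitrary example parameters: 3 inputs -> 2 hidden units -> 1 output.
W1 = [[0.5, -0.2, 0.1],
      [0.3, 0.8, -0.5]]
B1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
B2 = [0.2]


def forward(inputs):
    """Input -> hidden layer (tanh) -> output layer (sigmoid)."""
    hidden = dense(inputs, W1, B1, math.tanh)
    return dense(hidden, W2, B2, sigmoid)


output = forward([1.0, 2.0, 3.0])
print(output)
```

Training is the process of adjusting the numbers in W1, B1, W2 and B2. Frameworks such as PyTorch automate that part and run the same arithmetic on a GPU.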

Today’s market trends: According to a news article published in November 2019, Tesla trains around 48 networks for Autopilot that make 1,000 different predictions, which takes 70,000 GPU hours. Moreover, this training is not a one-time affair but an iterative one, and all these workflows should be automated while making sure that these 1,000 different predictions don’t regress over time.

PyTorch, especially, has become the go-to framework for machine learning researchers. It is fast and efficient, allowing users to quickly iterate on experiments and build models. PyTorch supports NVIDIA CUDA (and AMD ROCm builds), making it easy to take advantage of powerful GPUs for faster training.

There is no doubt about the future development of artificial intelligence, and the demand for GPUs goes hand in hand with applications such as autonomous driving.

CVE-2023-4622: Should it be patched by the processor vendor or by SUSE? (14th Dec 2023)

Preface: Unix domain sockets and network sockets have different security characteristics. In general, Unix domain sockets are considered to be more secure than network sockets, as they are not exposed to the network and are only accessible to processes on the same machine.

Background: A Unix domain socket, also known as a UDS or IPC socket (inter-process communication socket), is a data communications endpoint for exchanging data between processes executing on the same host operating system. It is also referred to by its address family, AF_UNIX.
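A minimal sketch of AF_UNIX communication using Python's standard `socket` module is shown below. It assumes a Unix-like host and, to stay short, keeps both endpoints in one process; in practice each end would belong to a different process on the same machine.

```python
import socket

# socketpair() returns two already-connected AF_UNIX stream sockets.
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
family = parent.family

try:
    parent.sendall(b"ping")            # one side sends a request
    request = child.recv(4)            # the other side receives it
    child.sendall(request.upper())     # and answers over the same socket
    reply = parent.recv(4)
finally:
    parent.close()
    child.close()

print(family, reply)
```

No IP address or port is involved, which is why such a socket is not reachable from the network.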

DOCA Socket Relay allows Unix Domain Socket (AF_UNIX family) server applications to be offloaded to the DPU while communication between the two sides is proxied by DOCA Comm Channel.

Vulnerability details: A use-after-free vulnerability in the Linux kernel’s af_unix component can be exploited to achieve local privilege escalation. The unix_stream_sendpage() function tries to add data to the last skb in the peer’s recv queue without locking the queue. Thus there is a race where unix_stream_sendpage() could access an skb locklessly that is being released by garbage collection, resulting in use-after-free.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2023-4622

About CVE-2023-40078: The Opus A2DP implementation on the Android platform has a design flaw that may lead to paired device escalation of privilege (14th Dec 2023)

Preface: A2DP is a protocol supported on most Bluetooth audio devices. Opus is open source, and Opus over A2DP was introduced in Android 13.

Background: In Bluetooth, there is a possibility of code execution due to a use after free. This could lead to paired device escalation of privilege in the privileged Bluetooth process with no additional execution privileges needed. User interaction is not needed for exploitation. This design weakness was published on 30th Oct, 2023; the CVE reference is CVE-2023-21361.

One advantage of using C++ for Android app development is the ability to create cross-platform apps. By writing platform-agnostic code in C++, you can reuse it when developing iOS apps with tools such as Apple’s Xcode and Swift. This allows for efficient code sharing between Android and iOS platforms.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2023-40078

CVE-2023-42914 – An app may be able to break out of its sandbox (13th Dec 2023)

Preface: One step Apple has taken over the past few years is to harden the sandbox attack surface of the Safari WebContent (or “renderer”) process on iOS, most recently by removing WebContent’s direct access to GPU-related functionality, which is now brokered by a separate GPU process.

Background: App Sandbox provides protection to system resources and user data by limiting your app’s access to resources requested through entitlements.

Essentials – App Sandbox Entitlement

A Boolean value that indicates whether the app may use access control technology to contain damage to the system and user data if an app is compromised.

Key: com[.]apple[.]security[.]app-sandbox

Vulnerability details: An app may be able to break out of its sandbox. The issue was addressed with improved memory handling.

Impact: iPhone 8 and later, iPad Pro (all models), iPad Air 3rd generation and later, iPad 5th generation and later, and iPad mini 5th generation and later.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2023-42914