Preface: The vulnerability title states that a buffer copy is performed in Core without checking the size of the input. But I believe it is even more important to prevent unauthorized copying in the first place.
Background: An OpenCL buffer is a 1D, 2D, or 3D array in global memory. It is an abstract object that can be addressed through a pointer. Buffers are read-only, write-only, or read-write. An image buffer represents GPU texture memory: an array of pixels that can be accessed via functions that take pixel x, y, z coordinates. There is no pointer access to image pixels on the GPU.
OpenCL supports buffer and image objects (and, from OpenCL 2.0, pipe objects). One-dimensional buffer objects are a natural choice for many developers because of their simplicity and flexibility: support for pointers, byte-addressable access, and so on. Images, by contrast, let the hardware handle out-of-bounds reads automatically.
Vulnerability details: The product performs operations on a memory buffer, but it can read from or write to a memory location that is outside of the intended boundary of the buffer.
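To make the class of bug concrete, below is a minimal, hypothetical OpenCL kernel sketch; the kernel name and parameters are illustrative, not taken from any advisory. If the host enqueues more work-items than the buffer has elements, the unchecked access lands outside the intended boundary.
[code]
// Hypothetical kernel, for illustration only.
__kernel void scale(__global float *data, const uint n, const float k)
{
    size_t gid = get_global_id(0);
    // BUG: no bounds check. If the global work size exceeds n,
    // this reads and writes outside the buffer (CWE-125/CWE-787).
    data[gid] = data[gid] * k;
    // Fixed version: guard every access against the element count:
    //   if (gid < n) data[gid] = data[gid] * k;
}
[/code]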
Preface: HPC compilers give the coder tools for better performance and capability; for example, they make it easier to write code that runs as a parallel job. Compilation is the process of converting C-language source code into executable program code, and running is the process of executing that code. Compilation only needs to be completed once; the resulting executable code can then be run multiple times.
Background: An XML parser for C++ determines whether an XML document is well-formed and optionally validates it against a DTD (Document Type Definition). A DTD defines the structure and the legal elements and attributes of an XML document. The parser constructs an object tree that can be accessed through a DOM interface, or operates serially through a SAX interface. SAX defines an abstract programmatic interface that models the XML information set (infoset) through a linear sequence of familiar method calls.
Validating an XML document determines whether the structure and content of the document conform to a set of rules. Xerces-C++ is a validating XML parser written in a portable subset of C++.
Remark: SAX-style parsing of Fast Infoset documents is also much faster than parsing of the equivalent XML 1.0 documents.
Vulnerability details: A use-after-free vulnerability was found in xerces-c in the way an XML document is processed via the SAX API. Applications that process XML documents with an external Document Type Definition (DTD) may be vulnerable to this flaw. A remote attacker could exploit this flaw by creating a specially crafted XML file that would crash the application or potentially lead to arbitrary code execution.
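For context, the affected usage pattern looks roughly like the following minimal Xerces-C++ SAX sketch (the file name and the do-nothing handler are placeholders). The flaw was triggered while such a parse processed a document referencing an external DTD.
[code]
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/parsers/SAXParser.hpp>
#include <xercesc/sax/HandlerBase.hpp>
using namespace xercesc;

int main()
{
    XMLPlatformUtils::Initialize();
    {
        SAXParser parser;            // serial, event-driven parsing
        HandlerBase handler;         // default (no-op) callbacks
        parser.setDocumentHandler(&handler);
        parser.setErrorHandler(&handler);
        // "input.xml" is a placeholder; a document with an external
        // DTD is what exercised the vulnerable code path.
        parser.parse("input.xml");
    }
    XMLPlatformUtils::Terminate();
    return 0;
}
[/code]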
Ref: To understand the twists and turns of this story, please refer to the pictures attached to this article.
Preface: The GNU C Library (glibc) project provides the core libraries for the GNU system and GNU/Linux systems. (It should not be confused with GLib, the GNOME platform library used by many hundreds of projects outside of GNOME.)
Background: R is a language and environment for statistical computing and graphics. Python is a general-purpose programming language for data analysis and scientific computing. Knowing programming languages like R and Python is essential for implementing the whole machine-learning process, and both provide built-in libraries that make it very easy to implement machine-learning algorithms.
Where does glibc fit in Python? On Linux, the CPython interpreter itself is linked against glibc, which provides a complete implementation of the ISO C standard library, including functions for file I/O, string manipulation, memory allocation, and more. This makes it easy to write portable and efficient code that can run on different systems.
Vulnerability details: An integer overflow was found in the __vsyslog_internal function of the glibc library. This function is called by the syslog and vsyslog functions. This issue occurs when these functions are called with a very long message, leading to an incorrect calculation of the buffer size to store the message, resulting in undefined behavior. This issue affects glibc 2.37 and newer.
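As a hedged illustration of the trigger pattern (not a working exploit; the real trigger needs far larger input than shown here):
[code]
#include <syslog.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    // Illustrative only: in the reported issue, an oversized
    // ident/message string drives the internal buffer-size
    // calculation in __vsyslog_internal past INT_MAX, producing
    // an undersized allocation (integer overflow).
    size_t len = 1024;                  // placeholder length
    char *msg = malloc(len + 1);
    if (msg == NULL) return 1;
    memset(msg, 'A', len);
    msg[len] = '\0';

    openlog(msg, LOG_NDELAY, LOG_USER); // very long ident string
    syslog(LOG_INFO, "%s", msg);        // formatted into the buffer
    closelog();
    free(msg);
    return 0;
}
[/code]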
This announcement was originally published on January 22nd, 2024.
Preface: Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms.
Recent advances in artificial intelligence systems, such as voice or facial recognition programs, have benefited from neural networks, densely interconnected meshes of simple information processors that learn to perform tasks by analyzing large amounts of training data.
Background: The Apple Neural Engine (or ANE) is a type of NPU, which stands for Neural Processing Unit. It’s like a GPU, but instead of accelerating graphics an NPU accelerates neural network operations such as convolutions and matrix multiplies.
Beyond image generation from text prompts, developers are also discovering other creative uses for Stable Diffusion, such as image editing, in-painting, out-painting, super-resolution, style transfer and even color palette generation. Getting to a compelling result with Stable Diffusion can require a lot of time and iteration, so a core challenge with on-device deployment of the model is making sure it can generate results fast enough on device. That is where the Apple Neural Engine comes in.
Vulnerability details: Apple's security advisory shows that the vulnerability belongs to the Apple Neural Engine.
Impact: An app may be able to execute arbitrary code with kernel privileges
Description: The issue was addressed with improved memory handling.
Preface: In ancient South America, tribal leaders would cover their bodies with gold powder and wash themselves in a holy lake in the mountains. Lake Titicaca, for example, is a famous place where ancient civilizations performed this ceremony. Priests and nobles would throw precious gold and emeralds into the lake as dedications to the gods.
El Dorado, the so-called Golden Kingdom, is an ancient legend that first began with a South American ritual. Spanish conquistadors, upon hearing these tales from the natives, believed there was a place abundant in gold and precious stones and began referring to it as El Dorado. Many explorers believe that Ciudad Blanca is the legendary El Dorado. Legend has it that somewhere beneath the forest canopy lies the ancient city of Ciudad Blanca, and now archaeologists think they may have found it.
A group of scientists from fields including archaeology, anthropology, and geology used a technology known as airborne light detection and ranging (LiDAR). They found what appears to be a network of plazas and pyramids, hidden for hundreds of years beneath the forest canopy.
Background: What is LiDAR? LiDAR (light detection and ranging) is a remote sensing method that uses a laser to measure distances. Pulses of light are emitted from a laser scanner, and when the pulse hits a target, a portion of its photons are reflected back to the scanner. Because the location of the scanner, the directionality of the pulse, and the time between pulse emission and return are known, the 3D location (XYZ coordinates) from which the pulse reflected is calculable.
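A short worked sketch of that calculation (the symbols are generic, not tied to any particular scanner's SDK): range is half the round-trip time multiplied by the speed of light, and the XYZ point is the scanner position plus the unit direction vector scaled by that range.
[code]
#include <math.h>

// t        = round-trip pulse time (seconds)
// az, el   = azimuth and elevation of the emitted pulse (radians)
// x0,y0,z0 = known scanner position (metres)
void lidar_return_point(double t, double az, double el,
                        double x0, double y0, double z0,
                        double *x, double *y, double *z)
{
    const double c = 299792458.0;        // speed of light (m/s)
    double range = c * t / 2.0;          // one-way distance
    *x = x0 + range * cos(el) * cos(az); // direction vector * range
    *y = y0 + range * cos(el) * sin(az);
    *z = z0 + range * sin(el);
}
[/code]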
Which software is used for LiDAR data processing?
While LiDAR is a technology for making point clouds, not all point clouds are created using LiDAR. For example, point clouds can be made from images obtained from digital cameras, a technique known as photogrammetry. The key difference that distinguishes photogrammetry from LiDAR is RGB: unlike an RGB image, a LiDAR projection image has no obvious texture, and it is difficult to find patterns in it.
The programs to process LiDAR are numerous and increasing rapidly in accordance with the evolving field and user needs. ArcGIS has LiDAR processing functionality. ArcGIS accepts LAS or ASCII file types and has both 2D and 3D visualization options. Additionally, there are other options on the market. For example: NVIDIA DeepStream Software Development Kit (SDK). This SDK is an accelerated AI framework to build pipelines. DeepStream pipelines enable real-time analytics on video, image, and sensor data.
The architecture diagram on the right is for reference.
Preface: Artificial intelligence performs better when humans are involved in data collection, annotation, and validation. But why is artificial intelligence ubiquitous in the human world? Can we limit the use of AI?
Background: The NVIDIA DGX A100 system comes with a baseboard management controller (BMC) for monitoring and controlling various hardware devices on the system. It monitors system sensors and other parameters. Kernel-based Virtual Machine (KVM) is an open source virtualization technology built into Linux. KVM lets you turn Linux into a hypervisor that allows a host machine to run multiple, isolated virtual environments called guests or virtual machines (VMs).
What is Virtio-net device? Virtio-net device emulation enables users to create VirtIO-net emulated PCIe devices in the system where the NVIDIA® BlueField® DPU is connected.
Vulnerability details:
CVE-2023-31029 – NVIDIA DGX A100 baseboard management controller (BMC) contains a vulnerability in the host KVM daemon, where an unauthenticated attacker may cause a stack overflow by sending a specially crafted network packet. A successful exploit of this vulnerability may lead to arbitrary code execution, denial of service, information disclosure, and data tampering.
CVE-2023-31030 – NVIDIA DGX A100 BMC contains a vulnerability in the host KVM daemon, where an unauthenticated attacker may cause a stack overflow by sending a specially crafted network packet. A successful exploit of this vulnerability may lead to arbitrary code execution, denial of service, information disclosure, and data tampering.
My comment: The vendor published this vulnerability but did not provide full details. Do you think the details in the attached diagram reflect the actual root cause?
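Since the vendor withheld the technical details, the following is only a generic sketch of how a stack overflow from a crafted network packet typically arises in a daemon. Every name in it is hypothetical, and none of it is taken from the BMC firmware.
[code]
#include <string.h>
#include <stdint.h>
#include <stddef.h>

// Hypothetical packet layout and handler, for illustration only.
struct pkt { uint16_t claimed_len; uint8_t payload[]; };

void handle_packet(const struct pkt *p, size_t wire_len)
{
    char buf[128];                      // fixed-size stack buffer
    // BUG: trusts the attacker-controlled length field rather than
    // the bytes actually received -> stack buffer overflow.
    memcpy(buf, p->payload, p->claimed_len);

    // Safer: clamp to the buffer size and to the bytes on the wire:
    //   size_t n = p->claimed_len;
    //   if (n > sizeof(buf)) n = sizeof(buf);
    //   if (n > wire_len - sizeof(*p)) n = wire_len - sizeof(*p);
    //   memcpy(buf, p->payload, n);
    (void)buf;
}
[/code]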
Preface: When the high-performance cluster (HPC) was born, it was destined to compete with traditional mainframe technology. The major component of HPC is the many-core processor, that is, the GPU. For example, the NVIDIA GA100 GPU is composed of multiple GPU Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), and HBM2 memory controllers. For comparison with the best of the best, the world’s fastest public supercomputer, Frontier, has 37,000 AMD Instinct MI250X GPUs.
How did computing break through traditional technology and go from serial to parallel processing? CPUs are fast, but they work by quickly executing a series of tasks in order, which requires a lot of interactivity; this is known as serial processing. GPU parallel computing, by contrast, is a type of computation in which many calculations or processes are carried out simultaneously. Over time came the revolution in GPU processor technology and high-performance clusters: Red Hat, for example, created a high-performance cluster system configuration whose overall performance approaches that of a supercomputer built on crossbar switches. But the bottleneck lies in how to transform traditional software applications from serial processing to parallel processing.
Reflection of reality in the technological world: It is common consensus that GPU manufacturer Nvidia holds a strong market share worldwide. The Nvidia A100 processor delivers strong performance on intensive AI and deep-learning tasks and is the more budget-friendly option, while the H100’s optimizations, such as TensorRT-LLM support and NVLink, let it surpass the A100, especially in the LLM area. Large Language Models (LLMs) have revolutionised the field of natural language processing. As these models grow in size and complexity, the computational demands for inference also increase significantly. To tackle this challenge, leveraging multiple GPUs becomes essential.
Supply constraints and product attribute design create headaches for web-hosting providers: CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). But converting serial C code to data-parallel code is a difficult problem. Because of this limitation, Nvidia developed the NVIDIA CUDA Compiler (NVCC), a proprietary compiler intended for use with CUDA.
Using the CUDA Toolkit you can accelerate your C or C++ applications by updating the computationally intensive portions of your code to run on GPUs. To accelerate your applications, you can call functions from drop-in libraries as well as develop custom applications using languages including C, C++, Fortran and Python.
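A minimal sketch of what updating the computationally intensive portion looks like in practice: the serial C loop in the comment becomes a CUDA kernel in which each GPU thread handles one element (array names are illustrative).
[code]
#include <cuda_runtime.h>

// Serial version: one CPU thread walks the whole array:
//   for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];

// Parallel version: each GPU thread computes one element.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard: the grid may be larger than n
        y[i] = a * x[i] + y[i];
}

// Host-side launch (d_x and d_y are device pointers):
//   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
[/code]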
But you cannot use CUDA without an Nvidia graphics card. CUDA is a framework developed by Nvidia that allows people with an Nvidia graphics card to use GPU acceleration for deep learning, and not having one defeats that purpose. (Refer to attached Diagram Part 1.)
If a web-hosting service provider does not use NVIDIA products, is it possible to use another brand of GPU processor for AI machine learning? Yes, you can select OpenCilk.
OpenCilk (http://opencilk.org) is a new open-source platform to support task-parallel programming in C/C++. (Refer to attached Diagram Part 2)
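For a taste of OpenCilk's task-parallel model, here is the canonical Fibonacci sketch, compiled with OpenCilk's clang using -fopencilk. Note that it parallelizes across CPU cores rather than on a GPU.
[code]
#include <cilk/cilk.h>
#include <stdint.h>

int64_t fib(int64_t n)
{
    if (n < 2) return n;
    int64_t x = cilk_spawn fib(n - 1); // may run in parallel
    int64_t y = fib(n - 2);            // continues in this worker
    cilk_sync;                         // wait for the spawned child
    return x + y;
}
[/code]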
Referring to the above details, the atmosphere of technological development makes people foresee that two camps will operate in the future: the Nvidia camp and the non-Nvidia camp. This is why, as I have observed, web-hosting service providers are giving themselves headaches over this new trend in the technology game.
Preface: The competitors of LLVM include GCC, Microsoft Visual C++, and the Intel C++ Compiler. NVIDIA’s CUDA Compiler (NVCC) is based on the widely used LLVM open-source compiler infrastructure. Furthermore, Tesla engineers wrote their own LLVM-backed JIT neural compiler for Dojo.
Background: Rather than relying on a few powerful cores, GPUs rely on their numerous cores to pull data from memory, perform parallel calculations on it, and push the results back out for use. If you write code and compile it with a regular compiler that does not target GPU execution, the code will always execute on the CPU. The GPU driver and compiler interact to ensure that the program executes correctly on the GPU. For example, you can compile CUDA code for one architecture even when your node hosts a GPU of a different architecture.
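For instance, nvcc can embed code for several GPU architectures in one binary, so a file built on a node with one GPU can still run on another (vector_add.cu is a placeholder file name):
[code]
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_80,code=sm_80 \
     vector_add.cu -o vector_add
[/code]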
A full build of LLVM and Clang will need around 15-20 GB of disk space. The exact space requirements will vary by system.
Developers can create or extend programming languages with support for GPU acceleration using the NVIDIA Compiler SDK.
Technical details: LLVM was conceived as a low-level register-based virtual machine. It is designed to abstract the underlying hardware and draw a clean line between a compiler back-end (machine-code generation) and front-end (parsing, etc.). Today, LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction-set architecture.
Ref: The LLVM Pass framework is an important component of the LLVM infrastructure; it performs code transformations and optimizations at the LLVM IR level.
LLVM IR is the language used by the LLVM compiler for program analysis and transformation. It’s an intermediate step between the source code and machine code, serving as a kind of lingua franca that allows different languages to utilize the same optimization and code generation stages of the LLVM compiler.
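To make "intermediate step" concrete, here is a trivial C function and, in simplified, attribute-stripped form, the LLVM IR that clang emits for it via clang -S -emit-llvm. Every front end lowers to this same form before the shared optimizer and back ends run.
[code]
// add.c
int add(int a, int b) { return a + b; }

; simplified LLVM IR produced by: clang -S -emit-llvm add.c
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add nsw i32 %a, %b
  ret i32 %sum
}
[/code]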
Looking Ahead: Facing the prospect of new cyber-security demands, perhaps new compilers will join this battle in the future.
Preface: NVIDIA Ada Lovelace architecture GPUs are designed to deliver performance for professional graphics, video, AI and computing. The GPU is based on the Ada Lovelace architecture, which is different from the Hopper architecture used in the H100 GPU.
As of October 2022, NVLink is being phased out in NVIDIA’s new Ada Lovelace architecture: neither the GeForce RTX 4090 nor the RTX 6000 Ada supports NVLink.
Background: The NVIDIA Grace Hopper Superchip pairs a power-efficient, high-bandwidth NVIDIA Grace CPU with a powerful NVIDIA H100 Hopper GPU using NVLink-C2C to maximize the capabilities for strong-scaling high-performance computing (HPC) and giant AI workloads.
NVLink-C2C is the enabler for Nvidia’s Grace-Hopper and Grace Superchip systems, with a 900 GB/s link between Grace and Hopper, or between two Grace chips.
Technical details: One of the major differences between many-core and multicore architectures is the presence of two different memory spaces: a host space and a device space. In the case of NVIDIA GPUs, the device is supplied with data from the host via one of the multiple memory-management API calls provided by the CUDA framework, such as cudaMallocManaged and cudaMemcpy. Modern systems, such as the Summit supercomputer, have the capability to avoid the use of CUDA calls for memory management and access the same data on GPU and CPU. This is done via the Address Translation Services (ATS) technology, which gives a unified virtual address space for data allocated with malloc and new if there is an NVLink connection between the two memory spaces.
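A minimal sketch contrasting the two memory-management styles mentioned above (sizes and names are illustrative):
[code]
#include <cuda_runtime.h>

int main()
{
    const int n = 1 << 20;

    // Style 1: explicit copies between the two memory spaces.
    float *h = new float[n];                  // host space
    float *d;
    cudaMalloc(&d, n * sizeof(float));        // device space
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // Style 2: managed allocation; the same pointer is valid on CPU
    // and GPU, with pages migrated on demand. On systems like Summit,
    // ATS extends this unified view to plain malloc/new over NVLink.
    float *m;
    cudaMallocManaged(&m, n * sizeof(float));

    cudaFree(d);
    cudaFree(m);
    delete[] h;
    return 0;
}
[/code]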
My comment: CUDA is a proprietary parallel computing platform and programming model developed by NVIDIA for general computing on GPUs. In normal circumstances, dynamic memory is allocated and released while the program is running, which can fragment the memory space. Over time, this fragmentation can leave insufficient contiguous memory blocks for new allocations, resulting in memory allocation failures or unexpected behaviour. So, it’s hard to say that design limitations won’t arise in the future!
Reference: In CUDA, kernel code is written using the [code]__global__[/code] qualifier and is called from the host code to be executed on the GPU. In summary, [code]cudaMalloc[/code] is used in host code to allocate memory on the GPU, while plain [code]malloc[/code] in host code allocates memory on the CPU; note that a [code]malloc[/code] call made inside kernel code allocates from the GPU's device heap, not from CPU memory.
Preface: Data science is an interdisciplinary field that combines statistical analysis, programming, and domain knowledge to extract valuable insights and make data-driven decisions.
Background: 2020 has been a year in which the Blosc project has received significant donations, totalling $55,000 to date. The most important tasks were carried out between January 2020 and August 2020, and most of them are related to the fastest-moving projects under development: C-Blosc2 and Caterva (including its cat4py wrapper).
C-Blosc2 is the new major version of C-blosc, and it provides backward compatibility to both the C-Blosc1 API and its in-memory format.
C-Blosc2 adds new data containers, called superchunks, that are essentially sets of compressed chunks in memory that can be accessed randomly and enlarged during their lifetime.
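As a sketch of the superchunk idea, assuming the blosc2_schunk_* entry points as documented for recent C-Blosc2 releases (check blosc2.h for the exact signatures):
[code]
#include <blosc2.h>

int main(void)
{
    blosc2_init();

    // A superchunk grows by appending compressed chunks...
    blosc2_storage storage = BLOSC2_STORAGE_DEFAULTS;  // in-memory
    blosc2_schunk *schunk = blosc2_schunk_new(&storage);

    int32_t data[1000] = {0};
    blosc2_schunk_append_buffer(schunk, data, sizeof(data));

    // ...and any chunk can later be decompressed at random.
    int32_t out[1000];
    blosc2_schunk_decompress_chunk(schunk, 0, out, sizeof(out));

    blosc2_schunk_free(schunk);
    blosc2_destroy();
    return 0;
}
[/code]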
Vulnerability details: C-blosc2 before 2.9.3 was discovered to contain a NULL pointer dereference via the function zfp_rate_decompress at zfp/blosc2-zfp[.]c.
My observation: On many platforms, dereferencing a null pointer results in abnormal program termination.
In this case, the chunkdata pointer is later used as the destination argument in a call to memcpy(), resulting in user-defined data overwriting memory starting at address 0. This is an example of how a null pointer dereference can become a potential code-execution exploit.
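In code, the pattern described above looks roughly like this (hypothetical names; the real flaw was reported in zfp_rate_decompress):
[code]
#include <string.h>

// Hypothetical sketch of the vulnerable pattern (CWE-476).
void decompress_into(char *chunkdata, const char *src, size_t n)
{
    // BUG: chunkdata may be NULL (e.g., a failed allocation whose
    // result was never checked). memcpy() then writes user-controlled
    // bytes starting at address 0: a crash on most platforms, or
    // worse where page 0 can be mapped.
    memcpy(chunkdata, src, n);
}

// Fix: fail fast instead of dereferencing NULL:
//   if (chunkdata == NULL) return;  // or report an error
[/code]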