Category Archives: AI and ML

CVE-2024-23212: Apple Neural Engine design has a weakness in memory handling. (25th January 2024)

This announcement was originally published on January 22nd 2024

Preface: Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms.

Recent advances in artificial intelligence systems, such as voice or facial recognition programs, have benefited from neural networks, densely interconnected meshes of simple information processors that learn to perform tasks by analyzing large amounts of training data.

Background: The Apple Neural Engine (or ANE) is a type of NPU, which stands for Neural Processing Unit. It’s like a GPU, but instead of accelerating graphics an NPU accelerates neural network operations such as convolutions and matrix multiplies.

Beyond image generation from text prompts, developers are also discovering other creative uses for Stable Diffusion, such as image editing, in-painting, out-painting, super-resolution, style transfer and even color palette generation. Getting to a compelling result with Stable Diffusion can require a lot of time and iteration, so a core challenge with on-device deployment of the model is making sure it can generate results fast enough on device. This is where the Apple Neural Engine comes in.

Vulnerability details: Apple's security advisory shows that the vulnerability resides in the Apple Neural Engine.

Impact: An app may be able to execute arbitrary code with kernel privileges

Description: The issue was addressed with improved memory handling.

Official announcement: Please refer to the link for details – https://support.apple.com/en-us/HT214059

LiDAR helps archaeologists discover ruins in the upper Amazon rainforest (15th Jan 2024)

Preface: In ancient South America, tribal leaders would cover their bodies with gold powder and wash themselves in a holy lake in the mountains. A famous place associated with such ceremonies is Lake Titicaca, where priests and nobles would throw precious gold and emeralds into the lake as offerings to the gods.

El Dorado, the so-called Golden Kingdom, is an ancient legend that first began with a South American ritual. Spanish conquistadors, upon hearing these tales from the natives, believed there was a place abundant in gold and precious stones and began referring to it as El Dorado. Many explorers believe that Ciudad Blanca is the legendary El Dorado. Legend has it that somewhere beneath the forest canopy lies the ancient city of Ciudad Blanca, and now archaeologists think they may have found it.

A group of scientists from fields including archaeology, anthropology and geology used a remote-sensing technology known as airborne light detection and ranging (LiDAR). They found what appears to be a network of plazas and pyramids, hidden for hundreds of years beneath the forest canopy.

Background: What is LiDAR? LiDAR (light detection and ranging) is a remote sensing method that uses a laser to measure distances. Pulses of light are emitted from a laser scanner, and when the pulse hits a target, a portion of its photons are reflected back to the scanner. Because the location of the scanner, the directionality of the pulse, and the time between pulse emission and return are known, the 3D location (XYZ coordinates) from which the pulse reflected is calculable.
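As a minimal sketch of that calculation (the scanner position, pulse direction and timing values below are hypothetical), the reflected point's XYZ coordinates follow directly from the measured range and the known pulse direction:

[code]
// Minimal sketch: recover the XYZ coordinates of a LiDAR return.
// All numeric values are hypothetical, chosen only for illustration.
#include <cmath>
#include <cstdio>

int main() {
    const double PI = 3.14159265358979323846;
    const double c  = 299792458.0;           // speed of light, m/s

    double t_round_trip = 6.67e-7;            // seconds (roughly a 100 m target)
    double range = c * t_round_trip / 2.0;    // one-way distance in metres

    double azimuth   = 30.0 * PI / 180.0;     // pulse direction (hypothetical)
    double elevation = -5.0 * PI / 180.0;
    double x0 = 0.0, y0 = 0.0, z0 = 500.0;    // scanner position (hypothetical)

    // Known origin + known direction + measured range => 3D point of reflection.
    double x = x0 + range * cos(elevation) * cos(azimuth);
    double y = y0 + range * cos(elevation) * sin(azimuth);
    double z = z0 + range * sin(elevation);

    printf("Reflected point: %.2f %.2f %.2f (range %.2f m)\n", x, y, z, range);
    return 0;
}
[/code]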

Which software is used for LiDAR data processing?

While LiDAR is a technology for making point clouds, not all point clouds are created using LiDAR. For example, point clouds can be made from images obtained from digital cameras, a technique known as photogrammetry. One difference that distinguishes photogrammetry from LiDAR is RGB: unlike an RGB image, a LiDAR projection image has no obvious texture, which makes it difficult to find patterns in the projected image.

The programs to process LiDAR are numerous and increasing rapidly in accordance with the evolving field and user needs. ArcGIS has LiDAR processing functionality. ArcGIS accepts LAS or ASCII file types and has both 2D and 3D visualization options. Additionally, there are other options on the market. For example: NVIDIA DeepStream Software Development Kit (SDK). This SDK is an accelerated AI framework to build pipelines. DeepStream pipelines enable real-time analytics on video, image, and sensor data.

The architecture diagram on the right is for reference.

Headline News: https://www.sciencenews.org/article/ancient-urban-complex-ecuador-amazon-laser

About NVIDIA Security Bulletin – CVE-2023-31029 and CVE-2023-31030 (14th Jan 2024)

Preface: Artificial intelligence performs better when humans are involved in data collection, annotation, and validation. But why is artificial intelligence ubiquitous in the human world? Can we limit the use of AI?

Background: The NVIDIA DGX A100 system comes with a baseboard management controller (BMC) for monitoring and controlling various hardware devices on the system. It monitors system sensors and other parameters. Kernel-based Virtual Machine (KVM) is an open source virtualization technology built into Linux. KVM lets you turn Linux into a hypervisor that allows a host machine to run multiple, isolated virtual environments called guests or virtual machines (VMs).

What is a Virtio-net device? Virtio-net device emulation enables users to create VirtIO-net emulated PCIe devices in the system where the NVIDIA® BlueField® DPU is connected.

Vulnerability details:

CVE-2023-31029 – NVIDIA DGX A100 baseboard management controller (BMC) contains a vulnerability in the host KVM daemon, where an unauthenticated attacker may cause a stack overflow by sending a specially crafted network packet. A successful exploit of this vulnerability may lead to arbitrary code execution, denial of service, information disclosure, and data tampering.

CVE-2023-31030 – NVIDIA DGX A100 BMC contains a vulnerability in the host KVM daemon, where an unauthenticated attacker may cause a stack overflow by sending a specially crafted network packet. A successful exploit of this vulnerability may lead to arbitrary code execution, denial of service, information disclosure, and data tampering.
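NVIDIA has not published the affected code, so the C sketch below only illustrates the bug class described in the bulletin, an attacker-controlled length copied into a fixed-size stack buffer without validation; the packet layout and function names are hypothetical, not NVIDIA's implementation:

[code]
/* Hypothetical illustration of the stack-overflow bug class - NOT NVIDIA code. */
#include <string.h>
#include <stdint.h>

#define MAX_NAME 64

/* Packet layout invented for illustration only. */
struct packet {
    uint16_t name_len;      /* attacker-controlled length field */
    uint8_t  name[1024];    /* attacker-controlled payload      */
};

void handle_packet_vulnerable(const struct packet *p) {
    char name[MAX_NAME];
    /* BUG: name_len is trusted, so a crafted packet overflows the stack buffer. */
    memcpy(name, p->name, p->name_len);
    (void)name;
}

void handle_packet_fixed(const struct packet *p) {
    char name[MAX_NAME];
    /* FIX: validate the attacker-controlled length before copying. */
    size_t len = p->name_len < MAX_NAME ? p->name_len : MAX_NAME - 1;
    memcpy(name, p->name, len);
    name[len] = '\0';
}
[/code]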

Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5510

My comment: The vendor published this vulnerability but did not provide full details. Do you think the details in the attached diagram describe the actual root cause?

Supply constraints and product attribute design: it is expected that two camps will operate in the future. (9th JAN 2024)

Preface: When high-performance computing (HPC) clusters were born, they were destined to compete with traditional mainframe technology. The major component of an HPC system is the many-core processor, that is, the GPU. For example, the NVIDIA GA100 GPU is composed of multiple GPU Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), and HBM2 memory controllers. For comparison, the world's fastest public supercomputer, Frontier, has about 37,000 AMD Instinct MI250X GPUs.

How to break through traditional computer technology and move from serial to parallel processing: CPUs are fast, but they work by quickly executing a series of tasks one after another, which requires a lot of interactivity; this is known as serial processing. GPU parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Over time, the revolution in GPU processor technology and high-performance clusters changed the picture: Red Hat, for example, created a high-performance-cluster system configuration whose overall performance approaches that of a supercomputer built around crossbar switches. But the bottleneck lies in how to transform traditional software applications from serial processing to parallel processing.
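To make the serial-versus-parallel distinction concrete, here is a minimal C++ sketch (illustrative only; CPU threads stand in for the many cores of a GPU) in which the same summation is executed one element at a time and then split across workers:

[code]
// Serial vs. parallel processing: a minimal C++ sketch (illustrative only).
#include <vector>
#include <thread>
#include <numeric>
#include <cstdio>

int main() {
    std::vector<double> data(1000000, 1.0);

    // Serial: one element after another, like a single CPU core stepping through a loop.
    double serial_sum = std::accumulate(data.begin(), data.end(), 0.0);

    // Parallel: split the work across threads that run simultaneously.
    const int workers = 4;
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> pool;
    size_t chunk = data.size() / workers;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            size_t begin = w * chunk;
            size_t end   = (w == workers - 1) ? data.size() : begin + chunk;
            partial[w] = std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
        });
    }
    for (auto &t : pool) t.join();
    double parallel_sum = std::accumulate(partial.begin(), partial.end(), 0.0);

    printf("serial=%.0f parallel=%.0f\n", serial_sum, parallel_sum);
    return 0;
}
[/code]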

Reflection of reality in the technological world: It is common consensus that GPU manufacturer Nvidia holds a strong market share worldwide. The Nvidia A100 processor delivers strong performance on intensive AI tasks and deep learning and is the more budget-friendly option of the two. The H100's optimizations, such as TensorRT-LLM support and NVLink, allow it to surpass the A100, especially in the LLM area. Large Language Models (LLMs) have revolutionised the field of natural language processing. As these models grow in size and complexity, the computational demands for inference also increase significantly. To tackle this challenge, leveraging multiple GPUs becomes essential.

Supply constraints and product attribute design create headaches for web hosting providers: CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). Converting serial C code to data-parallel code is a difficult problem, and to ease this limitation Nvidia developed the NVIDIA CUDA Compiler (NVCC), a proprietary compiler intended for use with CUDA.

Using the CUDA Toolkit you can accelerate your C or C++ applications by updating the computationally intensive portions of your code to run on GPUs. To accelerate your applications, you can call functions from drop-in libraries as well as develop custom applications using languages including C, C++, Fortran and Python.

But you cannot use CUDA without an Nvidia graphics card. CUDA is a framework developed by Nvidia that lets owners of Nvidia graphics cards use GPU acceleration for deep learning, and not having an Nvidia graphics card defeats that purpose. (Refer to attached Diagram Part 1.)
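Under those assumptions (an NVIDIA GPU plus the CUDA Toolkit installed), a minimal sketch of the serial-to-parallel conversion discussed above might look like this; the file name, sizes and launch parameters are arbitrary examples:

[code]
// Minimal CUDA sketch of converting a serial loop into a parallel kernel.
// Requires the CUDA Toolkit and an NVIDIA GPU; compile with: nvcc vector_add.cu
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Serial version: the CPU walks through the array one element at a time.
void add_serial(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// Parallel version: each GPU thread computes exactly one element.
__global__ void add_parallel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *ha = (float*)malloc(bytes), *hb = (float*)malloc(bytes), *hc = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    add_parallel<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %.1f\n", hc[0]);   // expect 3.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
[/code]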

If a web hosting service provider does not use NVIDIA products, is it possible to use another brand of processor for AI machine learning? Yes, one alternative is OpenCilk.

OpenCilk (http://opencilk.org) is a new open-source platform to support task-parallel programming in C/C++. (Refer to attached Diagram Part 2)
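As a minimal sketch of what OpenCilk task parallelism looks like (assuming the OpenCilk clang toolchain is installed; the loop and values are hypothetical, not taken from any real workload):

[code]
// Minimal OpenCilk sketch: task-parallel loop on the CPU, no NVIDIA GPU required.
// Build with the OpenCilk compiler, e.g.: clang -fopencilk -O2 saxpy_cilk.c
#include <cilk/cilk.h>
#include <stdio.h>

#define N 1000000

float x[N], y[N];

int main(void) {
    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // cilk_for lets the runtime schedule iterations across available CPU cores.
    cilk_for (int i = 0; i < N; ++i) {
        y[i] = 2.0f * x[i] + y[i];
    }

    printf("y[0] = %.1f\n", y[0]);   // expect 4.0
    return 0;
}
[/code]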

Referring to the above details, the technology landscape suggests that two camps will operate in the future: the Nvidia camp and the non-Nvidia camp. This is why, as I have observed, web hosting service providers are giving themselves headaches over this new trend in technology.

Will artificial intelligence technology development bring a battle for compiler hegemony? (29th Dec 2023)

Preface: LLVM's competitors include GCC, Microsoft Visual C++, and the Intel C++ Compiler. NVIDIA's CUDA Compiler (NVCC) is based on the widely used LLVM open-source compiler infrastructure. Furthermore, Tesla engineers wrote their own LLVM-backed JIT neural compiler for Dojo.

Background: Instead of relying on raw single-core computing power, GPUs rely on their numerous cores to pull data from memory, perform parallel calculations on it, and push the data back out for use. If you write something and compile it with a regular compiler that is not targeted for GPU execution, the code will always execute on the CPU. The GPU driver and compiler interact to ensure that the program executes correctly on the GPU. For example, you can compile CUDA code for one architecture even when your node hosts a GPU of a different architecture.
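To illustrate that cross-architecture point, a hypothetical kernel file can be compiled for GPUs that the build host does not contain; the file name and compute capabilities below are only examples:

[code]
// kernel.cu - trivial kernel used only to illustrate cross-architecture compilation.
__global__ void scale(float *v, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= s;
}

// The machine running nvcc does not need to contain the target GPU:
//   nvcc -arch=sm_70 -c kernel.cu                                  // build for a Volta-class GPU anywhere
//   nvcc -gencode arch=compute_80,code=sm_80 \
//        -gencode arch=compute_90,code=sm_90 -c kernel.cu          // fat binary for A100/H100-class GPUs
// A matching GPU (or JIT-compilable PTX) is only needed at run time.
[/code]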

A full build of LLVM and Clang will need around 15-20 GB of disk space. The exact space requirements will vary by system.

Developers can create or extend programming languages with support for GPU acceleration using the NVIDIA Compiler SDK.

Technical details: The LLVM is a low level register-based virtual machine. It is designed to abstract the underlying hardware and draw a clean line between a compiler back-end (machine code generation) and front-end (parsing, etc.). LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture.

Ref: LLVM Pass framework is an important component of LLVM infrastructure, and it performs code transformations and optimizations at LLVM IR level.

LLVM IR is the language used by the LLVM compiler for program analysis and transformation. It’s an intermediate step between the source code and machine code, serving as a kind of lingua franca that allows different languages to utilize the same optimization and code generation stages of the LLVM compiler.

Looking ahead: Facing the demands of cyber security, perhaps new compilers will join this battle in the future.

Processor technology perspective: Unified Memory with shared page tables (28th Dec 2023)

Preface: NVIDIA Ada Lovelace architecture GPUs are designed to deliver performance for professional graphics, video, AI and computing. The GPU is based on the Ada Lovelace architecture, which is different from the Hopper architecture used in the H100 GPU.

As of October 2022, NVLink is being phased out in NVIDIA’s new Ada Lovelace architecture. The GeForce RTX 4090 and the RTX 6000 Ada both do not support NVLink.

Background: The NVIDIA Grace Hopper Superchip pairs a power-efficient, high-bandwidth NVIDIA Grace CPU with a powerful NVIDIA H100 Hopper GPU using NVLink-C2C to maximize the capabilities for strong-scaling high-performance computing (HPC) and giant AI workloads.

NVLink-C2C is the enabler for Nvidia’s Grace-Hopper and Grace Superchip systems, with 900GB/s link between Grace and Hopper, or between two Grace chips.

Technical details: One of the major differences in many-core versus multicore architectures is the presence of two different memory spaces: a host space and a device space. In the case of NVIDIA GPUs, the device is supplied with data from the host via one of the multiple memory management API calls provided by the CUDA framework, such as cudaMallocManaged and cudaMemcpy. Modern systems, such as the Summit supercomputer, have the capability to avoid the use of CUDA calls for memory management and access the same data on GPU and CPU. This is done via the Address Translation Services (ATS) technology that gives a unified virtual address space for data allocated with malloc and new if there is an NVLink connection between the two memory spaces.
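As a minimal CUDA sketch of the unified-memory idea described above (using cudaMallocManaged; the kernel and sizes are arbitrary examples), a single allocation is visible to both the CPU and the GPU:

[code]
// Minimal sketch: unified (managed) memory removes the explicit host<->device copies.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 256;
    int *data = nullptr;

    // One allocation, visible to both CPU and GPU; the driver migrates pages as needed.
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;

    increment<<<1, n>>>(data, n);
    cudaDeviceSynchronize();              // wait before the CPU touches the data again

    printf("data[10] = %d\n", data[10]);  // expect 11
    cudaFree(data);
    return 0;
}
[/code]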

My comment: CUDA is a proprietary parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). In normal circumstances, dynamic memory is allocated and released while the program is running, which can fragment the memory space. Over time, this fragmentation can result in insufficient contiguous memory blocks for new allocations, leading to allocation failures or unexpected behaviour. So it is hard to say that design limitations will not arise in the future!

Reference: In CUDA, kernel code is written using the [code]__global__[/code] qualifier and is called from the host code to be executed on the GPU. In summary, [code]cudaMalloc[/code] is used in host code to allocate memory on the GPU, while [code]malloc[/code] called inside kernel code allocates memory from the GPU's device heap (host code, of course, still uses [code]malloc[/code] for CPU memory).
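A minimal sketch contrasting the two calls (the kernel and values are hypothetical):

[code]
// Host-side cudaMalloc vs. in-kernel malloc: a minimal CUDA sketch.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel_with_local_alloc(int *out) {
    // malloc() inside a __global__ kernel draws from the GPU device heap,
    // not from CPU memory; it must also be freed on the device.
    int *scratch = (int*)malloc(4 * sizeof(int));
    if (scratch != nullptr) {
        scratch[0] = 42;
        out[0] = scratch[0];
        free(scratch);
    }
}

int main() {
    int *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));        // host code allocating GPU global memory

    kernel_with_local_alloc<<<1, 1>>>(d_out);

    int h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("value written by kernel: %d\n", h_out);  // expect 42

    cudaFree(d_out);
    return 0;
}
[/code]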

CVE-2023-37188: The artificial intelligence world versus tiny software components. Do not dismiss a noncritical vulnerability! (27th December 2023)

Preface: Data science is an interdisciplinary field that combines statistical analysis, programming, and domain knowledge to extract valuable insights and make data-driven decisions.

Background: 2020 was a year in which the Blosc project received significant donations, totalling $55,000 to date. The most important tasks were carried out between January 2020 and August 2020, and most of them relate to the fastest-moving projects under development: C-Blosc2 and Caterva (including its cat4py wrapper).

C-Blosc2 is the new major version of C-blosc, and it provides backward compatibility to both the C-Blosc1 API and its in-memory format.

C-Blosc2 adds new data containers, called superchunks, that are essentially a set of compressed chunks in memory that can be accessed randomly and enlarged during its lifetime.

Vulnerability details: C-blosc2 before 2.9.3 was discovered to contain a NULL pointer dereference via the function zfp_rate_decompress at zfp/blosc2-zfp[.]c.

My observation: On many platforms, dereferencing a null pointer results in abnormal program termination.

If the chunkdata pointer ends up NULL and is then used as the destination argument in a call to memcpy(), user-defined data overwrites memory starting at address 0. This is an example of how a code-execution exploit can result from a null pointer dereference.
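C-Blosc2 is open source, but the C sketch below only illustrates the general pattern described above, an allocation whose NULL result is not checked before memcpy(); the function and variable names are hypothetical, not the actual zfp_rate_decompress code:

[code]
/* Hypothetical illustration of the NULL-pointer-dereference pattern - not C-Blosc2 code. */
#include <stdlib.h>
#include <string.h>

int decompress_chunk_vulnerable(const unsigned char *src, size_t n) {
    unsigned char *chunkdata = malloc(n);
    /* BUG: if the allocation fails, chunkdata is NULL and memcpy() writes
       user-controlled data starting at address 0. */
    memcpy(chunkdata, src, n);
    free(chunkdata);
    return 0;
}

int decompress_chunk_fixed(const unsigned char *src, size_t n) {
    unsigned char *chunkdata = malloc(n);
    if (chunkdata == NULL) {
        return -1;               /* FIX: fail safely instead of dereferencing NULL */
    }
    memcpy(chunkdata, src, n);
    free(chunkdata);
    return 0;
}
[/code]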

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2023-37188

Don’t underestimate the impact of today’s open-source software development! (18th Dec 2023)

Preface: Ten years ago, if you told people that your product's software development relied on open-source components, most cyber-security experts would have questioned your decision. But the trend in open-source software usage seems to have changed. The truth is that many open-source projects have formed alliances with enterprise computer vendors, so patches are delivered quickly when a vulnerability is found. As a matter of fact, no software in the world can avoid vulnerabilities. Furthermore, since open source is less constrained by business decisions, it acts like a technology booster that drives technology forward faster.

Background: In essence, a neural network accepts inputs, does some processing and produces outputs. This input-process-output mechanism is called neural network feed-forward. Understanding the feed-forward mechanism is required in order to create a neural network that solves difficult practical problems such as facial recognition or voice identification.
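Before handing the job to a framework, a minimal C++ sketch of that input-process-output (feed-forward) idea, with hypothetical weights and biases, looks like this:

[code]
// Minimal feed-forward sketch: 2 inputs -> 2 hidden neurons -> 1 output.
// Weights and biases are hypothetical; real networks learn them from data.
#include <cmath>
#include <cstdio>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

int main() {
    double input[2]       = {0.5, 0.8};
    double w_hidden[2][2] = {{0.1, 0.4}, {-0.3, 0.2}};
    double b_hidden[2]    = {0.05, -0.05};
    double w_out[2]       = {0.7, -0.6};
    double b_out          = 0.1;

    // Hidden layer: weighted sum of the inputs, then a non-linear activation.
    double hidden[2];
    for (int j = 0; j < 2; ++j) {
        double z = b_hidden[j];
        for (int i = 0; i < 2; ++i) z += w_hidden[j][i] * input[i];
        hidden[j] = sigmoid(z);
    }

    // Output layer: same pattern, producing the network's prediction.
    double z = b_out;
    for (int j = 0; j < 2; ++j) z += w_out[j] * hidden[j];
    double output = sigmoid(z);

    printf("feed-forward output = %.4f\n", output);
    return 0;
}
[/code]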

PyTorch provides elegantly designed modules and classes, including torch.nn, to help you create and train neural networks. An nn.Module contains layers and a method forward(input) that returns the output.

Today's market trends: According to a news article published in November 2019, for Autopilot Tesla trains around 48 networks that make 1,000 different predictions, and the training takes 70,000 GPU hours. Moreover, this training is not a one-time affair but an iterative one, and all these workflows should be automated while making sure that these 1,000 different predictions don't regress over time.

PyTorch in particular has become the go-to framework for machine learning researchers. It is fast and efficient, allowing users to quickly iterate on experiments and build models. PyTorch supports GPU back ends such as CUDA, making it easy to take advantage of powerful GPUs for faster training.

There is no doubt about the future development of artificial intelligence, so the demand for GPUs goes hand in hand with autonomous driving.

For the AI world of the future, NVIDIA has developed a Secure Deployment Considerations Guide for the Triton Inference Server (6th Dec 2023)

Preface: Artificial intelligence (AI) is growing like lightning. As IT users, we may enjoy the benefits of smartphone app features empowered by AI while knowing little, and caring less, about the AI back-end operations and architecture. For example, when you buy a steamed bun at the store, you certainly don't worry about whether there are cockroaches in the kitchen, because you know there are public health regulations in place to prevent that. The same concept applies to the AI world. So, NVIDIA has developed a Secure Deployment Considerations Guide for the Triton Inference Server. I hope this short article piques your interest.

Background: AI Inference is achieved through an “inference engine” that applies logical rules to the knowledge base to evaluate and analyze new information. In the process of machine learning, there are two phases. First, is the training phase where intelligence is developed by recording, storing, and labeling information. Second, is the inference phase where the machine uses the intelligence gathered and stored in phase one to understand new data.

General-purpose web servers lack support for AI inference features:

*There is no out-of-the-box support to take advantage of accelerators like GPUs, or to turn on dynamic batching or multi-node inference.

*Users need to build logic to meet the demands of specific use cases, like audio/video streaming input, stateful processing, or preprocessing the input data to fit the model.

*Metrics on compute and memory utilization or inference latency are not easily accessible to monitor application performance and scale.

Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs. Triton supports an HTTP/REST and GRPC protocol that allows remote clients to request inferencing for any model being managed by the server.

Secure Deployment Considerations: Artificial intelligence (AI) and machine learning (ML) cannot stand on their own without the support of programming languages. Developers can deploy Triton as an HTTP server, a gRPC server, a server supporting both, or embed a Triton server into their own application. Python is one of the major languages for AI and ML. PyTriton is a simple interface that enables Python developers to use Triton Inference Server to serve AI models, simple processing functions, or entire inference pipelines within Python code.

For Secure Deployment Considerations – Please refer to the link for details – https://github.com/triton-inference-server/pytriton

Understanding machine learning (activation functions) in a casual way. (30th Nov 2023)

Preface: Maybe it's a long story, but in a nutshell, this is page one. When you start studying on your first day, even an overview of AI technology covers advanced mathematics, graphics and technical terminology, and that can dampen your interest. The world of mathematics is complex. If a child is not naturally gifted at mathematical calculation, does that mean he is unsuited to working in artificial intelligence technology? Not necessarily. For example, computer assembly language is difficult and complex to remember, so the solution was to develop other programming languages and then convert (compile) them into machine language, a successful outcome in today's technological world. Therefore, many people believe that artificial intelligence technology should help humans in other ways rather than replace human work.

Background: The machine learning process requires CPUs and GPUs. GPUs are used to train large deep learning models, while CPUs are good for data preparation, feature extraction, and small-scale models. For inference and hyperparameter tweaking, CPUs and GPUs may both be utilized.

CPU and GPU memory coherence requires data transfer, and requires defining what areas of memory are shared and with which GPUs.

Long story short: Cognition refers to the process of acquiring knowledge and understanding through thinking, experience and senses. In machine learning some neural networks will use custom non-linear activation functions or a non-standard image filter.

The technology behind facial recognition is based on deep learning, a subset of machine learning that involves training artificial neural networks to recognize patterns in data.

Ref: Non-linear activation functions. Non-linear functions are the most widely used activation functions; they make it easy for a neural network model to adapt to a variety of data. Adaptive neural networks have the ability to overcome some significant challenges faced by artificial neural networks: adaptability reduces the time required to train neural networks and also makes a neural model scalable, as it can adapt to its structure and input data at any point in time while training.
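As a short C++ sketch (the sample values are arbitrary), two common non-linear activation functions, ReLU and the sigmoid, map the same pre-activation values quite differently, which is what lets stacked layers learn non-linear patterns such as faces or voices:

[code]
// Common non-linear activation functions, shown on a few sample values.
#include <algorithm>
#include <cmath>
#include <cstdio>

double relu(double z)    { return std::max(0.0, z); }
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

int main() {
    const double samples[] = {-2.0, -0.5, 0.0, 0.5, 2.0};
    printf("%8s %8s %8s\n", "z", "ReLU", "sigmoid");
    for (double z : samples) {
        printf("%8.2f %8.2f %8.4f\n", z, relu(z), sigmoid(z));
    }
    // Without a non-linearity, stacked layers collapse into a single linear map,
    // so the network could not learn non-linear patterns.
    return 0;
}
[/code]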