Category Archives: AI and ML

Will the development of artificial intelligence technology bring a battle for compiler hegemony? (29th Dec 2023)

Preface: Competitors of LLVM include GCC, Microsoft Visual C++, and the Intel C++ Compiler. NVIDIA’s CUDA Compiler (NVCC) is based on the widely used LLVM open source compiler infrastructure. Furthermore, Tesla engineers wrote their own LLVM-backed JIT neural compiler for Dojo.

Background: Rather than relying on the speed of a few powerful cores, GPUs rely on their numerous cores to pull data from memory, perform parallel calculations on it, and push the data back out for use. If you write code and compile it with a regular compiler that does not target GPU execution, the code will always execute on the CPU. The GPU driver and compiler interact to ensure that the program executes correctly on the GPU. For example, you can compile CUDA code for one architecture even when your node hosts a GPU of a different architecture.

A full build of LLVM and Clang will need around 15-20 GB of disk space. The exact space requirements will vary by system.

Developers can create or extend programming languages with support for GPU acceleration using the NVIDIA Compiler SDK.

Technical details: LLVM is a low-level, register-based virtual machine. It is designed to abstract the underlying hardware and draw a clean line between a compiler back-end (machine code generation) and front-end (parsing, etc.). LLVM is a set of compiler and toolchain technologies that can be used to develop a front-end for any programming language and a back-end for any instruction set architecture.

Ref: The LLVM pass framework is an important component of the LLVM infrastructure; it performs code transformations and optimizations at the LLVM IR level.

LLVM IR is the language used by the LLVM compiler for program analysis and transformation. It’s an intermediate step between the source code and machine code, serving as a kind of lingua franca that allows different languages to utilize the same optimization and code generation stages of the LLVM compiler.
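To make the idea of a "pass over an intermediate representation" more concrete, here is a minimal sketch in plain Python. It is only an illustration: the tuple-based instruction format, the `constant_fold` function and the `OPS` table are all made up for this example and have nothing to do with the real LLVM IR syntax or the LLVM pass API.

```python
import operator

# Toy three-address IR: (dest, op, lhs, rhs).
# Operands are ints (constants) or strings (names of earlier results).
# This is a hypothetical format, NOT real LLVM IR.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def constant_fold(instructions):
    """One toy 'pass': evaluate operations whose operands are all constants."""
    known = {}      # name -> constant value discovered so far
    remaining = []  # instructions that still have to run at execution time
    for dest, op, lhs, rhs in instructions:
        # Substitute names that an earlier instruction already folded.
        lhs = known.get(lhs, lhs)
        rhs = known.get(rhs, rhs)
        if isinstance(lhs, int) and isinstance(rhs, int):
            known[dest] = OPS[op](lhs, rhs)
        else:
            remaining.append((dest, op, lhs, rhs))
    return remaining, known


program = [
    ("t0", "add", 2, 3),       # both operands constant
    ("t1", "mul", "t0", 4),    # constant once t0 has been folded
    ("t2", "add", "x", "t1"),  # x is a runtime input, so this one stays
]
remaining, known = constant_fold(program)
print(remaining)
print(known)
```

The point of the sketch is that the pass only needs to understand the intermediate form, not the source language that produced it or the machine that will run it. That is the "lingua franca" role described above.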

Looking Ahead: Given the growing importance of cyber security, perhaps new compilers will join this battle in the future.

Processor technology perspective: Unified Memory with shared page tables (28th Dec 2023)

Preface: NVIDIA Ada Lovelace architecture GPUs are designed to deliver performance for professional graphics, video, AI and computing. The GPU is based on the Ada Lovelace architecture, which is different from the Hopper architecture used in the H100 GPU.

As of October 2022, NVLink is being phased out in NVIDIA’s new Ada Lovelace architecture. The GeForce RTX 4090 and the RTX 6000 Ada both do not support NVLink.

Background: The NVIDIA Grace Hopper Superchip pairs a power-efficient, high-bandwidth NVIDIA Grace CPU with a powerful NVIDIA H100 Hopper GPU using NVLink-C2C to maximize the capabilities for strong-scaling high-performance computing (HPC) and giant AI workloads.

NVLink-C2C is the enabler for Nvidia’s Grace-Hopper and Grace Superchip systems, with 900GB/s link between Grace and Hopper, or between two Grace chips.

Technical details: One of the major differences in many-core versus multicore architectures is the presence of two different memory spaces: a host space and a device space. In the case of NVIDIA GPUs, the device is supplied with data from the host via one of the multiple memory management API calls provided by the CUDA framework, such as cudaMallocManaged and cudaMemcpy. Modern systems, such as the Summit supercomputer, have the capability to avoid the use of CUDA calls for memory management and access the same data on GPU and CPU. This is done via the Address Translation Services (ATS) technology, which gives a unified virtual address space for data allocated with malloc and new if there is an NVLink connection between the two memory spaces.

My comment: CUDA is a proprietary parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). Under normal circumstances, dynamic memory is allocated and released while the program is running, which may cause memory fragmentation. Over time, this fragmentation can result in insufficient contiguous memory blocks for new allocations, resulting in memory allocation failures or unexpected behaviour. So, it’s hard to say that design limitations won’t arise in the future!

Reference: In CUDA, kernel code is written using the [code]__global__[/code] qualifier and is called from the host code to be executed on the GPU. In summary, [code]cudaMalloc[/code] is used in the host code to allocate memory on the GPU, while [code]malloc[/code] in host code allocates memory on the CPU; when [code]malloc[/code] is called inside kernel code, it allocates from the device heap on the GPU.

CVE-2023-37188: The artificial intelligence world versus tiny software components. Do not underestimate a non-critical vulnerability! (27th December 2023)

Preface: Data science is an interdisciplinary field that combines statistical analysis, programming, and domain knowledge to extract valuable insights and make data-driven decisions.

Background: 2020 has been a year in which the Blosc project has received significant donations, totalling $55,000 to date. The most important tasks were carried out between January 2020 and August 2020. Most of these tasks are related to the fastest-moving projects under development: C-Blosc2 and Caterva (including its cat4py wrapper).

C-Blosc2 is the new major version of C-blosc, and it provides backward compatibility to both the C-Blosc1 API and its in-memory format.

C-Blosc2 adds new data containers, called superchunks, that are essentially a set of compressed chunks in memory that can be accessed randomly and enlarged during its lifetime.

Vulnerability details: C-blosc2 before 2.9.3 was discovered to contain a NULL pointer dereference via the function zfp_rate_decompress at zfp/blosc2-zfp[.]c.

My observation: On many platforms, dereferencing a null pointer results in abnormal program termination.

Consider a typical case: the chunkdata pointer is later used as a destination argument in a call to memcpy(), resulting in user-defined data overwriting memory starting at address 0. This is an example of how a null pointer dereference can potentially lead to a code execution exploit.
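The defensive pattern here is simply "check the pointer before you write through it". Python has no raw pointers, so the sketch below uses `None` as a stand-in for NULL. The names `copy_chunk`, `working_allocator` and `failing_allocator` are hypothetical and are not taken from the C-Blosc2 source; this is only an illustration of the check, not the actual patch.

```python
def copy_chunk(compressed, allocate):
    """Copy a chunk into a freshly allocated buffer, checking the allocation first."""
    buffer = allocate(len(compressed))
    if buffer is None:
        # In C, skipping this check means memcpy() writes through a NULL pointer.
        raise MemoryError("allocation failed; refusing to write into a null buffer")
    buffer[:len(compressed)] = compressed
    return bytes(buffer)


def working_allocator(size):
    """Hypothetical allocator that always succeeds."""
    return bytearray(size)


def failing_allocator(size):
    """Hypothetical allocator that simulates an out-of-memory condition."""
    return None


print(copy_chunk(b"abc", working_allocator))
try:
    copy_chunk(b"abc", failing_allocator)
except MemoryError as exc:
    print("rejected:", exc)
```

With the check in place, a failed allocation becomes a clean, reportable error instead of a crash or a write to an unexpected address.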

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2023-37188

Don’t underestimate the impact of today’s open-source software development! (18th Dec 2023)

Preface: Ten years ago, if you told people that your software product was developed using open-source components, a cyber security expert would most likely have questioned your decision. But the trend in open-source usage seems to have changed. The truth is that many open-source projects are now allied with enterprise computer vendors, so patches are delivered quickly when a vulnerability is found. As a matter of fact, no software in the world can avoid vulnerabilities. Furthermore, open-source projects are less constrained by business decisions, so they act like a booster that drives technology forward faster.

Background: In essence, a neural network accepts inputs, does some processing and produces outputs. This input-process-output mechanism is called neural network feed-forward. Understanding the feed-forward mechanism is required in order to create a neural network that solves difficult practical problems such as facial recognition or voice identification.

PyTorch provides the elegantly designed modules and classes, including torch[.]nn, to help you create and train neural networks. An nn[.]Module contains layers, and a method forward(input) that returns the output.
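The input-process-output idea can be sketched without any framework at all. The example below uses only the Python standard library. The layer sizes, the weights (`W_HIDDEN`, `W_OUT`) and the helper name `dense` are arbitrary values chosen for illustration; a real PyTorch model would keep such weights inside an nn.Module and learn them during training.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each output neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Arbitrary illustrative weights: 2 inputs -> 3 hidden neurons -> 1 output.
W_HIDDEN = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
B_HIDDEN = [0.0, 0.1, -0.1]
W_OUT = [[0.3, -0.6, 0.9]]
B_OUT = [0.05]


def forward(inputs):
    """Feed-forward pass: inputs flow through the hidden layer to the output."""
    hidden = dense(inputs, W_HIDDEN, B_HIDDEN, math.tanh)
    return dense(hidden, W_OUT, B_OUT, sigmoid)


output = forward([1.0, 2.0])
print(output)
```

Because the last layer uses a sigmoid, the single output always lies between 0 and 1, which is convenient when the network has to answer a yes/no question.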

Today’s market trends: According to a news article published in Nov 2019, Tesla trains around 48 networks for Autopilot that make 1,000 different predictions, and this takes 70,000 GPU hours. Moreover, this training is not a one-time affair but an iterative one, and all these workflows should be automated while making sure that these 1,000 different predictions don’t regress over time.

PyTorch, especially, has become the go-to framework for machine learning researchers. It is fast and efficient, allowing users to quickly iterate on experiments and build models. PyTorch has official support for NVIDIA CUDA, and builds for AMD ROCm are also available, making it easy to take advantage of powerful GPUs for faster training.

There is no doubt about the future development of artificial intelligence, so the demand for GPUs goes hand in hand with autonomous driving.

For the future AI world, NVIDIA has developed a Secure Deployment Considerations Guide for Triton Inference Server (6th Dec 2023)

Preface: Artificial intelligence (AI) is growing at lightning speed. As IT users, we may enjoy the smartphone app features empowered by AI. As a matter of fact, we do not care about, or have no knowledge of, the back-end operations and architecture of AI. For example, when you buy a steamed bun at the store, you certainly don’t worry about whether there are cockroaches in the kitchen, because you know there are public health regulations in place to prevent that. This concept also applies to the AI world. So, NVIDIA has developed a Secure Deployment Considerations Guide for Triton Inference Server. I hope this short article has piqued your interest.

Background: AI inference is achieved through an “inference engine” that applies logical rules to the knowledge base to evaluate and analyze new information. In the process of machine learning, there are two phases. The first is the training phase, where intelligence is developed by recording, storing, and labeling information. The second is the inference phase, where the machine uses the intelligence gathered and stored in phase one to understand new data.

General-purpose web servers lack support for AI inference features.

*There is no out-of-box support to take advantage of accelerators like GPUs, or to turn on dynamic batching or multi-node inference.

*Users need to build logic to meet the demands of specific use cases, like audio/video streaming input, stateful processing, or preprocessing the input data to fit the model.

*Metrics on compute and memory utilization or inference latency are not easily accessible to monitor application performance and scale.

Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs. Triton supports HTTP/REST and gRPC protocols that allow remote clients to request inferencing for any model being managed by the server.
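As a rough idea of what a remote client sends over HTTP/REST, the sketch below only builds the request URL and JSON body and does not contact any server. It assumes the KServe-style v2 inference protocol as I understand it. The host `localhost:8000`, the model name `my_model` and the input name `INPUT0` are hypothetical placeholders, and the function name `build_infer_request` is my own.

```python
import json


def build_infer_request(host, model_name, input_name, values):
    """Build the URL and JSON body for a v2-style inference request (not sent)."""
    url = f"http://{host}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,           # must match the model's input name
                "shape": [1, len(values)],    # one row of len(values) numbers
                "datatype": "FP32",
                "data": list(values),
            }
        ]
    }
    return url, json.dumps(body)


# Hypothetical host, model and input names, for illustration only.
url, payload = build_infer_request("localhost:8000", "my_model", "INPUT0", [1.0, 2.0, 3.0])
print(url)
print(payload)
```

In a real deployment this endpoint is exactly what needs protecting, which is why a secure deployment guide matters: anyone who can reach the server port can submit requests like this one.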

Secure Deployment Considerations: Artificial Intelligence (AI) and Machine Learning (ML) cannot stand on their own without the support of programming languages. Developers can deploy Triton as an HTTP server, a gRPC server, a server supporting both, or embed a Triton server into their own application. Python is one of the major programming languages for AI and ML. PyTriton is a simple interface that enables Python developers to use Triton Inference Server to serve AI models, simple processing functions, or entire inference pipelines within Python code.

For Secure Deployment Considerations – Please refer to the link for details – https://github.com/triton-inference-server/pytriton

Understanding machine learning (activation functions) in a casual way. (30th Nov 2023)

Preface: Maybe it’s a long story, but in a nutshell, this is page one. When you start studying on your first day, even an overview of AI technology covers advanced mathematics, graphics and technical terminology, and this can dampen your interest. The world of mathematics is indeed complex. If a child is naturally insensitive to mathematical calculations, could it be said that he is not suited to working in artificial intelligence technology? The answer is not absolute. For example, computer assembly language is difficult and complex to remember. The solution was to develop other programming languages and then convert (compile) them into machine language. This is a successful outcome in today’s technological world. Therefore, many people believe that artificial intelligence technology should help humans in other ways rather than replace human work.

Background: The machine learning process requires CPUs and GPUs. GPUs are used to train large deep learning models, while CPUs are good for data preparation, feature extraction, and small-scale models. For inference and hyperparameter tweaking, CPUs and GPUs may both be utilized.

CPU and GPU memory coherence requires data transfer, and requires defining what areas of memory are shared and with which GPUs.

Long story short: Cognition refers to the process of acquiring knowledge and understanding through thinking, experience and senses. In machine learning some neural networks will use custom non-linear activation functions or a non-standard image filter.

The technology behind facial recognition is based on deep learning, a subset of machine learning that involves training artificial neural networks to recognize patterns in data.

Ref: Non-Linear Activation Functions. Non-linear functions are the most commonly used activation functions. They make it easy for a neural network model to adapt to a variety of data. Adaptive neural networks have the ability to overcome some significant challenges faced by artificial neural networks. This adaptability reduces the time required to train neural networks and also makes a neural model scalable, as it can adapt to structure and input data at any point in time during training.
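For readers who prefer code to formulas, here are three common non-linear activation functions written with only the Python standard library. The choice of these three (ReLU, sigmoid, tanh) is mine for illustration; the last lines show what "non-linear" means in practice: the function applied to a sum is not the sum of the function applied to each part.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squashes any real number into the range (-1, 1)."""
    return math.tanh(x)


for value in (-2.0, 0.0, 2.0):
    print(value, relu(value), sigmoid(value), tanh(value))

# A linear function f satisfies f(a + b) == f(a) + f(b). ReLU does not:
a, b = -1.0, 1.0
print(relu(a + b), relu(a) + relu(b))
```

If every layer were linear, stacking layers would still give one big linear function. The non-linearity is what lets a deeper network represent more complicated patterns.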

CVE-2023-48105: Weakness in buffer boundary checks in wasm loader (23rd Nov 2023)

Preface: Decentralized AI is an approach to AI where the data and models are distributed across multiple devices, rather than being centralized in a single location. Such a design helps AI infrastructure withstand denial-of-service attacks and cope with unexpected technical problems that occur along the way.

Background: Internet Computing aims to extend the capabilities of the public Internet through a serverless cloud model. Serverless is a cloud computing application development and execution model that enables developers to build and run application code without provisioning or managing servers or backend infrastructure.

WebAssembly (wasm) is a virtual machine for executing general purpose code. When designing the architecture of the Internet Computer, the DFINITY Foundation recognized the potential of WebAssembly as a virtual machine for blockchain. Apart from blockchain, the DFINITY Foundation and SingularityNET have partnered to transform decentralized AI with blockchain integration.

A canister is a WebAssembly (wasm) module that can run on the Internet Computer. Only four programming languages currently have Canister Development Kits (CDK), a suite of libraries and scripts for building WebAssembly binaries that are compatible with the Internet Computer. They are Motoko, Python, TypeScript, and Rust.

Note: The list above shows the future sustainability of Python. There is no doubt that Python can be expanded into the world of artificial intelligence.

Vulnerability details: A heap overflow vulnerability discovered in Bytecode Alliance wasm-micro-runtime v.1.2.3 allows a remote attacker to cause a denial of service via the wasm_loader_prepare_bytecode function in core/iwasm/interpreter/wasm_loader[.]c.

Additional: While the snapshot and rewinding technique with nested attestation can enable a fast and verifiable reset of an enclave, ensuring the security of such techniques is not trivial, particularly in a serverless environment where an adversary may try to breach the security by executing a malicious workload. To address this issue, multi-layer intra-enclave compartmentalisation (MLIEC) using compiler techniques has been proposed. With MLIEC, we can protect the snapshot and rewinding technique in a higher security layer than the regular enclave code (e.g., the Wasm runtime), ensuring that even if the regular enclave environment is compromised, the enclave reset can still be carried out correctly and restore the environment. However, the design weakness in this case lies in the buffer boundary checks in the wasm loader. So, the remedy is to add more buffer boundary checks in the wasm loader. Example: CHECK_BUF(p, p_end, 1);
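The idea behind a check like CHECK_BUF(p, p_end, 1) is "before reading N bytes at the current position, make sure N bytes are really left". Below is a small Python analogue of that idea. It is only a sketch: `check_buf`, `read_u8` and `LoaderError` are hypothetical names of my own, and the real macro in wasm-micro-runtime is C code that may differ in detail.

```python
class LoaderError(Exception):
    """Raised when the bytecode ends before the loader expects it to."""


def check_buf(data, pos, length):
    """Raise if reading `length` bytes at `pos` would run past the end of `data`."""
    if pos < 0 or length < 0 or pos + length > len(data):
        raise LoaderError(f"need {length} byte(s) at offset {pos}, buffer has {len(data)}")


def read_u8(data, pos):
    """Read one byte and return (value, next position), with a bounds check first."""
    check_buf(data, pos, 1)
    return data[pos], pos + 1


bytecode = bytes([0x41, 0x2A, 0x0B])  # three arbitrary bytes used as sample input
value, pos = read_u8(bytecode, 0)
print(hex(value), pos)

try:
    read_u8(bytecode, 3)  # one past the end: must be rejected, not read
except LoaderError as exc:
    print("rejected:", exc)
```

Python would raise its own IndexError here even without the explicit check. C does not, which is why a loader written in C has to perform this check by hand at every read.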

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2023-48105

Do not underestimate CVE-2023-6238: kernel: nvme: memory corruption via unprivileged user passthrough (22nd Nov 2023)

Preface: High-performance computing (HPC) is a method of processing large amounts of data and performing complex calculations at high speed. HPC is well suited for AI, which uses large data sets and complex models. HPC and AI combined have use cases in areas such as predictive analytics, and physics and modeling.

IO-Heavy HPC Computing: Requires systems that can read/write and store large amounts of data on disks. This type of computing includes systems that provide fast NVMe implementations for local IO or as part of a parallel file system.

Background: What is metadata for NVMe? Similar to SCSI / SAS devices, the NVMe standard supports the addition of 8 bytes (called metadata or protection information (PI)) to each data sector to ensure data integrity during data transfer.

NVMe protocol defines commands that utilize Physical Region Pages (PRP)/Scatter Gather Lists (SGL) to denote a data buffer location in host memory. The data buffer may be represented using single or multiple PRP/SGL entries similar to a linked list. Associated information for a command including PRP/SGL may be formed before the command is issued to the SSD for execution. The SSD, while executing the command, may fetch the associated PRP/SGL and perform data movement related to the command.

However, NVMe has no separate field to encode the expected metadata length (except when using SGLs). Because of that, transferring arbitrary metadata cannot be allowed, as a metadata buffer that is shorter than what the device expects for the command will lead to arbitrary kernel (if bounce buffering) or userspace (if not) memory corruption.

Vulnerability details: A buffer overflow vulnerability was found in the NVM Express (NVMe) driver in the Linux kernel. An unprivileged user could specify a small meta buffer and let the device perform larger Direct Memory Access (DMA) into the same buffer, overwriting unrelated kernel memory, causing random kernel crashes and memory corruption.
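The core of the problem is a length mismatch: the device transfers as much metadata as the command implies, regardless of how small the buffer supplied by the user is. The sketch below shows the kind of sanity check involved, in plain Python. The function name `metadata_buffer_is_large_enough` and its parameters are hypothetical, and this is not the actual Linux kernel fix, only an illustration of the arithmetic.

```python
def metadata_buffer_is_large_enough(num_blocks, metadata_bytes_per_block, user_buffer_len):
    """Return True only if the user buffer can hold all metadata the device will transfer."""
    expected_len = num_blocks * metadata_bytes_per_block
    return user_buffer_len >= expected_len


# 8 blocks with 8 bytes of metadata each: the device will move 64 bytes.
print(metadata_buffer_is_large_enough(8, 8, 64))  # buffer matches
print(metadata_buffer_is_large_enough(8, 8, 16))  # buffer too small: DMA would overrun it
```

In the second call, the 48 bytes that do not fit would land in whatever memory happens to follow the buffer, which matches the memory corruption described above.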

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2023-6238

One of the milestones in the digital world, especially artificial intelligence technology (9th Nov 2023)

Preface: The Matrix is a 1999 science fiction action film. At that time, virtual machine technology was not yet mature. IBM mainframe logical partitions (LPARs) were the only successful implementation on the market. Even Docker technology had not been born yet! But the film’s screenwriter seemed to predict the truth.

What is the simple definition of a matrix? A matrix is a two-dimensional set of numbers or symbols laid out in a rectangle, with its elements arranged in rows and columns.
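In code, a matrix is often represented as a list of rows. The small sketch below uses plain Python lists; the helper names `transpose` and `matmul` are my own and the numbers are arbitrary. Matrix multiplication is worth showing because every output element can be computed independently of the others, which is exactly the kind of work that suits the many cores of a GPU.

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]  # 2 rows, 3 columns


def transpose(m):
    """Swap rows and columns."""
    return [list(column) for column in zip(*m)]


def matmul(a, b):
    """Multiply two matrices; each output element is an independent dot product."""
    b_columns = transpose(b)
    return [
        [sum(x * y for x, y in zip(row, column)) for column in b_columns]
        for row in a
    ]


print(transpose(matrix))
print(matmul(matrix, transpose(matrix)))
```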

Background: About five years ago, it was already known that parallel computation could unlock the performance (processing speed) of supercomputers. However, programs written in traditional C still had trouble on this platform, because C program instructions execute sequentially and do not support data-parallel computation, which increases the running time of a program. Then CUDA for Docker was born, and this bottleneck appears to have been resolved. This is one of the milestones in the digital world, especially for artificial intelligence technology.

Technical details: It is hard for the average programmer to write programs in CUDA. CUDA puts a load on the programmer:

-GPU code must be packaged in separate functions called kernels.

-Data transfer between host memory and GPU memory must be managed explicitly.

-Manual optimization of GPU memory is required.

CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

How does CUDA help in AI?

In addition to its components for deep learning, the CUDA Toolkit includes various libraries and components. These provide support for debugging and optimization, compiling, documentation, runtimes, signal processing, and parallel algorithms.

Official document reference: For details please refer to the link –  https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda

Big-data-driven AI robot development: this is not a dream. (6th Nov 2023)

Preface: As of today, AI tools include ChatGPT, BERT, LaMDA, GPT-3, DALL-E-2, MidJourney, and Stable Diffusion. ChatGPT was released as a freely available research preview, but due to its popularity, OpenAI now operates the service on a freemium model. It allows users on its free tier to access the GPT-3.5-based version.

Background: Legged robots, or walking machines, are designed for locomotion on rough terrain and require control of leg actuators to maintain balance, sensors to determine foot placement and planning algorithms to determine the direction and speed of movement. Because the installation space inside a legged robot is limited, this type of design leaves little room for on-board intelligence. By offloading complex computations to the cloud, robots can process vast amounts of data quickly and perform tasks that require extensive processing resources, far exceeding the capabilities of their onboard hardware.

Starting from the 5G communication technology era, 5G aims to support a 100-fold increase in traffic capacity and network efficiency. Advanced AI robots will therefore rely on fast, wide-coverage radio communication networks. Meanwhile, advanced artificial intelligence robots with decision-making and thinking mechanisms will rely on big data infrastructure in remote locations. So, do you think this leaves room for humans to govern this AI technology?

Without 5G, there would be no real AI robots:

Coincidences are rare in science. When we look back at the development history of 5G, we find that the road was not smooth. On the other hand, if 5G had not arrived in time, I believe the so-called artificial intelligence legged robot would not have been born so easily. If robots cannot walk as freely as humans, without being limited to a certain area, we cannot say that our technology has migrated to an advanced digital world.

About Artificial Intelligence Endangering Human existence Value: About three years ago, when you attended a seminar, the speaker would laugh on hearing that AI could endanger the value of human existence. Their comment at that time was: don’t worry too much; AI is not as clever as humans, and it will only replace low-level jobs. As time goes by, the transformation of industrial processes tells a different story. AI technology has arrived in our age within a short period of time. In the first week of Nov 2023, the CEO of Tesla, Elon Musk, predicted that human work will become obsolete as artificial intelligence progresses, calling it “the most disruptive force in history.”

It seems we have no choice but to accept this trend. But what can we do?

Headline news: https://www.dailymail.co.uk/sciencetech/article-12706621/When-job-taken-robot-Elon-Musk-insists-AI-mean-no-one-work-experts-reveal-careers-replaced-IMMEDIATELY-face-chop-future.html