Category Archives: AI and ML

Looking ahead to an AI-driven world, NVIDIA has developed a Secure Deployment Considerations Guide for Triton Inference Server (6th Dec 2023)

Preface: Artificial intelligence (AI) is growing like lightning. As ordinary IT users, we enjoy smartphone app features empowered by AI, yet most of us neither care about nor understand the back-end operations and architecture behind AI. For example, when you buy a steamed bun at a store, you certainly don’t worry about whether there are cockroaches in the kitchen, because you know public health regulations are in place to prevent that. The same concept applies to the AI world. That is why NVIDIA has developed a Secure Deployment Considerations Guide for Triton Inference Server. I hope this short article piques your interest.

Background: AI inference is achieved through an “inference engine” that applies logical rules to a knowledge base to evaluate and analyze new information. Machine learning has two phases. First is the training phase, where intelligence is developed by recording, storing, and labeling information. Second is the inference phase, where the machine uses the intelligence gathered and stored in phase one to understand new data.
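The two phases can be illustrated with a deliberately tiny sketch (the names `train` and `infer` below are illustrative only, not any particular framework's API):

```python
# Toy illustration of the two machine-learning phases.
def train(examples):
    """Phase one: record, store, and label information."""
    # examples: a list of (feature, label) pairs; the dict is our
    # deliberately simple "knowledge base".
    return dict(examples)

def infer(model, feature):
    """Phase two: apply the stored intelligence to new data."""
    return model.get(feature, "unknown")
```

For example, `infer(train([("meow", "cat")]), "meow")` returns `"cat"`, while a feature never seen in training falls back to `"unknown"`.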

General-purpose web servers lack support for AI inference features.

*There is no out-of-the-box support to take advantage of accelerators like GPUs, or to turn on dynamic batching or multi-node inference.

*Users need to build logic to meet the demands of specific use cases, like audio/video streaming input, stateful processing, or preprocessing the input data to fit the model.

*Metrics on compute and memory utilization or inference latency are not easily accessible to monitor application performance and scale.

Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs. Triton supports HTTP/REST and gRPC protocols that allow remote clients to request inferencing for any model being managed by the server.

Secure Deployment Considerations: Artificial Intelligence (AI) and Machine Learning (ML) cannot stand on their own without the support of programming languages. Developers can deploy Triton as an HTTP server, a gRPC server, a server supporting both, or embed a Triton server into their own application. Python is one of the major languages for AI and ML. PyTriton is a simple interface that enables Python developers to use Triton Inference Server to serve AI models, simple processing functions, or entire inference pipelines within Python code.

For Secure Deployment Considerations – Please refer to the link for details – https://github.com/triton-inference-server/pytriton
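As a sketch of how PyTriton is typically used: you write a plain Python function operating on NumPy arrays and bind it to a Triton server. Only the inference function below is exercised here; the binding code in the comment follows the project's documented pattern, but treat the exact names and signatures as assumptions.

```python
import numpy as np

# Hypothetical per-batch inference function. PyTriton's @batch decorator
# would call it with numpy arrays carrying a leading batch dimension.
def infer_fn(input_1: np.ndarray) -> dict:
    # Toy "model": double every value in the batch.
    return {"output_1": input_1 * 2.0}

# With the pytriton package installed, the function could be served
# roughly like this (sketch, not executed here):
#
#   from pytriton.decorators import batch
#   from pytriton.model_config import Tensor
#   from pytriton.triton import Triton
#
#   with Triton() as triton:
#       triton.bind(
#           model_name="Doubler",
#           infer_func=batch(infer_fn),
#           inputs=[Tensor(name="input_1", dtype=np.float32, shape=(-1,))],
#           outputs=[Tensor(name="output_1", dtype=np.float32, shape=(-1,))],
#       )
#       triton.serve()
```

Keeping the inference function as plain Python makes it easy to unit-test before exposing it over HTTP/gRPC.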

Understanding machine learning (activation functions) in a casual way. (30th Nov 2023)

Preface: Maybe it’s a long story, but in a nutshell, this is page one. On your first day of study, even an overview of AI technology covers advanced mathematics, graphics, and technical terminology, and that can dampen your interest. The world of mathematics is complex. If a child is naturally insensitive to mathematical calculation, does that mean he is unsuited to working in artificial intelligence technology? Not necessarily. For example, computer assembly language is difficult and complex to remember, so the solution was to develop other programming languages and then convert (compile) them into machine language. That has been a great success in today’s technological world. For the same reason, many people believe that artificial intelligence technology should help humans in other ways rather than replace human work.

Background: The machine learning process requires CPUs and GPUs. GPUs are used to train large deep learning models, while CPUs are good for data preparation, feature extraction, and small-scale models. For inference and hyperparameter tuning, both CPUs and GPUs may be utilized.

CPU and GPU memory coherence requires data transfer, and requires defining which areas of memory are shared and with which GPUs.

Long story short: Cognition refers to the process of acquiring knowledge and understanding through thinking, experience, and the senses. In machine learning, some neural networks use custom non-linear activation functions or non-standard image filters.

The technology behind facial recognition is based on deep learning, a subset of machine learning that involves training artificial neural networks to recognize patterns in data.

Ref: Non-linear activation functions. Non-linear functions are the most widely used activation functions. They make it easy for a neural network model to adapt to a variety of data. Adaptive neural networks can overcome some significant challenges faced by artificial neural networks. This adaptability reduces the time required to train neural networks and also makes a neural model scalable, as it can adapt to structure and input data at any point during training.
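A few of the most common non-linear activation functions can be written in a handful of lines. This NumPy sketch is framework-independent:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Passes positive values through, zeroes out negatives.
    return np.maximum(0.0, x)

def tanh(x):
    # Squashes input into (-1, 1), centered at zero.
    return np.tanh(x)
```

The non-linearity is the whole point: stacking purely linear layers collapses into a single linear map, while these functions let a network fit curved decision boundaries.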

CVE-2023-48105: Weakness in buffer boundary checks in wasm loader (23rd Nov 2023)

Preface: Decentralized AI is an approach to AI in which data and models are distributed across multiple devices rather than centralized in a single location. Such a design helps AI infrastructure avoid denial-of-service attacks and limits the impact of unexpected technical failures.

Background: Internet Computing aims to extend the capabilities of the public Internet through a serverless cloud model. Serverless is a cloud computing application development and execution model that enables developers to build and run application code without provisioning or managing servers or backend infrastructure.

WebAssembly (wasm) is a virtual machine for executing general-purpose code. When designing the architecture of the Internet Computer, the DFINITY Foundation recognized the potential of WebAssembly as a virtual machine for blockchain. Beyond blockchain, the DFINITY Foundation and SingularityNET have partnered to transform decentralized AI with blockchain integration.

A canister is a WebAssembly (wasm) module that can run on the Internet Computer. Only four programming languages currently have Canister Development Kits (CDKs) — a suite of libraries and scripts for building WebAssembly binaries that are compatible with the Internet Computer. They are Motoko, Python, TypeScript, and Rust.

Note: As shown above, this points to the future sustainability of Python. There is no doubt that Python can expand into the world of artificial intelligence.

Vulnerability details: A heap overflow vulnerability was discovered in Bytecode Alliance wasm-micro-runtime v1.2.3 that allows a remote attacker to cause a denial of service via the wasm_loader_prepare_bytecode function in core/iwasm/interpreter/wasm_loader[.]c.

Additional: Internet Computing aims to extend the capabilities of the public Internet through a serverless cloud model. While the snapshot-and-rewinding technique with nested attestation can enable a fast and verifiable reset of an enclave, ensuring the security of such techniques is not trivial, particularly in a serverless environment where an adversary may try to breach security by executing a malicious workload. To address this issue, multi-layer intra-enclave compartmentalisation (MLIEC) using compiler techniques has been proposed. With MLIEC, the snapshot-and-rewinding technique can be protected in a higher security layer than the regular enclave code (e.g., the Wasm runtime), ensuring that even if the regular enclave environment is compromised, the enclave reset can still be carried out correctly and restore the environment. However, the design weakness here lies in the buffer boundary checks in the wasm loader, so the remedy is to add more buffer boundary checks there. Example: CHECK_BUF(p, p_end, 1);
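The idea behind CHECK_BUF can be sketched in Python (the real fix is in C inside wasm_loader.c; this is only an illustration of what a bounds-checked read looks like):

```python
def read_byte(buf: bytes, p: int, p_end: int):
    # Mirror of CHECK_BUF(p, p_end, 1): refuse to read past the end
    # of the section buffer before consuming one byte.
    if p + 1 > p_end:
        raise ValueError("wasm loader: unexpected end of section")
    # Return the byte and the advanced cursor position.
    return buf[p], p + 1
```

Without the check, a malicious module whose declared section length exceeds the actual buffer would make the loader read (or in C, overflow) past the end, which is exactly the CVE-2023-48105 condition.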

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2023-48105

Do not underestimate CVE-2023-6238: kernel: nvme: memory corruption via unprivileged user passthrough (22nd Nov 2023)

Preface: High-performance computing is a method of processing large amounts of data and performing complex calculations at high speed. HPC is well suited for AI, which uses large data sets and complex models. HPC and AI combined have use cases in areas such as predictive analytics, physics, and modeling.

IO-Heavy HPC Computing: Requires systems that can read/write and store large amounts of data on disks. This type of computing includes systems that provide fast NVMe implementations for local IO or as part of a parallel file system.

Background: What is metadata for NVMe? Similar to SCSI / SAS devices, the NVMe standard supports the addition of 8 bytes (called metadata or protection information (PI)) to each data sector to ensure data integrity during data transfer.

NVMe protocol defines commands that utilize Physical Region Pages (PRP)/Scatter Gather Lists (SGL) to denote a data buffer location in host memory. The data buffer may be represented using single or multiple PRP/SGL entries similar to a linked list. Associated information for a command including PRP/SGL may be formed before the command is issued to the SSD for execution. The SSD, while executing the command, may fetch the associated PRP/SGL and perform data movement related to the command.

However, NVMe has no separate field to encode the expected metadata length (except when using SGLs). Because of that, we can’t allow arbitrary metadata to be transferred: a metadata buffer that is shorter than what the device expects for the command will lead to arbitrary kernel (if bounce buffering) or userspace (if not) memory corruption.

Vulnerability details: A buffer overflow vulnerability was found in the NVM Express (NVMe) driver in the Linux kernel. An unprivileged user could specify a small meta buffer and let the device perform larger Direct Memory Access (DMA) into the same buffer, overwriting unrelated kernel memory, causing random kernel crashes and memory corruption.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2023-6238

One of the milestones in the digital world, especially artificial intelligence technology (9th Nov 2023)

Preface: The Matrix is a 1999 science fiction action film. At that time, virtual machine technology was not yet mature. IBM mainframe logical partitions (LPARs) were the only successful implementation on the market, and Docker had not even been born yet! But the film’s screenwriter seemed to predict the future.

What is the simple definition of a matrix? A matrix is a two-dimensional set of numbers or symbols distributed in a rectangular shape along vertical and horizontal lines, so that its elements are arranged in rows and columns.

Background: About five years ago, it was known that parallel computation could unlock the performance (processing speed) of supercomputers. However, programs written in traditional C still struggled on this platform: C program instructions execute sequentially and do not support data-parallel computation, which increases the time complexity of a program. Then CUDA, and later CUDA-enabled Docker containers, arrived, and this bottleneck appears to have been resolved. This is one of the milestones of the digital world, especially for artificial intelligence technology.

Technical details: It is hard for the average programmer to write programs in CUDA; CUDA puts a load on the programmer:

-To package GPU code in separate functions called kernels.

-Need to explicitly manage data transfer between host memory and GPU memory.

-Manual optimization of GPU memory is required.

CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

How does CUDA help in AI?

In addition to its components for deep learning, the CUDA Toolkit includes various libraries and components. These provide support for debugging and optimization, compiling, documentation, runtimes, signal processing, and parallel algorithms.
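The programming-model burden listed above (packaging code into kernels, one thread per element) can be mimicked in plain Python. The `launch` helper below is hypothetical and simply stands in for CUDA's parallel thread launch; on a real GPU the per-thread calls would run concurrently:

```python
# A "kernel" runs once per thread; each thread handles one element,
# identified by its thread index -- the core idea of the CUDA model.
def vector_add_kernel(a, b, out, thread_idx):
    out[thread_idx] = a[thread_idx] + b[thread_idx]

def launch(kernel, n_threads, *args):
    # On a GPU these threads would execute in parallel; here we loop.
    for tid in range(n_threads):
        kernel(*args, tid)
```

For example, `launch(vector_add_kernel, 3, [1, 2, 3], [4, 5, 6], out)` fills `out` with `[5, 7, 9]`. What CUDA adds on top of this mental model is the explicit host-to-device data transfer and memory optimization mentioned above.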

Official document reference: For details please refer to the link –  https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda

The big data driven AI robots development. This is not a dream. (6th Nov 2023)

Preface: As of today, AI tools include ChatGPT, BERT, LaMDA, GPT-3, DALL-E 2, MidJourney, and Stable Diffusion. ChatGPT was released as a freely available research preview, but due to its popularity, OpenAI now operates the service on a freemium model. Users on its free tier can access the GPT-3.5-based version.

Background: Legged robots, or walking machines, are designed for locomotion on rough terrain. They require control of leg actuators to maintain balance, sensors to determine foot placement, and planning algorithms to determine the direction and speed of movement. Because the installation space on a legged robot is limited, this type of design cannot carry much intelligence on board. Starting from the 5G communication technology era, 5G aims to support a 100-fold increase in traffic capacity and network efficiency, so advanced AI robots will rely on fast, wide-coverage radio communication networks. Meanwhile, advanced artificial intelligence robots with decision-making and thinking mechanisms will rely on remote big-data infrastructure. By offloading complex computations to the cloud, robots can process vast amounts of data quickly and perform tasks that require extensive processing resources, far exceeding the capabilities of their onboard hardware. So, do you think this leaves room for humans to govern this AI technology?

Without 5G, there would be no real AI robots:

Coincidences are rare in science. When we look back at the development history of 5G, we find that the road was not smooth. On the other hand, if 5G had not arrived in time, I believe the so-called artificial intelligence legged robot would not easily have been born. If a robot cannot walk freely like a human, without being limited to one area, we cannot say our technology has migrated to an advanced digital world.

About artificial intelligence endangering the value of human existence: About three years ago, when you attended a seminar, the speaker would laugh on hearing that AI endangers the value of human existence. Their comment at the time was: don’t worry too much, they are not as clever as humans, and AI technology will only replace low-level jobs. As time goes by, the transformation of industrial processes tells the story: AI technology has reached our age within a short period of time. In the first week of November 2023, the CEO of Tesla, Elon Musk, predicted that human work will become obsolete as artificial intelligence progresses, calling it “the most disruptive force in history.”

It seems we have no choice about this trend. But what can we do?

Headline news: https://www.dailymail.co.uk/sciencetech/article-12706621/When-job-taken-robot-Elon-Musk-insists-AI-mean-no-one-work-experts-reveal-careers-replaced-IMMEDIATELY-face-chop-future.html

Drones are similar to radio-controlled aircraft, but GPS and (5G or 4G) empowerment are leading the way. (29th Jul 2023)

Preface: Do the Chinese words for intelligence and cleverness have similar meanings? If you ask me, perhaps I would say that cleverness refers to the sensitivity of the ears and eyes, while intelligence is about the brain. Maybe you have another explanation. When you watch a sci-fi movie, the AI core sends out drones to detect and arrest people. The plot may not describe it in detail, but comprehensive AI machine learning should include both vision and hearing. The obvious thing about drones is the vision capability of AI.

Background: It seems indisputable that 5G networks can enhance the performance capabilities of drones, though perhaps they cannot be compared with military-grade RF capabilities. If you’re interested in the subject, you’ll notice that many manufacturers are capable of designing and producing drones. The reason is that chip manufacturers can provide flight control chips, plenty of open-source firmware can easily be downloaded from GitHub, and throttle actuators and flight control surface actuators are also available on the electronics market. The most commonly used languages in avionics software engineering are C, C++, Ada, and Python. In network security terms, the TCP/IP protocol stack is one of the important triggers for network attacks. 5G or RF communication is exchanged at the communication gateway, and the IP protocol takes over after the 5G or RF signal exchange there. So cybersecurity will never leave this technology.

Ref: The STM32 F4, G4, F7 and H7 are popular microcontrollers for drones. The major component of the flight control unit (FCU) is the microcontroller.

The FCU consists of a processor and an Inertial Measurement Unit with a high precision accelerometer and gyroscope, necessary for stable flight.
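One common way an FCU fuses the gyroscope and accelerometer into a stable attitude estimate is a complementary filter. A minimal sketch (function and parameter names are illustrative, not from any particular flight stack):

```python
def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    # The integrated gyroscope rate is smooth but drifts over time;
    # the accelerometer-derived angle is noisy but drift-free.
    # Blend them: mostly trust the gyro, slowly correct with the accel.
    return alpha * (angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle
```

Run once per control loop iteration (dt being the loop period), this keeps the estimated angle stable enough for the attitude controller to act on.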

Is the development of science and technology in opposition to the natural environment?

Humans can send probes to Mars and create artificial intelligence on Earth. Drones can do real-time monitoring even in extremely dangerous zones. Today’s medical technology enhances our healthcare. But when we look at the climate today, the Antarctic icebergs are melting, causing disasters and rising sea levels. So far, the situation has not improved!

Let’s see whether artificial intelligence can provide solutions for humanity soon.

Understanding the AI structure will help you defend your facilities (10th July 2023)

Preface: Python has hundreds of pre-built libraries for implementing various machine learning and deep learning algorithms.

Background: Before the word “AI” was ever heard, Python was already alive everywhere in the IT world. Scientists use the Python programming language to assist with complicated mathematical calculations; data scientists use it to analyze geospatial vector data to identify specific locations. The Python language is flexible and powerful, but its fundamental design weaknesses are well known. As a result, you should protect your AI system structure from attack by threat actors. For example:
- Some Python security vulnerabilities come from Python’s open-source libraries.
- Injection: injection flaws are a common weakness class across modern programming languages, and Python applications are not exempt.
- Command injection can be exposed through misconfigured software applications. Possible attacks include arbitrary command injection, XML external entity (XXE) injection, and server-side template injection.


About best practices for Python security

  • Conduct a vulnerability scan weekly. This is the way to identify and fix Python security vulnerabilities.
  • Sanitize external data before use.
  • Follow your Python version’s SDLC; DevSecOps teams should be aware of this matter.
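The “sanitize external data” advice can be made concrete with a small sketch: validate the value and pass an argv list instead of building a shell string from user input (`safe_ping` is a hypothetical helper, not a library function):

```python
def safe_ping(host: str) -> list:
    # Allow only plain hostname characters; reject shell metacharacters
    # such as ';', '|', '$', spaces, and slashes outright.
    if not host or not all(c.isalnum() or c in ".-" for c in host):
        raise ValueError("invalid host")
    # An argv list never passes through a shell, so there is nothing
    # for an attacker to inject a second command into.
    return ["ping", "-c", "1", host]
```

The returned list would be handed to `subprocess.run(...)` without `shell=True`; a payload like `example.com; rm -rf /` is rejected before any command runs.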

NVIDIA empowers Artificial Intelligence competence. At the same time, the vendor urges staying alert to product vulnerabilities (2nd Jul 2023)

Preface: The A800 has a data transfer rate of 400 GB/s while the A100’s is 600 GB/s; as such, the A800 complies with the 600 GB/s-or-less threshold.

Background: What is SMM? It was SM in the Fermi era and SMX in the Kepler era. If you enlarge Kepler’s SMX core, you will see more LD/ST access units than in Fermi, which means the number of execution threads Kepler processes in a single cycle is higher than Fermi’s. A streaming multiprocessor is composed of CUDA cores, the PolyMorph Engine, and other units. Simply put, the SMM design fine-tunes the number of CUDA cores built into the unit from 192 down to 128. The SMM is divided into 4 small blocks, each with its own independent control logic. In the past, one set of control logic had to be responsible for a large number of CUDA cores; with the small blocks, each set serves far fewer.

Vulnerability details:

CVE-2023-25521: The NVIDIA DGX A100 and A800 systems contain a vulnerability in SBIOS, where improper validation of an input parameter may lead to code execution, escalation of privileges, denial of service, information disclosure, and data tampering.

CVE-2023-25522: The NVIDIA DGX A100 and A800 systems contain a vulnerability in SBIOS, where information that is provided in an unexpected format may cause improper validation of an input parameter, which may lead to denial of service, information disclosure, and data tampering.

Best practice: Disable all features in the UEFI and OS that are not used; this reduces the attack surface. Configure your system to execute only signed code and signed kernel modules, if possible.

Official announcement: For details, please refer to link – https://nvidia.custhelp.com/app/answers/detail/a_id/5461

CVE-2023-22886: Improper Input Validation vulnerability in Apache Software Foundation Apache Airflow JDBC Provider (30th June 2023)

Preface: Airflow is a platform to programmatically author, schedule, and monitor workflows. Specifically, it is used in Machine Learning to create pipelines.

Background: Apache Airflow™ is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. This open-source platform is most suitable for pipelines that change slowly, are related to a specific time interval, or are pre-scheduled. It’s a popular solution that many data engineers rely on for building their data pipelines. Data pipelines work with ongoing data streams in real time. Airflow has been used to run SQL, machine learning models, and more.

Apache Airflow is a Python-based platform to programmatically author, schedule and monitor workflows. It is well-suited to machine learning for building pipelines, managing data and training models.

You can use Apache Airflow to schedule pipelines that extract data from multiple sources and run Spark jobs or other data transformations, including machine learning model training.

Vulnerability details: Improper Input Validation vulnerability in the Apache Software Foundation Apache Airflow JDBC Provider. The JDBC Provider connection’s [Connection URL] parameters had no restrictions, which made it possible to carry out RCE attacks via different types of JDBC drivers and obtain Airflow server permissions. This issue affects Apache Airflow JDBC Provider before 4.0.0.

Recommendation: For security purposes, you should avoid building the connection URLs based on user input. For user name and password values, use the connection property collections. Restrict direct usage of driver params via extras for JDBC connection.

Remedy: To configure driver parameters (driver path and driver class), you can use the following methods:

  1. Supply them as constructor arguments when instantiating the hook.
  2. Set the “driver_path” and/or “driver_class” parameters in the “hook_params” dictionary when creating the hook using SQL operators.
  3. Set the “driver_path” and/or “driver_class” extra in the connection and correspondingly enable the “allow_driver_path_in_extra” and/or “allow_driver_class_in_extra” options in the “providers[.]jdbc” section of the Airflow configuration.
  4. Patch the “JdbcHook.default_driver_path” and/or “JdbcHook.default_driver_class” values in the “local_settings[.]py” file.
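In the same defensive spirit as the methods above, driver parameters coming from untrusted sources can be allow-listed before they ever reach a hook. This is an illustrative sketch, not Airflow API code; the constants and the `validate_driver` helper are hypothetical names:

```python
# Illustrative allow-list check for JDBC driver parameters.
ALLOWED_DRIVER_CLASSES = {"org.postgresql.Driver", "org.h2.Driver"}
TRUSTED_DRIVER_DIR = "/opt/airflow/drivers/"

def validate_driver(driver_class: str, driver_path: str) -> bool:
    # Accept only an allow-listed driver class, and only a jar path
    # under the trusted directory -- never a value taken verbatim
    # from connection extras or user input.
    return (driver_class in ALLOWED_DRIVER_CLASSES
            and driver_path.startswith(TRUSTED_DRIVER_DIR))
```

An attacker-supplied class like `evil.Driver` or a path outside the trusted directory is rejected before any JDBC connection is built.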

Official announcement: For details, please refer to the link – https://github.com/advisories/GHSA-mm87-c3x2-6f89