Category Archives: Application Development

Pushing the open source development concept into space (27th Dec 2024)

Preface: We live in a three-dimensional world. We move in space: left or right, forward or backward, up or down. Living things do not live forever, and hardware and software also have life cycles. Human beings seem destined to live on Earth; the other planets in the solar system are not suitable for human survival. Rockets travel through the atmosphere to explore space, but the time required is unknown, and there is no guarantee that the target will be found. In space, distance is measured in light years, and travelling from one planet to another could require at least a lifetime of human dedication. Suppose an AI analysed all the data SpaceX has collected so far: if even the AI cannot fully open the secret door of the Einstein-Rosen bridge (for time travel), maybe humans will stay on Earth.

Technical focus: For computers to survive in space, they must be hardened — made of resilient materials and designed to withstand high doses of radiation. But to make a computer fit for space takes years. Satellite manufacturers therefore often have to make do with rather obsolete processors.

About software development: Java has become one of the most widely used programming languages across various industries, including space exploration. At NASA, Java is used for developing highly interactive systems, mission-critical software, and user interfaces that support space operations.

Ref: Java Pathfinder (JPF) is a model checker for Java. The technology takes a Java program and “executes” it in a way that explores all possible executions/interleavings of the threads in the program. This allows JPF to detect certain bugs (e.g., deadlocks and assertion violations) that may be missed during testing.
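To make "explores all possible interleavings" concrete, here is a toy sketch in Python. It is not JPF and does not use JPF's API; the two "threads", their read/write steps and the function names are all made up for illustration. Each thread reads a shared counter and then writes back the value plus one, and the sketch simply enumerates every order in which those steps can be merged.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]

def run(schedule):
    """Execute one schedule of (thread, op) steps against a shared counter."""
    shared = 0
    local = {}
    for thread, op in schedule:
        if op == "read":
            local[thread] = shared        # thread copies the shared value
        else:
            shared = local[thread] + 1    # thread writes back its stale copy + 1
    return shared

def explore():
    """Return every final counter value reachable under some interleaving."""
    t1 = [("T1", "read"), ("T1", "write")]
    t2 = [("T2", "read"), ("T2", "write")]
    return sorted({run(s) for s in interleavings(t1, t2)})

if __name__ == "__main__":
    print(explore())
```

Of the six possible schedules, only the two fully sequential ones end with the counter at 2; the other four lose an update. Ordinary testing might never hit those schedules, which is the gap a model checker is meant to close.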

About the topic: Antmicro & AetheroSpace sent Zephyr IoT into space on a SpaceX launch. Aethero has recently announced a groundbreaking collaboration with Antmicro, a leading technology company specializing in open source tools, to develop state-of-the-art edge AI hardware tailored for space applications.

Antmicro played a crucial role in providing the software foundation for the NxN Edge Computing Module, contributing both Linux and Zephyr RTOS software for controlling the payload. Additionally, Antmicro implemented their open source RDFM framework, enabling modular, configurable, multi-OS device OTA updates and fleet management through Aethero’s user portal.

For details about Antmicro, please refer to the link below: https://hardwarebee.com/electronic-breaking-news/aethero-and-antmicro-collaborate-on-open-source-space-edge-ai-design/

Are you still a fan of Nvidia? Or do you support AMD now? (23rd Dec 2024)

Preface: In the field of artificial intelligence (AI), NVIDIA and AMD are leading the way, pushing the limits of computing power. Both companies have launched powerful AI chips, but the comparison between the H100 and MI250X raises the question of which is superior.

Background: What is AMD Instinct MI250X? AMD Instinct™ MI250X Series accelerators are uniquely suited to power even the most demanding AI and HPC workloads, delivering exceptional compute performance, massive memory density, high-bandwidth memory, and support for specialised data formats.

AMD now has more computing power than Nvidia in the Top500. In the top ten, five systems use AMD processors (El Capitan, Frontier, HPC6, LUMI, and Tuolumne), while three use Intel (Aurora, Eagle, Leonardo).

Software Stack: ROCm offers a suite of optimizations for AI workloads from large language models (LLMs) to image and video detection and recognition, life sciences and drug discovery, autonomous driving, robotics, and more. ROCm supports the broader AI software ecosystem, including open frameworks, models, and tools.

HIP is a thin API with little or no performance impact over coding directly in NVIDIA CUDA or AMD ROCm.

HIP enables coding in a single-source C++ programming language including features such as templates, C++11 lambdas, classes, namespaces, and more.

Developers can specialize for the platform (CUDA or ROCm) to tune for performance or handle tricky cases.

Ref: What is the difference between ROCm and HIP?

ROCm™ is AMD’s open source software platform for GPU-accelerated high performance computing and machine learning. HIP is ROCm’s C++ dialect designed to ease conversion of CUDA applications to portable C++ code.

Official article: Please refer to the link for details

https://www.amd.com/en/products/accelerators/instinct/mi200/mi250x.html

CVE-2024-0132: NVIDIA Container Toolkit 1.16.1 or earlier contains a Time-of-Check Time-of-Use (TOCTOU) vulnerability (25th Sep 2024)

Preface: In software development, time-of-check to time-of-use (TOCTOU, TOCTTOU or TOC/TOU) is a class of software bugs caused by a race condition involving the checking of the state of a part of a system (such as a security credential) and the use of the results of that check.
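The check-then-use gap can be sketched in a few lines of Python. This is a generic illustration on a POSIX system, not the NVIDIA Container Toolkit code path. The `between` hook is a made-up stand-in for an attacker who wins the race, and the file names are hypothetical.

```python
import os
import tempfile

def racy_read(path, between=lambda: None):
    """Check first, use later: the path can change between the two steps."""
    if os.path.islink(path):          # time of check: refuse symlinks
        return None
    between()                         # hypothetical hook: attacker acts here
    with open(path) as f:             # time of use: follows whatever is there now
        return f.read()

def safer_read(path, between=lambda: None):
    """Fold the check into the use: O_NOFOLLOW makes open() itself refuse symlinks."""
    between()
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    except OSError:
        return None
    with os.fdopen(fd) as f:
        return f.read()

def demo():
    with tempfile.TemporaryDirectory() as d:
        public = os.path.join(d, "public.txt")
        secret = os.path.join(d, "secret.txt")
        with open(secret, "w") as f:
            f.write("secret")

        def reset():
            if os.path.lexists(public):
                os.remove(public)
            with open(public, "w") as f:
                f.write("public")

        def swap():
            # Replace the checked file with a symlink to the secret file.
            os.remove(public)
            os.symlink(secret, public)

        reset()
        racy = racy_read(public, between=swap)
        reset()
        safe = safer_read(public, between=swap)
        return racy, safe

if __name__ == "__main__":
    print(demo())
```

The racy version passes its check and then reads the secret file through the swapped-in symlink. The safer version has no separate check to race against, so the same swap only makes the open fail.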

Background: The NVIDIA container stack is architected so that it can be targeted to support any container runtime in the ecosystem. The components of the stack include:

-The NVIDIA Container Runtime (nvidia-container-runtime)

-The NVIDIA Container Runtime Hook (nvidia-container-toolkit / nvidia-container-runtime-hook)

-The NVIDIA Container Library and CLI (libnvidia-container1, nvidia-container-cli)

The components of the NVIDIA container stack are packaged as the NVIDIA Container Toolkit.

The NVIDIA Container Toolkit is a key component in enabling Docker containers to leverage the raw power of NVIDIA GPUs. This toolkit allows for the integration of GPU resources into your Docker containers.

Vulnerability details: NVIDIA Container Toolkit 1.16.1 or earlier contains a Time-of-check Time-of-Use (TOCTOU) vulnerability when used with default configuration where a specifically crafted container image may gain access to the host file system. This does not impact use cases where CDI is used. A successful exploit of this vulnerability may lead to code execution, denial of service, escalation of privileges, information disclosure, and data tampering.

Official announcement: Please refer to the vendor announcement for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5582

CVE-2024-34731: last week's CVE, today's story. (29th Aug 2024)

Preface: A race condition occurs when the outcome of a program depends on the timing of concurrent operations. A race condition vulnerability is a software bug that allows the resulting unexpected behaviour to be exploited by malicious entities.

A race condition can become a privilege escalation vulnerability when an attacker exploits the time gap between imposing a security control and using a service in a UNIX-like system. The vulnerability results from interference between multiple concurrent threads that run in the system and share the same resources.
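A minimal Python sketch of threads interfering over a shared resource follows. It is a generic illustration and has nothing to do with the Android code itself; the `Counter` class and `run` helper are invented for the example.

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        current = self.value        # read the shared value
        self.value = current + 1    # write it back; another thread may have written in between

    def safe_increment(self):
        with self._lock:            # only one thread at a time may read-modify-write
            self.value += 1

def run(method_name, threads=4, iterations=10000):
    """Hammer one Counter from several threads and return the final value."""
    counter = Counter()
    method = getattr(counter, method_name)

    def work():
        for _ in range(iterations):
            method()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value

if __name__ == "__main__":
    print("with lock:   ", run("safe_increment"))
    print("without lock:", run("unsafe_increment"))
```

With the lock, the result is always threads × iterations. Without it, the result can come out lower on some runs, because two threads may read the same value and one update is lost. In native code the same kind of interference can corrupt memory rather than just a counter.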

Background: TranscodingResourcePolicy is a component of the Android platform/frameworks/av package that manages resource policies for transcoding operations. Transcoding is the process of converting media files from one format to another.

Vulnerability details: In multiple functions of TranscodingResourcePolicy.cpp, there is a possible memory corruption due to a race condition. This could lead to local escalation of privilege with no additional execution privileges needed. User interaction is not needed for exploitation.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2024-34731

RHSA-2024-4982 -Security Advisory- OpenShift API for Data Protection (OADP) – Security Fix – golang: net/netip – CVE-2024-24790 (2nd Aug 2024)

Preface: The IPv4-mapped IPv6 address format allows the IPv4 address of an IPv4 node to be represented as an IPv6 address. The IPv4 address is encoded into the low-order 32 bits of the IPv6 address, and the high-order 96 bits hold the fixed prefix 0:0:0:0:0:FFFF.
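The bit layout can be checked with Python's standard `ipaddress` module. This is only a sketch of the encoding described above; the helper name `to_mapped` is made up for the example.

```python
import ipaddress

def to_mapped(ipv4_text):
    """Build the IPv4-mapped IPv6 address for a dotted-quad IPv4 string."""
    v4 = ipaddress.IPv4Address(ipv4_text)
    # Low-order 32 bits: the IPv4 address. Next 16 bits: 0xFFFF. Top 80 bits: zero.
    return ipaddress.IPv6Address((0xFFFF << 32) | int(v4))

if __name__ == "__main__":
    mapped = to_mapped("192.0.2.1")
    print(mapped)                       # textual form varies by Python version
    print(hex(int(mapped) >> 32))       # the fixed prefix, 0xffff
    print(mapped.ipv4_mapped)           # the embedded IPv4 address
```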

Background: OpenShift API for Data Protection (OADP) enables you to back up and restore application resources, persistent volume data, and internal container images to external backup storage. OADP enables both file system-based and snapshot-based backups for persistent volumes.

Package netip defines an IP address type that’s a small value type. Building on that Addr type, the package also defines AddrPort (an IP address and a port) and Prefix (an IP address and a bit length prefix).

Compared to the net.IP type, Addr type takes less memory, is immutable, and is comparable (supports == and being a map key).

Vulnerability details: The Is methods of the netip Addr type (IsPrivate, IsLoopback, and so on) did not work as expected for IPv4-mapped IPv6 addresses: they returned false for addresses that would return true in their plain IPv4 form. RHSA-2024:4982 delivers the fix for OADP.

Security Fixes from Bugzilla: golang: net/netip: Unexpected behavior from Is methods for IPv4-mapped IPv6 addresses (CVE-2024-24790)
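One defensive pattern is to unwrap an IPv4-mapped address before asking "is this loopback / private?". The sketch below shows it in Python with the standard `ipaddress` module rather than Go's net/netip, so it is an analogy and not the Go fix. The `classify` helper is hypothetical.

```python
import ipaddress

def classify(text):
    """Classify an address after unwrapping any IPv4-mapped IPv6 form."""
    addr = ipaddress.ip_address(text)
    mapped = getattr(addr, "ipv4_mapped", None)   # only IPv6 addresses have this attribute
    if mapped is not None:
        addr = mapped                             # judge the embedded IPv4 address instead
    return {"loopback": addr.is_loopback, "private": addr.is_private}

if __name__ == "__main__":
    for text in ("127.0.0.1", "::ffff:127.0.0.1", "::ffff:10.0.0.1", "8.8.8.8"):
        print(text, classify(text))
```

An access-control check that skips the unwrapping step may treat `::ffff:127.0.0.1` as an ordinary external address, which is the kind of surprise the CVE title refers to.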

Official announcement: Please refer to the website for details – https://access.redhat.com/errata/RHSA-2024:4982

Regarding CVE-2024-0108: The manufacturer did not describe much. Is the situation below exactly what the CVE describes? (25th July 2024)

Preface: What is an example of autonomous AI?

Autonomous intelligence is artificial intelligence (AI) that can act without human intervention, input, or direct supervision. It’s considered the most advanced type of artificial intelligence. Examples may include smart manufacturing robots, self-driving cars, or care robots for the elderly.

Background: What is Jetson AGX Xavier used for?

As the world’s first computer designed specifically for autonomous machines, Jetson AGX Xavier has the performance to handle the visual odometry, sensor fusion, localization and mapping, obstacle detection, and path-planning algorithms that are critical to next-generation robots.

Vulnerability details: NVIDIA Jetson Linux contains a vulnerability in NvGPU where error handling paths in GPU MMU mapping code fail to clean up a failed mapping attempt. A successful exploit of this vulnerability may lead to denial of service, code execution, and escalation of privileges.
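The vendor gives no code, so the following is only a toy model of the bug class "error path fails to clean up a failed mapping attempt". The `PageTable` class, its methods and the capacity limit are all invented for illustration and do not reflect the NvGPU driver.

```python
class PageTable:
    """Toy page table: maps virtual page numbers to physical page numbers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}

    def _map_one(self, vpn, ppn):
        if len(self.entries) >= self.capacity:
            raise MemoryError("page table full")
        self.entries[vpn] = ppn

    def map_range_leaky(self, start_vpn, ppns):
        # Buggy: if a later page fails, the earlier pages stay mapped.
        for i, ppn in enumerate(ppns):
            self._map_one(start_vpn + i, ppn)

    def map_range(self, start_vpn, ppns):
        # Fixed: remember what was mapped and undo it on failure.
        mapped = []
        try:
            for i, ppn in enumerate(ppns):
                self._map_one(start_vpn + i, ppn)
                mapped.append(start_vpn + i)
        except MemoryError:
            for vpn in mapped:
                del self.entries[vpn]
            raise

if __name__ == "__main__":
    leaky = PageTable(capacity=2)
    try:
        leaky.map_range_leaky(0, [10, 11, 12])
    except MemoryError:
        print("leaky left behind:", leaky.entries)
```

In the leaky version the caller is told the mapping failed, yet two stale entries remain. In a real driver, leftovers of that kind may later point at memory that has been freed or reused.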

Official announcement: Please refer to the official announcement for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5555

CVE-2024-6960: H2O Model Deserialization RCE (21st July 2024)

Preface: TensorFlow provides a flexible framework for deep learning tasks, but may not be as optimized as H2O for handling large datasets.

Background: H2O uses Iced classes as the primary means of moving Java Objects around the cluster.

Iced is an auto-serializer base class that uses a delegator pattern (the faster option is to generate bytecode directly in all Iced classes, but this requires that all Iced classes go through a ClassLoader).

Iced is a marker class, and Freezable is the companion marker interface. Marked classes have a 2-byte integer type associated with them, and an auto-generated delegate class is created to do the actual byte-stream and JSON serialization and deserialization. Byte-stream serialization is extremely dense (it includes various compressions), and generating it is typically memory-bandwidth bound.

Vulnerability details: The H2O machine learning platform uses “Iced” classes as the primary means of moving Java Objects around the cluster. The Iced format supports inclusion of serialized Java objects. When a model is deserialized, any class is allowed to be deserialized (no class whitelist). An attacker can construct a crafted Iced model that uses Java gadgets and leads to arbitrary code execution when imported to the H2O platform.
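The missing control is a class allow-list at deserialization time. H2O is Java, so the sketch below is a Python analogy using `pickle`, not H2O's Iced format or any H2O fix. The `ALLOWED` set and the function names are made up for the example.

```python
import io
import pickle

# Hypothetical allow-list: the only classes a "model file" may reference.
ALLOWED = {("collections", "OrderedDict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks for a class or function by name.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"class {module}.{name} is not allowed")

def restricted_loads(data):
    """Deserialize bytes, refusing any class that is not on the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

class Gadget:
    """Stands in for a crafted object: unpickling it would call eval()."""
    def __reduce__(self):
        return (eval, ("1 + 1",))

if __name__ == "__main__":
    print(restricted_loads(pickle.dumps({"weights": [1.0, 2.0]})))
    try:
        restricted_loads(pickle.dumps(Gadget()))
    except pickle.UnpicklingError as exc:
        print("blocked:", exc)
```

Plain data loads as usual, while the gadget is rejected before anything runs. Without such a filter, whoever supplies the file decides which code the importer executes.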

Official announcement: Please refer to the official announcement for details – https://nvd.nist.gov/vuln/detail/CVE-2024-6960

A critical step in exploiting a buffer overflow is determining the offset where important program control information is overwritten. In the Linux kernel, the CVE-2024-41011 vulnerability has been resolved. (18th July 2024)

Preface: The PAGE_SIZE macro defined in the Linux kernel source determines the page size. Its definition is in the kernel header file /usr/src/kernels/5.14.0-22.el9.x86_64/include/asm-generic/page.

Background: MMIO stands for Memory-Mapped Input/Output. In Linux, MMIO is a mechanism used by devices to interface with the CPU that involves mapping their control registers and buffers directly into the processor’s memory address space.

This enables the CPU to access device registers and exchange data with devices using load and store instructions, just as if they were conventional memory locations. Graphics cards, network interfaces, and storage controllers all employ MMIO to effectively conduct input and output tasks.

Vulnerability details: drm/amdkfd: don’t allow mapping the MMIO HDP page with large pages. We don’t get the right offset in that case. The GPU has an unused 4K area of the register BAR space into which you can remap registers. We remap the HDP flush registers into this space to allow userspace (CPU or GPU) to flush the HDP when it updates VRAM. However, on systems with >4K pages, we end up exposing PAGE_SIZE of MMIO space.
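The arithmetic behind "exposing PAGE_SIZE of MMIO space" is simple: a mapping cannot be smaller than one page, so a 4K region mapped on a system with larger pages drags the neighbouring registers along. A small Python sketch (the function names are made up; the 4K figure comes from the text above):

```python
import mmap

HDP_REMAP_SIZE = 4096  # the unused 4K register area described above

def mapped_bytes(page_size, region_size=HDP_REMAP_SIZE):
    """Bytes actually mapped when a region is mapped at page granularity."""
    pages = -(-region_size // page_size)   # ceiling division
    return pages * page_size

def overexposed_bytes(page_size):
    """MMIO bytes exposed beyond the intended 4K region."""
    return mapped_bytes(page_size) - HDP_REMAP_SIZE

if __name__ == "__main__":
    print("this system's page size:", mmap.PAGESIZE)
    for size in (4096, 16384, 65536):
        print(size, "->", overexposed_bytes(size), "extra bytes exposed")
```

With 4K pages nothing extra is exposed. With 64K pages, 60K of unrelated register space would be reachable from userspace.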

Official announcement: Please refer to the official announcement for details – https://nvd.nist.gov/vuln/detail/CVE-2024-41011

CVE-2024-0102: About NVIDIA® CUDA® Toolkit. If you remember, a similar incident happened in April of this year. I believe this is a weakness shared by similar designs. (11th July 2024)

Preface: OpenAI revealed that the project cost $100 million, took 100 days, and used 25,000 NVIDIA A100 GPUs. Each server equipped with these GPUs uses approximately 6.5 kW, so an estimated 50 GWh of energy is consumed during training.
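The 50 GWh estimate can be roughly reproduced from the figures quoted. The sketch below assumes 8 GPUs per server, which is not stated in the text, and assumes every server draws the full 6.5 kW for the whole 100 days.

```python
def training_energy_gwh(gpus, gpus_per_server, server_kw, days):
    """Energy in GWh for a cluster running flat out for the given number of days."""
    servers = gpus / gpus_per_server
    kwh = servers * server_kw * days * 24
    return kwh / 1e6   # 1 GWh = 1,000,000 kWh

if __name__ == "__main__":
    # gpus_per_server=8 is an assumption, not a figure from the text
    print(training_energy_gwh(gpus=25_000, gpus_per_server=8, server_kw=6.5, days=100))
```

That is 3,125 servers drawing about 20.3 MW in total, or 48.75 GWh over 100 days, which rounds to the quoted 50 GWh.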

Background: Parallel processing is a method in computing of running two or more processors (CPUs) to handle separate parts of an overall task. Breaking up different parts of a task among multiple processors helps reduce the time needed to run a program. A GPU renders images more quickly than a CPU because of its parallel processing architecture, which allows it to perform multiple calculations across streams of data simultaneously. The CPU is the brain of the operation, responsible for giving instructions to the rest of the system, including the GPU(s).

NVIDIA CUDA provides a simple C/C++ based interface. The CUDA compiler leverages the parallelism built into the CUDA programming model as it compiles your program into code.

CUDA is a parallel computing platform and programming model created by Nvidia for developing software that runs on parallel processors. It serves as an alternative to running simulations on traditional CPUs.

Vulnerability details: NVIDIA CUDA Toolkit for all platforms contains a vulnerability in nvdisasm, where an attacker can cause an out-of-bounds read issue by deceiving a user into reading a malformed ELF file. A successful exploit of this vulnerability might lead to denial of service.
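NVIDIA does not say which field nvdisasm mishandles, so the following is only a generic sketch of how a malformed ELF file can lead a parser out of bounds and how a bounds check stops it. It covers 64-bit little-endian ELF headers only, and the function name is made up.

```python
import struct

# ELF64 file header, little-endian: e_ident, e_type, e_machine, e_version, e_entry,
# e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

def section_table_in_bounds(data):
    """Return True only if the section header table lies inside the file."""
    if len(data) < ELF64_HEADER.size or data[:4] != b"\x7fELF":
        return False
    if data[4] != 2 or data[5] != 1:       # EI_CLASS = ELFCLASS64, EI_DATA = little-endian
        return False
    fields = ELF64_HEADER.unpack_from(data)
    e_shoff, e_shentsize, e_shnum = fields[6], fields[11], fields[12]
    # The header claims where the table is; never trust it without checking.
    return e_shoff + e_shentsize * e_shnum <= len(data)
```

A tool that skips this check and reads `e_shnum` entries starting at `e_shoff` would read past the end of its buffer whenever the header lies about either value.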

Official announcement: Please refer to the vendor announcement for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5548

A closer look at CVE-2024-39920: About the “SnailLoad” issue (5th July 2024)

NVD Published Date: 07/03/2024

Preface: How is RTT measured in TCP? Round-trip time (RTT) is the time from sending a packet to receiving the acknowledgment packet from the target host.
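A rough way to see an RTT from user space is to time a TCP connect, since the call returns once the SYN has been answered with a SYN-ACK. This is a simple approximation and not the SnailLoad measurement technique; the function name is made up.

```python
import socket
import time

def tcp_connect_rtt(host, port, timeout=2.0):
    """Approximate one round trip by timing the TCP three-way handshake."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        return time.perf_counter() - start

if __name__ == "__main__":
    # Measure against a throwaway listener on the loopback interface.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        port = server.getsockname()[1]
        print(f"{tcp_connect_rtt('127.0.0.1', port) * 1000:.3f} ms")
```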

Background: RFC 9293, the revised specification of the Transmission Control Protocol (TCP), was released on August 18, 2022.

Highlight:

-Acknowledgment Number:  32 bits – If the ACK control bit is set, this field contains the value of the next sequence number the sender of the segment is expecting to receive.  Once a connection is established, this is always sent.

-There are also methods of “fingerprinting” that can be used to infer the host TCP implementation (operating system) version or platform information. These collect observations of several aspects, such as the options present in segments, the ordering of options, the specific behaviors in the case of various conditions, packet timing, packet sizing, and other aspects of the protocol that are left to be determined by an implementer, and can use those observations to identify information about the host and implementation.
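The Acknowledgment Number field highlighted above sits at a fixed offset in the 20-byte TCP header, so it is easy to pull out of a raw segment. A minimal Python sketch follows. The function name is made up, and TCP options and payload are ignored.

```python
import struct

# Fixed part of the TCP header: source port, destination port, sequence number,
# acknowledgment number, data offset/reserved, flags, window, checksum, urgent pointer
TCP_HEADER = struct.Struct("!HHIIBBHHH")
ACK_FLAG = 0x10

def parse_tcp_header(segment):
    """Return the sequence number and, if the ACK bit is set, the acknowledgment number."""
    if len(segment) < TCP_HEADER.size:
        raise ValueError("segment shorter than a TCP header")
    _src, _dst, seq, ack, _offset, flags, _win, _csum, _urg = TCP_HEADER.unpack_from(segment)
    ack_set = bool(flags & ACK_FLAG)
    # The field is only meaningful when the ACK control bit is set.
    return {"seq": seq, "ack_set": ack_set, "ack": ack if ack_set else None}
```

An observer who timestamps when each such ACK arrives gets an RTT sample per segment.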

Vulnerability details: The TCP protocol in RFC 9293 has a timing side channel that makes it easier for remote attackers to infer the content of one TCP connection from a client system (to any server), when that client system is concurrently obtaining TCP data at a slow rate from an attacker-controlled server, aka the “SnailLoad” issue. For example, the attack can begin by measuring RTTs via the TCP segments whose role is to provide an ACK control bit and an Acknowledgment Number.

Official announcement: For details, please refer to the link – https://nvd.nist.gov/vuln/detail/CVE-2024-39920