Category Archives: System

Supply constraints and product attribute design: it is expected that two camps will operate in the future. (9th JAN 2024)

Preface: When the high-performance computing (HPC) cluster was born, it was destined to compete with traditional mainframe technology. The major component of HPC is the many-core processor, that is, the GPU. For example, the NVIDIA GA100 GPU is composed of multiple GPU Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), and HBM2 memory controllers. As a best-of-the-best comparison, the world’s fastest public supercomputer, Frontier, has 37,000 AMD Instinct MI250X GPUs.

How to break through traditional computer technology and go from serial to parallel processing: CPUs are fast, but they work by quickly executing a series of tasks one after another, which requires a lot of interactivity. This is known as serial processing. GPU parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Over time came the revolution in GPU processor technology and high-performance clusters: Red Hat, for example, documents a High Performance Cluster system configuration whose overall performance approaches that of a supercomputer processor using crossbar switches. But the bottleneck lies in how to transform traditional software applications from serial processing to parallel processing.

Reflection of reality in the technological world: It is common knowledge that GPU manufacturer NVIDIA holds a strong market share worldwide. The NVIDIA A100 processor delivers strong performance on intensive AI tasks and deep learning, and is the more budget-friendly option. The H100’s optimizations, such as TensorRT-LLM and NVLink, show that it surpasses the A100, especially in the LLM area. Large Language Models (LLMs) have revolutionised the field of natural language processing. As these models grow in size and complexity, the computational demands for inference also increase significantly. To tackle this challenge, leveraging multiple GPUs becomes essential.

Supply constraints and product attribute design create headaches for web hosting providers: CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). But converting serial C code to data-parallel code is a difficult problem. Because of this limitation, NVIDIA developed the NVIDIA CUDA Compiler (NVCC), a proprietary compiler by NVIDIA intended for use with CUDA.

Using the CUDA Toolkit you can accelerate your C or C++ applications by updating the computationally intensive portions of your code to run on GPUs. To accelerate your applications, you can call functions from drop-in libraries as well as develop custom applications using languages including C, C++, Fortran and Python.

But you cannot use CUDA without an NVIDIA graphics card. CUDA is a framework developed by NVIDIA that allows people with an NVIDIA graphics card to use GPU acceleration for deep learning, and not having an NVIDIA graphics card defeats that purpose. (Refer to attached Diagram Part 1).

If a web hosting service provider does not use NVIDIA products, is it possible to use another brand of processor for AI machine learning? Yes; one option is OpenCilk.

OpenCilk (http://opencilk.org) is a new open-source platform that supports task-parallel programming in C/C++. (Refer to attached Diagram Part 2)

Referring to the above details, the technological climate makes people foresee that two camps will operate in the future: the NVIDIA camp and the non-NVIDIA camp. This is why I have observed that web hosting service providers are giving themselves headaches over this new trend in the technology game.

CVE-2023-34326: Potential risk allowing access to unintended memory regions (8th JAN 2024)

Preface: In fact, by the time a vulnerability is released to the public, the design limitations and/or flaws have usually already been fixed. You may ask: what room for discussion is left for disclosed vulnerabilities? As you know, an increasing number of vendors remain compliant with CVE policies, but the technical details are not disclosed. Even if the vendor does not release any details, if your focus is understanding, you can still learn about the specific techniques involved. The techniques you learn can expand your horizons.

Background: AMD-Vi is an I/O memory management unit (IOMMU) embedded in the chipset of the AMD Opteron 6000 Series platform. The IOMMU is a key technology in extending the CPU’s virtual memory to GPUs to enable heterogeneous computing. AMD-Vi (also known as the AMD IOMMU) also allows PCI passthrough.

DMA mapping is a conversion from virtually addressed memory to memory that is DMA-able, addressed by physical (actually bus) addresses.

DMA remapping maps virtual addresses in DMA operations to physical addresses in the processor’s memory address space. Similar to MMU, IOMMU uses a multi-level page table to keep track of the IOVA-to-PA mappings at different page-size granularity (e.g., 4-KiB, 2-MiB, and 1-GiB pages). The hardware also implements a cache (aka IOTLB) of page table entries to speed up translations.

AMD processors use two distinct IOTLBs for caching Page Directory Entry (PDE) and Page Table Entry (PTE) (AMD, 2021; Kegel et al., 2016).

Ref: If your application scenario does not require virtualization, disable AMD Virtualization Technology. With virtualization disabled, also disable the AMD IOMMU, as it can cause differences in memory-access latency. Finally, disable SR-IOV as well.

Vulnerability details: The caching invalidation guidelines from the AMD-Vi specification (48882—Rev 3.07-PUB—Oct 2022) are incorrect on some hardware, as devices will malfunction (see stale DMA mappings) if some fields of the DTE are updated but the IOMMU TLB is not flushed. Such stale DMA mappings can point to memory ranges not owned by the guest, thus allowing access to unintended memory regions.

Official announcement: Please refer to the link for details – https://nvd.nist.gov/vuln/detail/CVE-2023-34326

SUSE Enterprise Linux Server 15: Apart from the libvirt framework, how does it manage memory in units called pages? (25-10-2023)

Preface: HPE Cray OS is based on standard SUSE Enterprise Linux Server 15. The supercomputer dubbed Frontier was developed by HPE Cray. Frontier and HPE Cray OS run standard Linux applications; the OS enhances standard Linux for performance, scale, and reliability.

Ref: Frontier is based on the latest HPE Cray EX235a architecture and is equipped with AMD EPYC 64C 2GHz processors. The system has 8,699,904 total cores, a power efficiency rating of 52.59 gigaflops/watt, and relies on the Slingshot-11 interconnect for data transfer.

SUSE Enterprise Linux Server 15: How does it manage memory in units called pages?

Linux manages memory in units called pages (the default page size is 4 KB). Linux and the CPU need to know which pages belong to which process. Those parameters are stored in a page table. If a high volume of processes is running, it takes more time to find where the memory is mapped, because of the time required to search the page table. To speed up the search, the TLB (Translation Lookaside Buffer) was invented. But on a system with a lot of memory, the TLB is not enough.

To avoid any fallback to the normal page table (resulting in a cache miss, which is time-consuming), huge pages can be used. Using huge pages will reduce TLB overhead and TLB misses (page walks).

Example: A host with 32 GB (32*1024*1024 = 33,554,432 KB) of memory and a 4 KB page size has a page table with 33,554,432/4 = 8,388,608 entries. Using a 2 MB (2048 KB) page size, the page table only needs 33,554,432/2048 = 16,384 entries, considerably reducing TLB misses.

About CVE-2023-31248 & CVE-2023-35001: CAP_NET_ADMIN is in any user or network namespace. Does it have an impact on downstream vendors? (6th July 2023)

Preface: CAP_NET_ADMIN is in any user or network namespace.

Background: The “Capabilities” mechanism was introduced after Linux kernel 2.2. If a capability setting is incorrect,
it gives attackers an opportunity to achieve privilege escalation. Linux capabilities provide a subset of the available root privileges to a process.
Starting from the Linux 2.1 kernel, the concept of capabilities was introduced to achieve fine-grained access control.
You can find the capabilities defined in /usr/include/linux/capability[.]h (see below):
CAP_CHOWN 0 Allows changing file ownership
CAP_DAC_OVERRIDE 1 Ignores all DAC access restrictions on the file
CAP_DAC_READ_SEARCH 2 Ignores all restrictions on read and search operations
CAP_FOWNER 3 If the file belongs to the UID of the process, cancels the restrictions on the file
CAP_FSETID 4 Allows setting the setuid bit
.
CAP_NET_ADMIN 12 Allows performing network administration tasks: interfaces, firewalls, routing, etc.

Vulnerability details:
CVE-2023-31248 Linux Kernel nftables Use-After-Free Local Privilege Escalation Vulnerability;
nft_chain_lookup_byid() failed to check whether a chain was active, while CAP_NET_ADMIN is available in any user or network namespace.
For details, please refer to the link – https://www.tenable.com/cve/CVE-2023-31248

CVE-2023-35001 Linux Kernel nftables Out-Of-Bounds Read/Write Vulnerability; nft_byteorder poorly handled vm register contents, while CAP_NET_ADMIN is available in any user or network namespace.

Focus on CVE-2023-31248
This is due to nft_chain_lookup_byid() ignoring the genmask.
Remark: The Genmask field is the bit mask that IP applies to the destination address from the packet to see if the address matches the destination value in the table.
If a bit is on in the bit mask, the corresponding bit in the destination address is significant for matching the address.
Once the first table is removed, all the member objects, as well as the table itself, are kfree()’d, but references are kept in the second table, leading to a use-after-free condition.

About CVE-2023-29345 and CVE-2023-33143: Microsoft released security updates for the Chromium project (6th June 2023)

Preface: Windows has traditionally run on machines that are powered by x86 / x64 processors. Windows 11 adds the capability to run unmodified x64 Windows apps on Arm devices! This capability to run x86 & x64 apps on Arm devices gives end-users confidence that the majority of their existing apps & tools will run well even on new Arm-powered devices. For the best results, however, Arm-native Windows apps are preferred, so developers who follow this trend build or port Arm-native Windows apps.


Background: Codenamed “Anaheim”, on December 6, 2018, Microsoft announced its intent to base Edge on the Chromium source code, using the same browser engine as Google Chrome but with enhancements developed by Microsoft. The new Microsoft Edge (Chromium) is built on the same underlying technology as Google Chrome. During the Ignite 2021 conference, Microsoft revealed plans to align the codebase of the Edge browser on all supported platforms.


Vulnerability details:
CVE-2023-29345 Microsoft Edge Remote Code Execution – A vulnerability was found in Microsoft Edge (Web Browser) (version unknown).
CVE-2023-33143 – Microsoft Edge (Chromium-based) Elevation of Privilege Vulnerability
For details, please refer to the link – https://learn.microsoft.com/en-us/deployedge/microsoft-edge-relnotes-security

Dig out details on CVE-2023-20877 – VMware fixed this matter already. (18th May 2023)

Preface: If you have set “Read & Execute” permission for everyone on a parent folder, and have not disabled permission inheritance on the subfolder, then in theory the subfolder should inherit the parent folder’s permissions.

Background: VMware Aria Operations is a unified, AI-powered self-driving IT operations management platform for private, hybrid & multi-cloud environments.
You can execute scripts from the local OS using Orchestrator. To do that, Orchestrator needs access (x) to the folder where the script is located and the Orchestrator user needs to be able to read and execute (rx) it. You also need to allow Orchestrator to execute local files.
The access for Orchestrator is regulated by the entries in the js-io-rights[.]conf file.
Please note that the script needs to be in a file location that Orchestrator can access and that Orchestrator will run as user vco with the group vco.
Orchestrator has full access preconfigured for the /var/run/vco directory. The plus (+) operator means that Orchestrator has the right to access a directory, for example, to list the contents or to execute a file.
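For illustration only (the exact default entries vary by Orchestrator version, and the last path here is a hypothetical example), each line of js-io-rights[.]conf pairs a rights string with a path, with minus denying and plus granting:

```
-rwx /
+rwx /var/run/vco
+rx /usr/local/scripts
```

Under rules like these, a script placed outside the granted paths cannot be read or executed by the vco user, which is why the file location matters in the note above.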

Vulnerability details: VMware Aria Operations contains a privilege escalation vulnerability. An authenticated malicious user with ReadOnly privileges can perform code execution leading to privilege escalation.

Official announcement: Check out the details on the link – https://www.vmware.com/security/advisories/VMSA-2023-0009.html

Linux kernel BUG: About hugetlb[.]c in mm folder (22nd Mar 2023)

Preface: Enabling HugePages makes it possible for the operating system to support memory pages greater than the default (usually 4 KB). Using very large page sizes can improve system performance by reducing the amount of system resources required to access page table entries.

Background: For Red Hat Enterprise Linux systems, it is recommended to configure HugeTLB pages to guarantee that JBoss EAP processes have access to large pages.
Reminder: Activating large pages for JBoss EAP JVMs results in pages that are locked in memory and cannot be swapped to disk like regular memory.

Ref: Hugetlb boot command-line parameter semantics: hugepagesz specifies a huge page size. It is used in conjunction with the hugepages parameter to preallocate a number of huge pages of the specified size. Hence, hugepagesz and hugepages are typically specified in pairs, such as: hugepagesz=2M hugepages=512.

Design weakness: The special hugetlb routine called at fork took care of structure updates at fork time. However, VMA splitting is not properly handled for IPC shared memory mappings backed by hugetlb pages. This can result in a “kernel NULL pointer dereference” BUG or a use-after-free, as two VMAs point to the same lock structure.

Solution: Update the shm open and close routines to always call the underlying open and close routines.
For Red Hat Linux, update the kernel from 6.1.18-100.fc36 to 6.2.7-1000.fc36.

Technical reference: A subroutine IOBUFSET is provided to carve up an arbitrarily sized storage area into perforated buffer blocks with space for 132 data bytes. The beginning and ending addresses of the buffer storage area are specified to IOBUFSET in the A- and B-registers, respectively.

Have you upgraded your Linux kernel? (15th Mar 2023)

Preface: The blue screen of death (BSOD) is an error display commonly seen on Windows. In Linux it is unlikely and uncommon, but is it possible?

Background: As the only copyright holder to the GPL-covered components of the software, you are free to add exceptions and additional terms to the GPLv3, as described in section 7 of that license. In fact, the LGPLv3 is just such a GPLv3 section 7 additional permission, allowing the component to be linked to proprietary code. But it is not recommended, because it is extremely tricky.

The kernel marks itself as “tainted” when some event occurs that may be relevant when investigating a problem. Kernel 6.1.16 was apparently subject to an “oops”. What is an “oops”? See below:
The tainted status is printed when a kernel internal problem (“kernel bug”), recoverable error (“kernel oops”), or unrecoverable error (“kernel panic”) occurs, and debug information about it is written to the dmesg log output. The tainted status can also be checked at runtime via files in /proc/.

Solution: It may have nothing to do with serious cyberattacks, but it is recommended to upgrade the kernel; 6.2.5 and 6.1.18 have been released.

ndctl: release v76.1, have you updated yet? (14th Mar 2023)

Preface: Advantages of NVDIMMs in servers: NVDIMMs provide high-speed DRAM performance coupled with flash-backed persistent storage. Aside from providing an additional memory tier in servers, NVDIMM persistence allows applications to continue processing I/O traffic during planned or unexpected system failures.

Background: Persistent Memory (PM) is a type of Non-Volatile Memory (NVM). The ndctl utility is used to manage the libnvdimm (non-volatile memory device) sub-system in the Linux Kernel. It is required for several Persistent Memory Developer Kit (PMDK) features if compiling from source. If ndctl is not available, the PMDK may not build all components and features.
If you are going to write applications for persistent memory, the programming model modes are as follows:

Block and File modes use IO

  • Data is read or written using RAM buffers
  • Software controls how to wait (context switch or poll)
  • Status is explicitly checked by software

Volume and PM modes enable Load/Store

  • Data is loaded into or stored from processor registers
  • Processor makes software wait for data during instruction
  • No status checking – errors generate exceptions

Recommendation: It is suggested to upgrade to ndctl release v76.1.
Version 76.1 fixed the following:
cxl/event-trace: use the wrapped util_json_new_u64()
cxl/monitor: fix include paths for tracefs and traceevent
cxl/monitor: Make libtracefs dependency optional

Hiccup with a web server load balancing solution (3rd May 2022)

Preface: Online banking cannot lack a load-balancing solution today. However, in terms of the life cycle of operating systems and software libraries, the Java development platform, and on-demand custom functions, do these bother the load-balancing functions? The most challenging part is layer-7 load balancing. Perhaps you can do a health check on application functions, but it is difficult to guarantee non-stop operation on the application side (availability).

Background: The load-balancing algorithms supported are round-robin, weighted least request, random, ring-hash, and more. An additional function is uninterrupted client service based on application & service availability (health-check performance).

My focus: Online banking platform (Hong Kong)
Error 500: java.lang.RuntimeException: no EntityContext found in existing session
Date: Around 8:15 am 5/3/2022

Fundamentally, web server load balancing done correctly means no downtime. When the web server you are connected to has a problem, the load-balancing function keeps persistence (SSL sticky sessions) and redirects your connection to a web server that is available.
My experience operating in an online banking system this morning (3rd May, 2022) hinted at this technical information.
I encountered an error in my web session. (Reminder: I had successfully logged on and was performing operations.) However, an error 500 was displayed on my screen. Thereafter, even when I closed the browser and made a newly established connection to the banking system, it still redirected my new connection to e-banking1.hangseng.com. But in a round-robin architecture, I should be able to connect to e-banking2.hangseng.com by chance.

Observation: Perhaps the load balancer is capable of a web-application health-check function, but for this online banking system it performs the health check on the web server front page only, not on the Java server pages. For example, the EntityContext interface contains the getEjbObject and getPrimaryKey methods that a bean can use to find out about the object it is associated with. The client communicates with the bean via the EJBObject. If an error occurs in one of the Java services, the load balancer’s health-check function may not know what is happening.

Is the concern vulnerable Java SE Embedded versions, such that applying tight protection causes this technical problem to occur? Or is there a software configuration problem in the web application itself?