Category Archives: System

SUSE Linux Enterprise Server 15: Apart from the libvirt framework, how does it manage memory in units called pages? (25-10-2023)

Preface: HPE Cray OS is based on standard SUSE Linux Enterprise Server 15. The supercomputer dubbed Frontier was developed by HPE Cray. HPE Cray OS is built to run standard Linux applications, with enhancements for performance, scale, and reliability.

Ref: Frontier is based on the latest HPE Cray EX235a architecture and equipped with AMD EPYC 64C 2GHz processors. The system has 8,699,904 total cores, a power efficiency rating of 52.59 gigaflops/watt, and relies on Slingshot-11 interconnect for data transfer.  

SUSE Linux Enterprise Server 15: How does it manage memory in units called pages?

Linux manages memory in units called pages (the default page size is 4 KB). Linux and the CPU need to know which pages belong to which process. That mapping is stored in a page table. If a large number of processes are running, it takes more time to find where the memory is mapped, because of the time required to search the page table. To speed up the search, the TLB (Translation Lookaside Buffer) was invented. But on a system with a lot of memory, the TLB is not enough.

To avoid any fallback to the normal page table (a TLB miss followed by a page walk, which is time consuming), huge pages can be used. Using huge pages reduces TLB overhead and TLB misses.

Example: A host with 32 GB (32*1024*1024 = 33,554,432 KB) of memory and a 4 KB page size needs 33,554,432/4 = 8,388,608 page entries to cover all of its memory. With a 2 MB (2048 KB) page size, only 33,554,432/2048 = 16,384 entries are needed, considerably reducing TLB misses.
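
The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch of the page-count calculation; the function name pages_needed is made up for illustration and is not part of any SUSE tool.

```python
KIB_PER_GIB = 1024 * 1024  # 1 GiB expressed in KiB


def pages_needed(memory_gib: int, page_size_kib: int) -> int:
    """Number of pages required to cover memory_gib of RAM."""
    return memory_gib * KIB_PER_GIB // page_size_kib


if __name__ == "__main__":
    for page_kib in (4, 2048):
        print(f"{page_kib} KB pages: {pages_needed(32, page_kib):,}")
```

With 32 GB of memory, the count drops from 8,388,608 pages to 16,384 pages when moving from 4 KB to 2 MB pages.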

About CVE-2023-31248 & CVE-2023-35001: CAP_NET_ADMIN is in any user or network namespace. Does it have an impact on downstream vendors? (6th July 2023)

Preface: CAP_NET_ADMIN is in any user or network namespace.

Background: The “Capabilities” mechanism has been available since Linux kernel 2.2. If the “Capabilities” setting is incorrect, it gives attackers an opportunity to achieve privilege escalation. Linux capabilities provide a subset of the available root privileges to a process.
The concept of capabilities was introduced during the Linux 2.1 kernel series to achieve fine-grained access control.
You can find the capabilities defined in /usr/include/linux/capability[.]h (see below):
CAP_CHOWN 0 Allows changing file ownership
CAP_DAC_OVERRIDE 1 Ignores all DAC access restrictions on the file
CAP_DAC_READ_SEARCH 2 Ignores all restrictions on read and search operations
CAP_FOWNER 3 Bypasses restrictions that normally require the UID of the process to match the owner UID of the file
CAP_FSETID 4 Allows setting the setuid bit
.
CAP_NET_ADMIN 12 Allows performing network administration tasks: interfaces, firewalls, routing, etc.
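
To see which of these capabilities a process actually holds, the bit mask can be decoded by bit position. The sketch below assumes the CapEff field of /proc/self/status (a hexadecimal mask) and uses a partial table containing only the capability numbers listed above; other bits are reported by number.

```python
# Partial table: only the capability numbers listed in this post.
CAP_NAMES = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    2: "CAP_DAC_READ_SEARCH",
    3: "CAP_FOWNER",
    4: "CAP_FSETID",
    12: "CAP_NET_ADMIN",
}


def decode_caps(mask: int) -> list:
    """Translate a capability bit mask into a list of names."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Bits outside the partial table are shown as cap_<number>.
            names.append(CAP_NAMES.get(bit, f"cap_{bit}"))
        bit += 1
    return names


def effective_caps(status_path: str = "/proc/self/status") -> list:
    """Read the effective capability set of the current process (Linux only)."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_caps(int(line.split()[1], 16))
    return []
```

For example, decode_caps(1 << 12) returns ["CAP_NET_ADMIN"], the capability relevant to the two CVEs discussed here.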

Vulnerability details:
CVE-2023-31248 Linux Kernel nftables Use-After-Free Local Privilege Escalation Vulnerability;
nft_chain_lookup_byid() failed to check whether a chain was active. The flaw is reachable by a process that holds CAP_NET_ADMIN in any user or network namespace.
For details, please refer to the link – https://www.tenable.com/cve/CVE-2023-31248

CVE-2023-35001 Linux Kernel nftables Out-Of-Bounds Read/Write Vulnerability; nft_byteorder poorly handled VM register contents. The flaw is reachable when CAP_NET_ADMIN is held in any user or network namespace.

Focus on CVE-2023-31248
This is due to nft_chain_lookup_byid() ignoring the genmask.
Remark: In nf_tables, the genmask (generation mask) records whether an object is active in the current or the next generation of the ruleset while a transaction is in progress. It should not be confused with the Genmask field of the routing table, which is the bit mask that IP applies to the destination address of a packet to see whether the address matches the destination value in the table.
Once the first table is removed, all the member objects, as well as the table itself, are kfree()’d, but the references are kept in the second table, which results in a use-after-free condition.
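
The effect of a lookup that ignores the active state can be illustrated with a toy model. This is not kernel code: the Chain class, the inactive_next flag, and both lookup functions are hypothetical names that only mimic the idea of checking whether an object is still active before handing out a reference to it.

```python
from dataclasses import dataclass


@dataclass
class Chain:
    chain_id: int
    name: str
    # Toy stand-in for the generation mask: True once a pending
    # transaction has marked the chain for deletion.
    inactive_next: bool = False


def lookup_byid_unchecked(chains, chain_id):
    """Flawed lookup: matches on the id only and ignores the active state."""
    for chain in chains:
        if chain.chain_id == chain_id:
            return chain
    return None


def lookup_byid_checked(chains, chain_id):
    """Safer lookup: skips chains that are no longer active."""
    for chain in chains:
        if chain.chain_id == chain_id and not chain.inactive_next:
            return chain
    return None


if __name__ == "__main__":
    chains = [Chain(1, "deleted-chain", inactive_next=True)]
    print(lookup_byid_unchecked(chains, 1))  # still hands out the stale chain
    print(lookup_byid_checked(chains, 1))    # None
```

In the toy model the unchecked lookup returns an object that is about to be freed, which is the kind of dangling reference that leads to a use-after-free.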

About CVE-2023-29345 and CVE-2023-33143: Microsoft released security updates for the Chromium-based Edge (6th June 2023)

Preface: Windows has traditionally run on machines that are powered by x86 / x64 processors. Windows 11 adds the capability to run unmodified x64 Windows apps on Arm devices! This capability to run x86 & x64 apps on Arm devices gives end-users confidence that the majority of their existing apps & tools will run well even on new Arm-powered devices. In theory, the best results come from Arm-native Windows apps, so developers are following the trend and building or porting Arm-native Windows apps.


Background: On December 6, 2018, Microsoft announced its intent to base Edge on the Chromium source code (a project codenamed “Anaheim”), using the same browser engine as Google Chrome but with enhancements developed by Microsoft. During the Ignite 2021 conference, Microsoft revealed plans to align the codebase of the Edge browser on all supported platforms.


Vulnerability details:
CVE-2023-29345 Microsoft Edge Remote Code Execution – A vulnerability was found in Microsoft Edge (Web Browser) (version unknown).
CVE-2023-33143 – Microsoft Edge (Chromium-based) Elevation of Privilege Vulnerability
For details, please refer to the link – https://learn.microsoft.com/en-us/deployedge/microsoft-edge-relnotes-security

Dig out details on CVE-2023-20877 – VMware fixed this matter already. (18th May 2023)

Preface: If you have granted the “Read & Execute” permission to Everyone on a parent folder and have not disabled permission inheritance on the subfolder, then in theory the subfolder inherits the permissions of the parent folder.

Background: VMware Aria Operations is a unified, AI-powered self-driving IT operations management platform for private, hybrid & multi-cloud environments.
You can execute scripts from the local OS using Orchestrator. To do that, Orchestrator needs access (x) to the folder where the script is located and the Orchestrator user needs to be able to read and execute (rx) it. You also need to allow Orchestrator to execute local files.
The access for Orchestrator is regulated by the entries in the js-io-rights[.]conf file.
Please note that the script needs to be in a file location that Orchestrator can access and that Orchestrator will run as user vco with the group vco.
Orchestrator has full access preconfigured for the /var/run/vco directory. The x operator means that Orchestrator has the right to access the directory, for example, to list the content or to execute a file.
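
The file-system side of these requirements can be expressed as a small check on permission bits. This is a simplified sketch, not part of Orchestrator: the function name is hypothetical, it assumes that both the folder and the script are owned by the user the service runs as, and it ignores group bits, ACLs, and the js-io-rights[.]conf rules.

```python
import stat


def owner_can_run_script(folder_mode: int, script_mode: int) -> bool:
    """Check owner bits only: x on the folder, r and x on the script."""
    folder_ok = bool(folder_mode & stat.S_IXUSR)
    script_ok = bool(script_mode & stat.S_IRUSR) and bool(script_mode & stat.S_IXUSR)
    return folder_ok and script_ok


if __name__ == "__main__":
    print(owner_can_run_script(0o700, 0o500))  # True
    print(owner_can_run_script(0o600, 0o500))  # False: folder lacks x
    print(owner_can_run_script(0o700, 0o400))  # False: script lacks x
```

On a real system the modes would come from os.stat(path).st_mode for the folder and the script.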

Vulnerability details: VMware Aria Operations contains a privilege escalation vulnerability. An authenticated malicious user with ReadOnly privileges can perform code execution leading to privilege escalation.

Official announcement: Check out the details on the link – https://www.vmware.com/security/advisories/VMSA-2023-0009.html

Linux kernel BUG: About hugetlb[.]c in mm folder (22nd Mar 2023)

Preface: Enabling HugePages makes it possible for the operating system to support memory pages greater than the default (usually 4 KB). Using very large page sizes can improve system performance by reducing the amount of system resources required to access page table entries.

Background: For Red Hat Enterprise Linux systems, it is recommended to configure HugeTLB pages to guarantee that JBoss EAP processes will have access to large pages.
Reminder: Activating large pages for JBoss EAP JVMs results in pages that are locked in memory and cannot be swapped to disk like regular memory.

Ref: Hugetlb boot command line parameter semantics. hugepagesz: specify a huge page size. It is used in conjunction with the hugepages parameter to preallocate a number of huge pages of the specified size. Hence, hugepagesz and hugepages are typically specified in pairs, such as: hugepagesz=2M hugepages=512.
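
To get a feel for how much memory such a pair reserves, the parameters can be parsed and multiplied out. The sketch below is an illustration only: reserved_kib is a hypothetical helper, it understands the K, M and G suffixes, and it ignores a hugepages value that is not preceded by hugepagesz (the kernel would apply the default huge page size in that case).

```python
UNIT_KIB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def reserved_kib(cmdline: str) -> int:
    """Total KiB preallocated by hugepagesz=/hugepages= pairs in cmdline."""
    total = 0
    size_kib = None
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "hugepagesz":
            # e.g. "2M" -> 2 * 1024 KiB
            size_kib = int(value[:-1]) * UNIT_KIB[value[-1].upper()]
        elif key == "hugepages" and size_kib is not None:
            total += int(value) * size_kib
    return total


if __name__ == "__main__":
    kib = reserved_kib("hugepagesz=2M hugepages=512")
    print(kib, "KiB =", kib // (1024 * 1024), "GiB")
```

The pair hugepagesz=2M hugepages=512 therefore reserves 512 * 2 MB = 1 GB of memory at boot.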

Design weakness: The special hugetlb routine called at fork took care of structure updates at fork time. However, vma_splitting is not properly handled for ipc shared memory mappings backed by hugetlb pages. This can result in a “kernel NULL pointer dereference” BUG or use after free as two vmas point to the same lock structure.

Solution: Update the shm open and close routines to always call the underlying open and close routines.
For Fedora 36 (the fc36 packages from Red Hat), update the kernel from 6.1.18-100.fc36 to 6.2.7-1000.fc36.

Technical reference: A subroutine IOBUFSET is provided to carve up an arbitrarily sized storage area into perforated buffer blocks with space for 132 data bytes. The beginning and ending addresses of the buffer storage area are specified to IOBUFSET in the A- and B-registers, respectively.

Have you upgraded your Linux kernel? (15th Mar 2023)

Preface: The blue screen of death (BSOD) is a common error display on Windows. On Linux an equivalent is unlikely and uncommon, but is it possible?

Background: As the only copyright holder of the GPL-covered components of the software, you are free to add exceptions and additional terms to the GPLv3, as described in section 7 of that license. In fact, the LGPLv3 is just such a GPLv3 section 7 additional permission, allowing the component to be linked to proprietary code. But it is not recommended, because it is extremely tricky.

The kernel marks itself as “tainted” when some event occurs that may be relevant when investigating a problem. Kernel 6.1.16 was found to be apparently subject to an “oops”. What is an “oops”? See below:
The tainted status is printed when a kernel internal problem (“kernel bug”), recoverable error (“kernel oops”), or unrecoverable error (“kernel panic”) occurs, and debug information about this is written to the log dmesg output. The tainted status can also be checked at runtime via files in /proc/.
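
The runtime check can be scripted. The sketch below assumes the value is read from /proc/sys/kernel/tainted, which holds a decimal bit mask, and decodes it with a deliberately partial table of flags; bits that are not in the table are reported by number only.

```python
# Partial table of taint bits; the kernel defines more than these.
TAINT_FLAGS = {
    0: "P: proprietary module loaded",
    7: "D: kernel died recently (oops or BUG)",
    9: "W: kernel issued a warning",
    12: "O: out-of-tree module loaded",
    13: "E: unsigned module loaded",
}


def decode_taint(value: int) -> list:
    """Translate the taint bit mask into a list of descriptions."""
    flags = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            flags.append(TAINT_FLAGS.get(bit, f"bit {bit} (not in this partial table)"))
        bit += 1
    return flags


def read_taint(path: str = "/proc/sys/kernel/tainted") -> list:
    """Read and decode the taint status of the running kernel (Linux only)."""
    with open(path) as f:
        return decode_taint(int(f.read().strip()))
```

A value of 0 means the kernel is not tainted; a value of 128 (bit 7) indicates that an oops or BUG occurred since boot.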

Solution: Maybe it has nothing to do with serious cyberattacks, but it is recommended to upgrade the kernel. Versions 6.2.5 and 6.1.18 have been released as updates.

ndctl: release v76.1, have you updated yet? (14th Mar 2023)

Preface: Advantages of NVDIMMs in servers. NVDIMMs provide high-speed DRAM performance coupled with flash-backed persistent storage. Aside from providing an additional memory tier in servers, NVDIMM persistence allows applications to continue processing I/O traffic during planned or unexpected system failures.

Background: Persistent Memory (PM) is a type of Non-Volatile Memory (NVM). The ndctl utility is used to manage the libnvdimm (non-volatile memory device) sub-system in the Linux Kernel. It is required for several Persistent Memory Developer Kit (PMDK) features if compiling from source. If ndctl is not available, the PMDK may not build all components and features.
If you are going to write applications for persistent memory, the programming model modes are as follows:

Block and File modes use IO

  • Data is read or written using RAM buffers
  • Software controls how to wait (context switch or poll)
  • Status is explicitly checked by software

Volume and PM modes enable Load/Store

  • Data is loaded into or stored from processor registers
  • Processor makes software wait for data during instruction
  • No status checking – errors generate exceptions
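
The difference between the two styles can be sketched in Python. This is an analogy only: a regular temporary file stands in for a persistent memory device, explicit read/write calls stand in for the block and file modes, and a memory mapping stands in for load/store access. The function names are made up for this illustration.

```python
import mmap
import os
import tempfile


def io_style_write(path: str, offset: int, data: bytes) -> None:
    """I/O style: data goes through a buffer and explicit write calls."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # software explicitly waits for completion


def load_store_style_write(path: str, offset: int, data: bytes) -> None:
    """Load/store style: the file is mapped and written like memory."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(data)] = data  # plain memory store
            m.flush()


def demo() -> bytes:
    """Write two bytes with each style and read the result back."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 4096)
        path = tmp.name
    try:
        io_style_write(path, 0, b"io")
        load_store_style_write(path, 2, b"pm")
        with open(path, "rb") as f:
            return f.read(4)
    finally:
        os.unlink(path)
```

Real persistent memory programming would map a file on a DAX-enabled file system, typically through the PMDK libraries, rather than a temporary file.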

Recommendation: Upgrade to ndctl release v76.1.
Version 76.1 fixes the following:
cxl/event-trace: use the wrapped util_json_new_u64()
cxl/monitor: fix include paths for tracefs and traceevent
cxl/monitor: Make libtracefs dependency optional

hiccup, web server load balancing solution  3rd May 2022

Preface: Online banking cannot do without a load balancing solution today. However, do the life cycles of the operating system and software libraries, the Java development platform, and on-demand custom functions interfere with the load balancing functions? The most challenging part is layer 7 load balancing. You can perform health checks on application functions, but it is difficult to guarantee non-stop operation (availability) on the application side.

Background: The load-balancing algorithms supported include round-robin, weighted least request, random, ring-hash and more. An additional function is uninterrupted client service, based on application and service availability (health checks).
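
Two of these algorithms can be sketched in a few lines. This is a simplified illustration, not the implementation of any particular load balancer: the class and function names, and the backend names web1 and web2, are hypothetical, and health status is passed in as a plain set.

```python
class RoundRobin:
    """Hand out backends in turn, skipping the ones marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0

    def pick(self, healthy):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in healthy:
                return backend
        return None  # nothing is healthy


def weighted_least_request(active, weights, healthy):
    """Pick the healthy backend with the fewest active requests per weight."""
    candidates = [b for b in active if b in healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda b: active[b] / weights.get(b, 1))


if __name__ == "__main__":
    rr = RoundRobin(["web1", "web2"])
    print([rr.pick({"web1", "web2"}) for _ in range(3)])
    print(weighted_least_request({"web1": 10, "web2": 4}, {}, {"web1", "web2"}))
```

Both selectors depend entirely on the accuracy of the health information they are given.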

My focus: Online banking platform (Hong Kong)
Error 500: java.lang.RuntimeException: no EntityContext found in existing session
Date: Around 8:15 am 5/3/2022

Fundamentally, a correctly configured web server load balancer prevents downtime. When the web server you are connected to has a problem, the load balancer keeps session persistence (SSL sticky) and then redirects your connection to a web server that is available.
My experience with an online banking system this morning (3rd May, 2022) gave me some technical hints.
I encountered an error in the web service. (Note: I had logged on successfully and was performing operations.) An error 500 was then displayed on my screen. After that, even when I closed the browser and established a new connection to the banking system, it still directed my new connection to e-banking1.hangseng.com. In a round-robin architecture, however, I should have been able to reach e-banking2.hangseng.com by chance.

Observation: The load balancer may well be capable of web application health checks. But for an online banking system, it probably performs the health check on the web server front page rather than on the Java server pages behind it. For example, the EntityContext interface contains the getEJBObject and getPrimaryKey methods that a bean can use to find out about the object it is associated with. The client communicates with the bean via the EJBObject. If one of the Java services fails, the load balancer health check may not know what is happening.
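
The observation can be modelled with a toy sticky balancer. Everything here is hypothetical (the class, the status dictionaries, the backend names web1 and web2); it only shows how a front-page check keeps a session pinned to a server whose application tier returns error 500, while a deeper check moves the session elsewhere.

```python
class StickyBalancer:
    """Keep a session on its backend while the health check passes."""

    def __init__(self, backends, health_check):
        self.backends = list(backends)
        self.health_check = health_check  # callable: backend name -> bool
        self.sessions = {}

    def route(self, session_id):
        pinned = self.sessions.get(session_id)
        if pinned is not None and self.health_check(pinned):
            return pinned
        # Pinned backend is missing or unhealthy: fail over to the first healthy one.
        for backend in self.backends:
            if self.health_check(backend):
                self.sessions[session_id] = backend
                return backend
        return None


def shallow_check(status):
    """Front page only."""
    return status["front_page"] == 200


def deep_check(status):
    """Front page plus an application-level probe."""
    return status["front_page"] == 200 and status["app"] == 200


if __name__ == "__main__":
    statuses = {
        "web1": {"front_page": 200, "app": 500},  # application tier is failing
        "web2": {"front_page": 200, "app": 200},
    }
    shallow = StickyBalancer(["web1", "web2"], lambda b: shallow_check(statuses[b]))
    deep = StickyBalancer(["web1", "web2"], lambda b: deep_check(statuses[b]))
    print(shallow.route("session-a"))  # web1: the failure is invisible
    print(deep.route("session-a"))     # web2
```

Whether the bank's load balancer behaves this way is an assumption; the sketch only shows that the behaviour I observed is consistent with a front-page health check.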

Were there concerns about vulnerable Java SE Embedded versions, so that tight protection was applied and caused this technical problem? Or is there a software configuration problem in the web application itself?

Does SpaceX use C language? 23rd Sep, 2021

Preface: SpaceX was founded in 2002 by Elon Musk with the goal of reducing space transportation costs to enable the colonization of Mars.

Background: Exploring Mars helps scientists understand major changes in climate that can fundamentally change the planet. It also allows us to look for biological features that might reveal whether there was abundant life on Mars in the past.

SpaceX engineers shared the programming languages they code in: “C & C++ for flight software, HTML, JavaScript & CSS for displays and python for testing,” adding that they “use HTML, JavaScript & CSS. We use Web Components heavily.”

Common programming weaknesses: Many memory manipulation functions in C and C++ do not perform bounds checking and can easily overwrite the allocated bounds of the buffers they operate upon.

  • Mistaken assumptions about the size of a piece of data
  • Mistaken assumptions about the makeup of a piece of data. Together these are the root cause of most buffer overflows.

Ref: In a classic buffer overflow exploit, the attacker sends data to a program, which it stores in an undersized stack buffer. The result is that information on the call stack is overwritten, including the function’s return pointer. The data sets the value of the return pointer so that when the function returns, it transfers control to malicious code contained in the attacker’s data.
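
The mechanism can be simulated safely. Python itself is memory-safe, so the sketch below models a stack frame as a bytearray holding an 8-byte buffer followed by an 8-byte return address; the layout, sizes, and function names are assumptions made for this illustration and do not correspond to any real calling convention.

```python
BUF_SIZE = 8  # size of the simulated local buffer
RET_SIZE = 8  # size of the simulated return address


def make_frame(return_address: bytes) -> bytearray:
    """Simulated frame: [ buffer | return address ]."""
    if len(return_address) != RET_SIZE:
        raise ValueError("return address must be 8 bytes")
    return bytearray(BUF_SIZE) + bytearray(return_address)


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy without looking at BUF_SIZE, like an unbounded C string copy."""
    n = min(len(data), len(frame))  # stay inside the simulated memory
    frame[:n] = data[:n]


def checked_copy(frame: bytearray, data: bytes) -> None:
    """Copy only if the data fits in the buffer."""
    if len(data) > BUF_SIZE:
        raise ValueError("input larger than buffer")
    frame[:len(data)] = data


def return_address(frame: bytearray) -> bytes:
    return bytes(frame[BUF_SIZE:])


if __name__ == "__main__":
    frame = make_frame(b"GOODADDR")
    unchecked_copy(frame, b"A" * BUF_SIZE + b"EVILADDR")
    print(return_address(frame))  # the return address has been overwritten
```

With the bounds check in place the oversized input is rejected and the return address stays intact.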

Reality factor: Many additional programming functions make the situation so complex that a programmer cannot accurately predict the behavior of a program.

My view point: Human beings want to explore the universe to meet their needs and to find living space free from the limitations of the Earth. In fact, the speed of the rocket is the limit: it will take nine months to reach Mars. And we know that Mars is not suitable for human habitation. Why don’t we take the time to reduce air pollution on our own planet? If, in addition, we can bring the global greenhouse effect under control, our new life is coming.

Who makes supercomputers faster and faster (CPU, fibre interconnect, parallel processing or virtual machine)? 29th June, 2021.

Preface: In Japanese mythology, the Namazu (鯰) or Ōnamazu (大鯰) is a giant underground catfish who causes earthquakes. The giant in this post has caused no disaster; it is the fastest supercomputer in the world. Its name is FUGAKU.

Background: Riken and Fujitsu started developing the system in 2014, working closely with ARM to design the A64FX processor. Each of these chips has 48 CPU cores based on the ARM architecture version 8.2A, making it the first such chip in the world. Furthermore, more than 94.2% of supercomputers are based on Linux. In addition, supercomputers can run Windows operating systems.

Do you think today’s supercomputers only rely on a few sets of multi-core processors and standalone operating systems?

When using two virtual machines, VMware found that the overall benchmark results using an 8 TB data set were almost as fast as native hardware, while when using 4 virtual machines, the virtualization method was actually 2% faster. A system architecture built from many virtual machines can thus use parallel computing to improve efficiency. Supercomputers apply a similar concept.

Based on their design goals, an HPC workload manager focuses on running distributed memory jobs and supporting high-throughput scenarios, while Kubernetes is mainly used to orchestrate containerized microservice applications. A system architecture built from many virtual machines realizes parallel computing to improve efficiency, so when the above concepts are implemented on a supercomputer, the processing power is improved.

The fastest supercomputer this month is FUGAKU. But who can guarantee that FUGAKU will always be number one?