The whole world is paying attention to Nvidia, but supercomputers using AMD are the super ones! (July 28, 2025)

Preface: The El Capitan system at the Lawrence Livermore National Laboratory, California, USA remains the No. 1 system on the TOP500. The HPE Cray EX255a system was measured with 1.742 Exaflop/s on the HPL benchmark. El Capitan has 11,039,616 cores and is based on AMD 4th generation EPYC™ processors with 24 cores at 1.8 GHz and AMD Instinct™ MI300A accelerators. It uses the HPE Slingshot interconnect for data transfer and achieves an energy efficiency of 58.9 Gigaflops/watt. The system also achieved 17.41 Petaflop/s on the HPCG benchmark which makes it the new leader on this ranking as well. June 2025

Background: Does El Capitan Use Docker or Kubernetes? El Capitan does not use Docker directly, but it does use Kubernetes—specifically:

Kubernetes is deployed on Rabbit and worker nodes. It is part of a stateless orchestration layer integrated with the Tri-Lab Operating System Stack (TOSS).

Kubernetes is used alongside Flux (the resource manager) and Rabbit (the near-node storage system) to manage complex workflows.

Why Kubernetes Instead of Docker Alone?

While Docker is lightweight and flexible, Kubernetes offers orchestration, which is critical for:

  • Managing thousands of concurrent jobs.
  • Coordinating data movement and storage across Rabbit nodes.
  • Supporting AI/ML workflows and in-situ analysis.

But Kubernetes has a larger memory and CPU footprint than Docker alone.

Technical details: HPE Cray Operating System (COS) is a specialized version of SUSE Linux Enterprise Server designed for high-performance computing, rather than being a variant of Red Hat Enterprise Linux. It’s built to run large, complex applications at scale and enhance application efficiency, reliability, management, and data access. While COS leverages SUSE Linux, it incorporates features tailored for supercomputing environments, such as enhanced memory sharing, power monitoring, and advanced kernel debugging.

What Does Cray Modify?
Cray (now part of HPE) primarily modifies:
-The Linux kernel for performance tuning, scalability, and hardware support
-Adds HPC-specific enhancements, such as:
Optimized scheduling
NUMA-aware memory management
High-speed interconnect support (e.g., Slingshot)
Enhanced I/O and storage stack
-Integrates with Cray Shasta architecture and Slingshot interconnect

These modifications are layered on top of SUSE Linux, meaning the base OS remains familiar and enterprise-grade, but is tailored for supercomputing.

End.

Our world is full of challenges and hardships. But you must be happy every day!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.