Preface: The AMD Radeon Instinct™ MI50 server accelerator, designed on the world’s first 7nm FinFET process technology, brings customers a full feature set based on the industry’s newest technologies. The MI50 is AMD’s workhorse accelerator, aimed at large-scale deep learning. It delivers up to 26.5 TFLOPS of native half-precision (FP16) or up to 13.3 TFLOPS of single-precision (FP32) peak floating-point performance, supports INT8, and is paired with 16GB or 32GB of high-bandwidth HBM2 ECC memory. According to AMD, this gives customers the balanced performance needed for enterprise-class, mid-range compute capable of training complex neural networks for a variety of demanding deep learning applications in a cost-effective design.
Background: The drm/amdgpu driver supports all AMD Radeon GPUs based on the Graphics Core Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) architectures.
CDNA (Compute DNA) is a compute-centered graphics processing unit (GPU) microarchitecture designed by AMD for datacenters.
AMD CDNA architecture is supported by AMD ROCm™, an open software stack that includes a broad set of programming models, tools, compilers, libraries, and runtimes for AI and HPC solution development targeting AMD Instinct accelerators.
Vulnerability details: In the Linux kernel, the following vulnerability has been resolved: "drm/amdgpu: Forward soft recovery errors to userspace". The commit message reads: "As we discussed before [1], soft recovery should be forwarded to userspace, or we can get into a really bad state where apps will keep submitting hanging command buffers cascading us to a hard reset." [1] https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/ (cherry picked from commit 434967aadbbbe3ad9103cc29e9a327de20fdba01)
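Illustration: The cascade described in the commit message can be sketched with a small toy model. This is not the amdgpu driver code and does not reproduce the actual patch; the class ToyDriver, the function run_app, the error value TOY_ERROR, and the threshold of three consecutive hangs are all hypothetical names and numbers chosen only to show why an application that never sees an error keeps resubmitting a hanging command buffer.

```python
from dataclasses import dataclass

# Hypothetical values for this toy model only; the real driver's
# escalation policy and error codes are not taken from the source.
HARD_RESET_THRESHOLD = 3   # consecutive hangs before the toy driver hard-resets
TOY_ERROR = -1             # stand-in for whatever error userspace would see


@dataclass
class ToyDriver:
    forward_errors: bool        # True models the patched behaviour
    consecutive_hangs: int = 0
    hard_resets: int = 0

    def submit(self, hangs: bool) -> int:
        """Return 0 on success, or TOY_ERROR if a hang is reported to the app."""
        if not hangs:
            self.consecutive_hangs = 0
            return 0
        # The job hung; the toy driver "soft recovers" it.
        self.consecutive_hangs += 1
        if self.consecutive_hangs >= HARD_RESET_THRESHOLD:
            # Too many hangs in a row: escalate to a hard reset.
            self.hard_resets += 1
            self.consecutive_hangs = 0
        # Without forwarding, the app is told everything went fine.
        return TOY_ERROR if self.forward_errors else 0


def run_app(driver: ToyDriver, max_submissions: int = 10) -> int:
    """A toy app that resubmits the same hanging buffer until it sees an error."""
    submissions = 0
    for _ in range(max_submissions):
        submissions += 1
        if driver.submit(hangs=True) != 0:
            break  # the app learns about the problem and stops
    return submissions


silent = ToyDriver(forward_errors=False)
silent_submissions = run_app(silent)

forwarding = ToyDriver(forward_errors=True)
forwarding_submissions = run_app(forwarding)

print("silent:    ", silent_submissions, "submissions,", silent.hard_resets, "hard resets")
print("forwarding:", forwarding_submissions, "submissions,", forwarding.hard_resets, "hard resets")
```

In this sketch the silent driver lets the app run through all of its submissions and escalates to hard resets along the way, while the forwarding driver lets the app stop after the first failed submission. How a real application reacts to a forwarded error depends on the application and its graphics API.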
Official announcement: Please refer to the website for details – https://nvd.nist.gov/vuln/detail/CVE-2024-44961