Preface: In computer science, reference counting is a programming technique that stores the number of references, pointers, or handles to a resource, such as an object, a block of memory, or disk space. In garbage collection algorithms, reference counts may be used to deallocate objects that are no longer needed.
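To make the idea concrete, here is a minimal single-threaded sketch in C. The names resource, resource_new, resource_get, resource_put and live_resources are hypothetical and chosen only for illustration; they do not come from any particular library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resource with an embedded reference count. */
struct resource {
    int refcount;   /* number of live references to this object */
    int payload;    /* stand-in for the data being protected */
};

/* Counts objects that are currently allocated (illustration only). */
static int live_resources = 0;

/* Allocate a resource; the caller owns the first reference. */
struct resource *resource_new(int payload)
{
    struct resource *r = malloc(sizeof(*r));
    if (!r)
        return NULL;
    r->refcount = 1;
    r->payload = payload;
    live_resources++;
    return r;
}

/* Take an additional reference before sharing the pointer. */
struct resource *resource_get(struct resource *r)
{
    if (r)
        r->refcount++;
    return r;
}

/* Drop one reference; the object is freed when the count reaches zero. */
void resource_put(struct resource *r)
{
    if (!r)
        return;
    assert(r->refcount > 0);
    if (--r->refcount == 0) {
        live_resources--;
        free(r);
    }
}
```

Each holder calls resource_get before keeping the pointer and resource_put when done, so the memory stays valid for as long as at least one holder remains. This sketch is not thread-safe; concurrent code would need an atomic counter.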
Background: The drm/amdgpu driver supports all AMD Radeon GPUs based on the Graphics Core Next (GCN) architecture.
AI and Machine Learning Development on a Local Desktop with AMD Radeon™ Graphics Cards
AMD now supports RDNA™ 3 architecture-based GPUs for desktop-based AI and ML workflows using AMD ROCm™ software. Developers can work with ROCm 6.1 software for Radeon on Linux® systems using PyTorch®, TensorFlow and ONNX Runtime. Added support for WSL 2 (Windows® Subsystem for Linux) now also enables users to develop with AMD ROCm™ software on a Windows® system, eliminating the need for dual-boot setups.
Vulnerability details: The CVE record does not state which weakness category the vulnerability belongs to, and AMD provides only the patch change details. The design weakness in CVE-2024-41008 may be related to garbage collection, that is, to how the lifetime of an object is managed.
This patch changes the handling and lifecycle of the vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its usage is reference counted.
- introducing two new helper funcs for task_info lifecycle management
  - amdgpu_vm_get_task_info: reference counts up task_info before returning this info
  - amdgpu_vm_put_task_info: reference counts down task_info
- last put to task_info() frees task_info from the vm.
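The get/put pattern described in the list above can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the driver's code: struct vm, struct task_info, vm_set_task_info, vm_get_task_info, vm_put_task_info, vm_fini and task_info_frees are hypothetical stand-ins, and the counter uses C11 atomics where kernel code would use its own reference-counting primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; not the kernel's real definitions. */
struct task_info {
    atomic_int refcount;    /* number of holders of this object */
    char process_name[16];
    int pid;
};

struct vm {
    struct task_info *task_info;    /* dynamically allocated; the vm holds one reference */
};

/* Counts how often a task_info was actually freed (illustration only). */
static int task_info_frees = 0;

/* Allocate task_info for a vm; the vm owns the initial reference. */
int vm_set_task_info(struct vm *vm, const char *name, int pid)
{
    struct task_info *ti = calloc(1, sizeof(*ti));
    if (!ti)
        return -1;
    atomic_init(&ti->refcount, 1);
    strncpy(ti->process_name, name, sizeof(ti->process_name) - 1);
    ti->pid = pid;
    vm->task_info = ti;
    return 0;
}

/* "get": count the reference up before handing the pointer out. */
struct task_info *vm_get_task_info(struct vm *vm)
{
    if (!vm || !vm->task_info)
        return NULL;
    atomic_fetch_add(&vm->task_info->refcount, 1);
    return vm->task_info;
}

/* "put": count the reference down; the last put frees the object. */
void vm_put_task_info(struct task_info *ti)
{
    if (!ti)
        return;
    if (atomic_fetch_sub(&ti->refcount, 1) == 1) {
        task_info_frees++;
        free(ti);
    }
}

/* Tearing down the vm drops only the vm's own reference. */
void vm_fini(struct vm *vm)
{
    struct task_info *ti = vm->task_info;
    vm->task_info = NULL;
    vm_put_task_info(ti);
}
```

With this scheme, a caller that obtained task_info through vm_get_task_info can keep reading it after vm_fini has run, because the memory is released only by the final vm_put_task_info. The sketch does not address how the lookup itself is synchronized against teardown; that would need a lock or a similar mechanism.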
Official announcement: Please refer to the NVD entry for details: https://nvd.nist.gov/vuln/detail/CVE-2024-41008