
Preface: The Rowhammer effect, a hardware vulnerability in DRAM chips, was first publicly presented and analyzed in June 2014 at the International Symposium on Computer Architecture (ISCA). The study, by Yoongu Kim et al., demonstrated that repeatedly activating ("hammering") a row in a DRAM chip can flip bits in physically adjacent rows, which an attacker can potentially turn into a security breach.
Background: NVIDIA has shifted from “copy on flip” to asynchronous copy mechanisms in its GPU architecture, particularly with the Ampere architecture and later. This change allows more efficient handling of data transfers between memory and the GPU, reducing latency and improving overall performance, especially in scenarios with high frame rates or complex computations.
When System-Level ECC is enabled, it mitigates Rowhammer attacks by protecting memory integrity. The memory controller detects and corrects bit flips when the data is read, which makes it far harder for an attacker to exploit them for privilege escalation or data corruption.
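To illustrate how an error-correcting code can undo a single flipped bit, here is a minimal sketch using a textbook extended Hamming(8,4) code. This is an assumption chosen for illustration only: the actual code used by NVIDIA's memory controllers is not described in this note, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (toy example)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 holds the overall parity bit
    # Positions 1..7: parity bits at 1, 2, 4; data bits at 3, 5, 6, 7.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2         # makes the total number of 1s even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits; 0 for a valid word.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2             # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single-bit error: the syndrome is the flipped position
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
word[6] ^= 1                            # simulate one Rowhammer-style bit flip
print(decode(word))
```

In this toy code a single flipped bit is corrected silently, while two flipped bits in the same codeword are detected but cannot be repaired. That is why error correction of this kind is usually described as a mitigation rather than an absolute guarantee.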
Technical details: Modern DRAMs, including the ones used by NVIDIA, are potentially susceptible to Rowhammer. The now decade-old Rowhammer problem is well known for CPU memories (e.g., DDR, LPDDR). Recently, researchers at the University of Toronto demonstrated a successful Rowhammer exploit on an NVIDIA A6000 GPU with GDDR6 memory on which System-Level ECC was not enabled. In the same paper, the researchers showed that enabling System-Level ECC mitigates the Rowhammer problem.
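Since the mitigation depends on ECC actually being switched on, administrators may want to check its state on each GPU. The following is a minimal sketch, assuming that `nvidia-smi` is installed and that the ECC mode it reports corresponds to the System-Level ECC setting discussed here. The helper names `parse_ecc_report`, `gpus_without_ecc` and `query_ecc` are hypothetical.

```python
import csv
import io
import subprocess

# Query fields documented by `nvidia-smi --help-query-gpu`.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,ecc.mode.current,ecc.mode.pending",
    "--format=csv,noheader",
]


def parse_ecc_report(text):
    """Parse the CSV output of the query above into a list of dicts."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(rec) != 4:
            continue                    # skip blank or malformed lines
        index, name, current, pending = (field.strip() for field in rec)
        rows.append(
            {"index": index, "name": name, "current": current, "pending": pending}
        )
    return rows


def gpus_without_ecc(rows):
    """Return the GPUs whose current ECC mode is anything other than 'Enabled'."""
    return [r for r in rows if r["current"] != "Enabled"]


def query_ecc():
    """Run nvidia-smi and return the parsed report (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_ecc_report(result.stdout)
```

On a machine with an NVIDIA driver, `gpus_without_ecc(query_ecc())` lists the GPUs that still need attention. ECC can typically be enabled with `nvidia-smi -e 1`, and the change takes effect only after a GPU reset or reboot; GPUs that do not support ECC report `[N/A]` and are also listed by this sketch.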
Official announcement: For technical details, see https://nvidia.custhelp.com/app/answers/detail/a_id/5671