Preface: NVIDIA, the inventor of the Graphics Processing Unit (GPU), brings visual computing excellence to the embedded world. High performance meets low power with the NVIDIA Tegra processor: get ready for HD video, crisp graphics and unprecedented 3D capabilities, all in one power-efficient package.
Background: The GPUDirect Storage kernel driver, nvidia-fs.ko, is a kernel module that orchestrates I/O directly from DMA/RDMA-capable storage to user-allocated GPU memory on NVIDIA graphics cards. In other words, the NVIDIA GPU uses direct DMA. DMA engines exist in GPUs and in storage-related devices such as NVMe drives and storage controllers, but generally not in CPUs. For this reason, the NVIDIA GPU design implements resource allocation that extends to external devices. When you open the source package (gds-nvidia-fs), you will find two types of RDMA files. The nvfs-rdma[.]c files are source files that will be compiled. The nvfs-rdma[.]h files are header files that expose the API of a program to other parts of that program, or to other programs if you are creating a library.
Remark: Usually, the GPUDirect kernel module is set to load by default by the system startup service. If it is not loaded, GPUDirect RDMA will not work, which results in very high latency for message communication.
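To check whether the module is actually loaded, you can look at /proc/modules on a Linux host. The following is a minimal sketch, not an official NVIDIA tool. The helper names (loaded_modules, is_module_loaded) are hypothetical, and the module names checked at the bottom are assumptions: nvidia-fs comes from the driver file named above, while nvidia-peermem is my assumption for the GPUDirect RDMA module and may differ on your system.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return the module names found in /proc/modules-formatted text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first column is the module name
    return names


def is_module_loaded(name: str, proc_modules_text: str) -> bool:
    """Check for a module, accepting either the hyphen or underscore spelling."""
    # The kernel lists module names with underscores even if the .ko file uses hyphens.
    return name.replace("-", "_") in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = Path("/proc/modules")
    text = path.read_text() if path.exists() else ""
    # Assumed module names; adjust them to match your driver installation.
    for candidate in ("nvidia-fs", "nvidia-peermem"):
        state = "loaded" if is_module_loaded(candidate, text) else "not loaded"
        print(f"{candidate}: {state}")
```

Keeping the parsing separate from the file read means the check can also be run against saved lsmod-style output from another machine.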
The items with high risk scores caught my attention (see below):
CVE‑2021‑23201 – NVIDIA GPU and Tegra hardware contain a vulnerability in an internal microcontroller which may allow a user with elevated privileges to generate valid microcode. This could lead to information disclosure, data corruption, or denial of service of the device.
CVE‑2021‑23217 – NVIDIA GPU and Tegra hardware contain a vulnerability in the internal microcontroller which may allow a user with elevated privileges to instantiate a specifically timed DMA write to corrupt code execution, which may impact confidentiality, integrity, or availability.
As usual, the vendor does not elaborate on the cause of the vulnerabilities in detail. However, if you are interested in this design weakness, you can use the available hints to narrow down the issue and then write a summary. Even if it may not be accurate, there is no harm in doing this research. Be my guest. Referring to the diagram, the well-known vulnerabilities come from the driver (nvlddmkm[.]sys), and the nvlddmkm[.]sys error is itself a well-known error. However, I believe the vulnerabilities this time may extend the impact to other components, for example the CPU (please refer to steps 5, 6 and 7 shown in the attached diagram).
Official details and remedy: Please refer to the link – https://nvidia.custhelp.com/app/answers/detail/a_id/5263