Preface: When a Pod’s spec.schedulerName is set to kai-scheduler, the default Kubernetes scheduler ignores the Pod entirely. Scheduling and resource binding for that workload are then handled only by the KAI Scheduler, acting under the permissions of its own ServiceAccount.
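The hand-off described above can be illustrated with a minimal Python sketch. It assumes the Pod manifest has already been loaded into a plain dictionary (for example from YAML); the helper names responsible_scheduler and handled_by_kai are hypothetical and are not part of KAI Scheduler or Kubernetes.

```python
# Kubernetes assigns this scheduler name when spec.schedulerName is left unset.
DEFAULT_SCHEDULER = "default-scheduler"


def responsible_scheduler(pod: dict) -> str:
    """Return the name of the scheduler that will pick up this Pod."""
    spec = pod.get("spec") or {}
    return spec.get("schedulerName") or DEFAULT_SCHEDULER


def handled_by_kai(pod: dict, kai_name: str = "kai-scheduler") -> bool:
    """True when the Pod opts out of the default scheduler in favour of KAI."""
    return responsible_scheduler(pod) == kai_name


if __name__ == "__main__":
    pod = {"metadata": {"name": "train-job"}, "spec": {"schedulerName": "kai-scheduler"}}
    print(responsible_scheduler(pod), handled_by_kai(pod))
```

A Pod that omits the field falls back to the default scheduler, so only Pods that explicitly name kai-scheduler are exposed to the behaviour discussed in this post.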
Background: The vulnerability stems from how the scheduler tracks and authorizes GPU resources across different namespaces when using the Reservation Pod mechanism.
1. Improper Isolation: In a multi-tenant environment, the scheduler is supposed to enforce strict isolation between namespaces. This flaw weakens that guarantee.
2. Cross-Namespace References: The flaw allows an attacker with access to one namespace to craft a pod definition (for example, by manipulating annotations or references) that points to or affects a PodGroup or reservation in a different, unauthorized namespace.
3. Data/Resource Tampering: Setting nvidia.com/gpu (an integer resource) on a Pod while KAI is managing the same physical card through fractional annotations (such as gpu-fraction: “0.5”) could cause the scheduler to miscalculate or “lose track” of the actual hardware state.
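As a rough audit aid for point 3, the sketch below flags Pod manifests that combine a whole-GPU resource entry with a fractional annotation. It is an illustration under stated assumptions, not KAI Scheduler's own validation logic: the manifest is assumed to be a plain dictionary, the annotation key is taken as gpu-fraction exactly as quoted above, and the function name mixes_whole_and_fractional_gpu is hypothetical.

```python
GPU_RESOURCE = "nvidia.com/gpu"        # integer extended resource
FRACTION_ANNOTATION = "gpu-fraction"   # annotation key as quoted in the text (assumption)


def mixes_whole_and_fractional_gpu(pod: dict) -> bool:
    """Flag a Pod that carries a fractional annotation and also requests whole GPUs."""
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    if FRACTION_ANNOTATION not in annotations:
        return False
    for container in (pod.get("spec") or {}).get("containers") or []:
        resources = container.get("resources") or {}
        # The integer resource may appear under requests, limits, or both.
        for section in ("requests", "limits"):
            if GPU_RESOURCE in (resources.get(section) or {}):
                return True
    return False


if __name__ == "__main__":
    suspicious = {
        "metadata": {"namespace": "tenant-a", "annotations": {"gpu-fraction": "0.5"}},
        "spec": {"containers": [{"resources": {"limits": {"nvidia.com/gpu": 1}}}]},
    }
    print(mixes_whole_and_fractional_gpu(suspicious))
```

Running a check like this over the Pods in each tenant namespace gives a quick list of workloads worth reviewing by hand; it does not detect the cross-namespace references described in point 2.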
Exploitation: An attacker could use these cross-namespace references to “poach” GPUs reserved for other tenants, leading to data tampering (modifying scheduling state) or unauthorized resource access.
Vulnerability details: NVIDIA KAI Scheduler contains a vulnerability where an attacker could cause improper authorization through cross-namespace pod references. A successful exploit of this vulnerability might lead to data tampering.
Remedy: To mitigate this risk, you should immediately update to NVIDIA KAI Scheduler v0.13.0 or later.
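To check a deployed version against the fixed release, a comparison along the following lines may help. This is a simplified sketch: it assumes the version is available as a tag string such as v0.12.3, it ignores any pre-release or build suffix, and the helper names parse_version and is_patched are hypothetical.

```python
FIXED_VERSION = (0, 13, 0)  # first release named as fixed in the remedy above


def parse_version(tag: str) -> tuple:
    """Turn a tag such as 'v0.13.0' into a comparable tuple like (0, 13, 0)."""
    core = tag.strip().lstrip("vV")
    # Simplification: drop pre-release/build suffixes such as '-rc1' or '+build5'.
    core = core.split("-", 1)[0].split("+", 1)[0]
    parts = [int(piece) for piece in core.split(".")]
    parts += [0] * (3 - len(parts))  # pad short tags such as 'v0.13'
    return tuple(parts)


def is_patched(tag: str) -> bool:
    """True when the given version is at or above the fixed release."""
    return parse_version(tag) >= FIXED_VERSION


if __name__ == "__main__":
    for tag in ("v0.12.9", "v0.13.0", "v0.14.1"):
        print(tag, is_patched(tag))
```

Because suffixes are ignored, a release candidate of the fixed version would be reported as patched; treat such cases with care and confirm against the official announcement.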
Official announcement: Please refer to the link for details – https://nvidia.custhelp.com/app/answers/detail/a_id/5818