Technical report shared by ETH Zurich on 4th April 2024: Please refer to the link for details – https://arxiv.org/html/2404.03387v1
Background: Intel CPU’s inbuilt TDX Module provides an interface for the hypervisor to manage VMs indirectly and offers new instructions, VMLAUNCH-VMX and VMRESUME, for starting and resuming a VM. Instead of keeping track of the owner for each memory page on a system wide basis, TDX relies on keeping one shared extended page table for the hypervisor and multiple private extended page tables for each virtual machine (called TD by Intel).
Reference: int
means interrupt, and the number 0x80
is the interrupt number. An interrupt transfers the program flow to whomever is handling that interrupt, which is interrupt 0x80
in this case. In Linux, 0x80
interrupt handler is the kernel, and is used to make system calls to the kernel by other programs.
The kernel is notified about which system call the program wants to make, by examining the value in the register %eax
(AT&T syntax, and EAX in Intel syntax). Each system call have different requirements about the use of the other registers. For example, a value of 1
in %eax
means a system call of exit()
, and the value in %ebx
holds the value of the status code for exit()
.
Vulnerability details: The test implement a kernel module in 150 LoC to inject interrupts into the TDX VM. Our host module uses kernel hooks to call a function in KVM that is used to deliver int 0x80 interrupts to TDX VMs. Unlike SEV-SNP, TDX does not expose the Virtual Machine Control Structure (VMCS) or the virtual APIC pages to the untrusted hypervisor. Instead, it expects the hypervisor to write into a Posted Interrupt Request (PIR) buffer. This buffer is used by hardware to inject interrupts into TDX VMs through the virtual APIC [34]. We inject two interrupts into two different cores of the CVM with this mechanism, one to gain login into the TDX VM with OpenSSH and another to get root access with sudo. During these two injects, the guest kernel does not acknowledge the interrupts. While this does not stop our attacks, it does leave the APIC with an elevated Task-Priority-Register (TPR), blocking all lower-priority interrupts on the affected vCPU. This may break CVM functionality that is noticeable by the user. To evade such detection, we implement a guest kernel module (kern_ack) that resets the APIC state. We inject this kernel module into the TDX VM as the last part of our attack after gaining root access.