Preface: io_uring was added in Linux kernel 5.1. Its main purpose is to address the shortcomings of the original Linux native AIO interface, which is already used by widely deployed software. For example:
– MySQL and Nginx already support native AIO.
– InnoDB uses the asynchronous I/O subsystem (native AIO) on Linux to perform read-ahead and write requests for data file pages.
Technical details: Put simply, AIO hands the request over to the system and learns about its completion later instead of blocking, which makes it truly asynchronous. However, Linux native AIO is in practice tied to files opened with the O_DIRECT flag, and that flag imposes the following restrictions. When reading and writing files in AIO mode, the operating system’s file cache cannot be used. The buffer address, the transfer size, and the file offset must all be aligned to the logical block size of the disk (usually 512 bytes). The advantage of O_DIRECT is that it avoids making extra copies of the data during the transfer, and the call returns after the transfer is complete.
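The alignment restriction above can be sketched in C as follows. This is a minimal illustration, not code from any particular project: the helper names dio_request_is_aligned and dio_alloc and the constant DIO_BLOCK_SIZE are hypothetical, and the 512-byte block size is only the usual value mentioned in the text (a real program should query the device).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Assumption: 512 bytes, the "usual" logical block size named in the text. */
#define DIO_BLOCK_SIZE 512u

/* Hypothetical helper: true when the buffer address, the transfer length,
 * and the file offset are all multiples of block_size.
 * block_size must be a non-zero power of two. */
bool dio_request_is_aligned(const void *buf, size_t len, off_t offset,
                            size_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    size_t mask = block_size - 1;
    return ((uintptr_t)buf & mask) == 0 &&
           (len & mask) == 0 &&
           ((uint64_t)offset & mask) == 0;
}

/* Hypothetical helper: allocates a block-aligned buffer whose size is len
 * rounded up to a whole number of blocks. Stores the rounded size in
 * *rounded_len when that pointer is non-NULL. Returns NULL on failure;
 * the caller releases the buffer with free(). */
void *dio_alloc(size_t len, size_t block_size, size_t *rounded_len)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    void *buf = NULL;
    size_t rounded = (len + block_size - 1) & ~(block_size - 1);
    if (posix_memalign(&buf, block_size, rounded) != 0)
        return NULL;
    if (rounded_len != NULL)
        *rounded_len = rounded;
    return buf;
}
```

A caller would typically obtain a buffer with dio_alloc(), check the planned offset with dio_request_is_aligned(), and only then issue the read or write on a descriptor opened with O_DIRECT. Requests that fail the check are commonly rejected by the kernel with EINVAL.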
Vulnerability details:
Files access across suid boundaries – io_uring takes a non-refcounted reference to the files_struct of the process that submitted a request (relying on ->flush() for being notified before the files_struct can go away). Unfortunately, unshare_fd(), which is used by bprm_execve() via unshare_files(), doesn’t know about that, and assumes that if the files_struct’s refcount is 1, it is okay to keep using the old files_struct.
mm access across suid boundaries – If an attacker gets the suid binary to write the fd number to a fixed address and then uses that address instead of free_fd, the vulnerability can be triggered.
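The refcount assumption behind the first issue can be illustrated with a small userspace toy model. This is not kernel code: every name below (toy_files, toy_ring, and the toy_* functions) is hypothetical, and the "counted" variant is included only as a contrast, not as a description of the actual upstream fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for files_struct: only the reference count matters here. */
struct toy_files {
    int refcount;
};

/* Toy stand-in for an io_uring context that remembers the submitter's
 * file table. */
struct toy_ring {
    struct toy_files *borrowed_files;
};

/* Models the behavior described above: the ring keeps a pointer to the
 * table without taking a reference, so refcount is left unchanged. */
void toy_ring_attach_uncounted(struct toy_ring *ring, struct toy_files *files)
{
    assert(ring != NULL && files != NULL);
    ring->borrowed_files = files;
}

/* Contrast case (an assumption for illustration): the ring takes a real
 * reference, so the table visibly has a second user. */
void toy_ring_attach_counted(struct toy_ring *ring, struct toy_files *files)
{
    assert(ring != NULL && files != NULL);
    ring->borrowed_files = files;
    files->refcount++;
}

/* Models the decision attributed to unshare_fd(): a refcount of 1 is
 * taken to mean "nobody else uses this table, keep the old one". */
bool toy_exec_keeps_old_files(const struct toy_files *files)
{
    return files->refcount == 1;
}

/* True when the ring can still reach the table that exec kept. */
bool toy_ring_still_reaches(const struct toy_ring *ring,
                            const struct toy_files *kept)
{
    return ring->borrowed_files == kept;
}
```

In the uncounted case the table still looks private (refcount 1), so the exec path keeps it while the ring retains access to it. That is the situation in which requests submitted before the exec can touch the file table of the new, possibly suid, program.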
Reference: mm (a pointer to struct mm_struct) refers to the address space of a process.
Among its fields, exe_file (a pointer to struct file) refers to the executable file,
while arg_start and arg_end are the addresses of the first and last bytes of the argv passed to the process, respectively.
Status: This vulnerability is currently awaiting analysis.