Descriptorless files for io_uring
The lowly file descriptor is one of the fundamental objects in Linux systems.
Interestingly, though, the io_uring subsystem looks as if it is moving toward its own number space separate from file descriptors.
- The kernel needs to perform some setup for io_uring operations on a file descriptor. It takes a reference to the file and “locks down” the memory for the buffer (what does this mean concretely?)
- The io_uring_register syscall allows userspace to explicitly request that this setup be performed for a list of buffers (opcode: IORING_REGISTER_BUFFERS) and/or files (opcode: IORING_REGISTER_FILES)
- If IORING_REGISTER_FILES is called, all “registered” (or fixed) files are then referenced by their index in the list of files passed to io_uring_register, not by their fd.
- If you’re using io_uring to create a fd in the first place, there’s an unnecessary user-space conversion step:
  - io_uring creates the file, and puts the fd on the completion ring
  - userspace calls io_uring_register to register the file and receive a fixed file offset
  - userspace then enqueues operations on this file using the fixed file offset
- There’s a new patch series out that allows io_uring to create and register fds, and return their fixed file offsets instead of their fd numbers.
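
To make the registration step in the list above concrete, here is a minimal sketch in C that uses the raw syscall and the kernel's `<linux/io_uring.h>` header rather than liburing. The helper names `register_files` and `prep_read_fixed_file` are made up for illustration, and the sketch assumes a ring file descriptor already obtained from io_uring_setup. It only fills in a submission entry; placing it on the submission ring is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/io_uring.h>   /* struct io_uring_sqe, IORING_* and IOSQE_* constants */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: register an array of open descriptors with an
 * existing ring. Returns 0 on success, -1 with errno set on failure.
 * On success, fds[i] can be addressed as fixed-file index i. */
int register_files(int ring_fd, const int *fds, unsigned nr)
{
    return (int)syscall(__NR_io_uring_register, ring_fd,
                        IORING_REGISTER_FILES, fds, nr);
}

/* Hypothetical helper: fill a submission entry that reads from a
 * registered file. With IOSQE_FIXED_FILE set, the fd field is treated
 * as an index into the registered table, not as a descriptor number. */
void prep_read_fixed_file(struct io_uring_sqe *sqe, unsigned index,
                          void *buf, unsigned len, uint64_t offset)
{
    assert(sqe != NULL);
    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_READ;
    sqe->flags  = IOSQE_FIXED_FILE;
    sqe->fd     = (int)index;
    sqe->addr   = (uint64_t)(uintptr_t)buf;
    sqe->len    = len;
    sqe->off    = offset;
}
```

The extra round trip described above shows up here as the need to call `register_files` from user space before any entry can carry IOSQE_FIXED_FILE for a newly created descriptor.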
The most likely use case for this feature is network servers; a busy server can create (with accept()) and use huge numbers of file descriptors in a short period of time. While io_uring operations, being asynchronous, can generally be executed in any order, it is possible to chain operations so that one does not begin before the previous one has successfully completed. Using this capability, a network server could queue a series of operations to accept the next incoming connection (storing it in the fixed-file table), write out the standard greeting, and initiate a read for the first data from the remote peer. User space would only need to become involved once that data has arrived and is ready to be processed.
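
The accept, greet, read chain described above could look roughly like the following sketch in C. It assumes the encoding found in later mainline kernel headers, where the `file_index` field of an accept request holds the target slot plus one; the patch series discussed here may have used a different interface. The function name `prep_accept_greet_read` is hypothetical, and the sketch assumes a fixed-file table with a free slot has already been registered. It only fills in three entries; copying them onto the submission ring is left out.

```c
#include <assert.h>
#include <linux/io_uring.h>   /* struct io_uring_sqe, IORING_* and IOSQE_* constants */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill three linked submission entries.
 *   listen_fd  ordinary (non-registered) listening socket
 *   slot       fixed-file table index chosen by the application */
void prep_accept_greet_read(struct io_uring_sqe sqes[3], int listen_fd,
                            unsigned slot,
                            const char *greeting, unsigned greeting_len,
                            void *inbuf, unsigned inbuf_len)
{
    assert(sqes != NULL);
    memset(sqes, 0, 3 * sizeof(sqes[0]));

    /* 1. Accept the next connection straight into the fixed-file table.
     *    file_index == 0 would mean "return a regular fd", so the slot
     *    is stored as slot + 1 (assumed mainline encoding). */
    sqes[0].opcode     = IORING_OP_ACCEPT;
    sqes[0].fd         = listen_fd;
    sqes[0].file_index = slot + 1;
    sqes[0].flags      = IOSQE_IO_LINK;   /* next entry waits for this one */
    sqes[0].user_data  = 1;

    /* 2. Send the greeting on the new connection, addressed by slot. */
    sqes[1].opcode    = IORING_OP_SEND;
    sqes[1].fd        = (int)slot;
    sqes[1].addr      = (uint64_t)(uintptr_t)greeting;
    sqes[1].len       = greeting_len;
    sqes[1].flags     = IOSQE_FIXED_FILE | IOSQE_IO_LINK;
    sqes[1].user_data = 2;

    /* 3. Start reading the peer's first data; this ends the chain. */
    sqes[2].opcode    = IORING_OP_RECV;
    sqes[2].fd        = (int)slot;
    sqes[2].addr      = (uint64_t)(uintptr_t)inbuf;
    sqes[2].len       = inbuf_len;
    sqes[2].flags     = IOSQE_FIXED_FILE;
    sqes[2].user_data = 3;
}
```

Because the application picks the slot itself, the send and receive entries can name the connection before it exists, which is what lets all three be queued in one go.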