RSoC: improving drivers and kernel - part 4 (largely io_uring)

By 4lDO2 on

Introduction

This week has mostly been about advancing the io_uring interface as far as possible, with the goal of making it the default for pcid, xhcid, and usbscsid, as I previously mentioned. With the introduction of the AsyncScheme trait, I have now actually been able to operate the pci: scheme socket (well, :pci) completely asynchronously and with io_uring, by making the in-kernel RootScheme async too.

Consumer/producer instances

I began this week by making it clearer in-kernel which contexts are known as “producers” and which ones are known as “consumers”. As the name implies, a producer is a process, or more usually the kernel, that receives SQEs, processes them, and sends CQEs. Similarly, a consumer, also being either a process or the kernel (direct kernel-to-kernel io_uring communication never happens though), sends SQEs and receives CQEs. Having separated these types of rings into separate enum variants, it became much easier to reason about in-kernel io_uring handles.
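
To make the two roles concrete, here is a minimal userspace sketch of the idea of encoding the role in an enum variant. All names in it (`RingHandle`, `Ring`, `Sqe`, `Cqe`, `WrongRole`, `ring_pair`) and the field layouts are made up for illustration and are not the kernel's actual types; the shared ring is just a pair of `VecDeque`s behind an `Rc<RefCell<..>>`.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Submission queue entry (a request). Fields are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Sqe { pub user_data: u64, pub opcode: u8 }

/// Completion queue entry (a response). Fields are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cqe { pub user_data: u64, pub status: i64 }

/// Both queues of one ring, shared by the two sides.
#[derive(Default)]
pub struct Ring { submissions: VecDeque<Sqe>, completions: VecDeque<Cqe> }

/// Returned when a handle is used for an operation of the other role.
#[derive(Debug, PartialEq, Eq)]
pub struct WrongRole;

/// Hypothetical handle type: the variant says which side of the ring we are.
pub enum RingHandle {
    /// Sends SQEs and receives CQEs.
    Consumer(Rc<RefCell<Ring>>),
    /// Receives SQEs, processes them, and sends CQEs.
    Producer(Rc<RefCell<Ring>>),
}

/// Create the two ends of one ring.
pub fn ring_pair() -> (RingHandle, RingHandle) {
    let ring = Rc::new(RefCell::new(Ring::default()));
    (RingHandle::Consumer(Rc::clone(&ring)), RingHandle::Producer(ring))
}

impl RingHandle {
    /// Consumer only: enqueue a request.
    pub fn submit(&self, sqe: Sqe) -> Result<(), WrongRole> {
        match self {
            RingHandle::Consumer(ring) => {
                ring.borrow_mut().submissions.push_back(sqe);
                Ok(())
            }
            RingHandle::Producer(_) => Err(WrongRole),
        }
    }

    /// Producer only: handle at most one request. Returns whether one was handled.
    pub fn serve_one(&self, handler: impl FnOnce(&Sqe) -> i64) -> Result<bool, WrongRole> {
        match self {
            RingHandle::Producer(ring) => {
                let mut guard = ring.borrow_mut();
                let sqe = match guard.submissions.pop_front() {
                    Some(sqe) => sqe,
                    None => return Ok(false),
                };
                let status = handler(&sqe);
                guard.completions.push_back(Cqe { user_data: sqe.user_data, status });
                Ok(true)
            }
            RingHandle::Consumer(_) => Err(WrongRole),
        }
    }

    /// Consumer only: dequeue a finished response, if there is one.
    pub fn reap(&self) -> Result<Option<Cqe>, WrongRole> {
        match self {
            RingHandle::Consumer(ring) => Ok(ring.borrow_mut().completions.pop_front()),
            RingHandle::Producer(_) => Err(WrongRole),
        }
    }
}
```

With the role stored in the variant, every operation can check with a single `match` whether it is allowed on that handle, which is roughly why the separation makes handles easier to reason about.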

impl AsyncScheme for RootScheme

The RootScheme is now the first kernel scheme to implement the poll_handle fn! This means that io_uring will not block within the kernel if the file descriptors used in system calls belong to that scheme. While the biggest and most important challenge is making UserScheme async, there is now at least one scheme with requests that can take a possibly infinite time to complete, and that doesn’t block internally. Consequently, pcid was now able to receive an SQE directly from xhcid, albeit now with handling for that.
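
As a rough illustration of what "not blocking within the kernel" means here, the sketch below models a poll-style handler. The trait shape, the `Sqe`/`Cqe` fields, `ToyScheme`, and `poll_all` are all assumptions made for this example; the real `AsyncScheme::poll_handle` signature in the kernel may well differ, and a real implementation would also need some wake-up mechanism, which is left out.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Request and response entries. Fields are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Sqe { pub user_data: u64 }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cqe { pub user_data: u64, pub status: i64 }

/// Hypothetical shape of an async scheme: try to make progress, never block.
pub trait AsyncScheme {
    fn poll_handle(&mut self, sqe: &Sqe) -> Poll<Cqe>;
}

/// Toy scheme whose requests finish only once an event has arrived,
/// which may take arbitrarily long (or never happen).
#[derive(Default)]
pub struct ToyScheme { events: VecDeque<i64> }

impl ToyScheme {
    /// Simulate an external event, e.g. another process connecting.
    pub fn push_event(&mut self, value: i64) {
        self.events.push_back(value);
    }
}

impl AsyncScheme for ToyScheme {
    fn poll_handle(&mut self, sqe: &Sqe) -> Poll<Cqe> {
        match self.events.pop_front() {
            Some(status) => Poll::Ready(Cqe { user_data: sqe.user_data, status }),
            // Nothing to report yet: return immediately instead of waiting.
            None => Poll::Pending,
        }
    }
}

/// One non-blocking pass over all in-flight requests. Finished requests are
/// turned into CQEs; the rest stay queued for a later pass.
pub fn poll_all<S: AsyncScheme>(scheme: &mut S, in_flight: &mut Vec<Sqe>) -> Vec<Cqe> {
    let mut completed = Vec::new();
    in_flight.retain(|sqe| match scheme.poll_handle(sqe) {
        Poll::Ready(cqe) => {
            completed.push(cqe);
            false // done, drop it from the in-flight list
        }
        Poll::Pending => true, // keep it and retry later
    });
    completed
}
```

The point of the sketch is only that a pending request costs one cheap `Poll::Pending` return per pass, so a request that never completes cannot stall the other requests on the same ring.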

Almost there, PCI!

The pcid_interface part should already be finished for the most part; it now supports io_uring, and its predecessor, pipes where the fds are shared using environment variables, is deprecated (I presume I will deprecate the old process-arguments interface as well). A few arguments of some functions may still change there, but apart from that, this part does not need much more work.

However, most of the io_uring code actually lives within redox-iou and the newly separated crate redox-buffer-pool, which handle the actual OS interface. The goal is for these to eventually be integrated into mio, and hence tokio. While I managed to get all types of io_urings working, be it userspace-to-userspace, userspace-to-kernel, or kernel-to-userspace (not tested, and not fully implemented in the kernel yet), there is currently one limitation: memory and buffer management.

redox-buffer-pool

Userspace-to-kernel and kernel-to-userspace are easy; the kernel manages all the memory for us, as it is the man in the middle for all syscalls there. With userspace-to-userspace io_urings, however, all the kernel can do is make io_uring memory management easier; the processes will still need some method of systematically sharing buffers for syscalls.

Initially included within redox-iou, redox-buffer-pool has become its own crate, with the purpose of providing a memory allocator that gives out chunks of arbitrary size and alignment, from larger chunks of memory that originate from e.g. mmap. It also supports “guards”, which prevent slices within a buffer pool from being reclaimed (dropping may also leak if it has to) until the guard allows that. For io_uring futures, this means that the future will have a guard tied to its state-Arc, which is released when the Arc has been dropped (it stores a Weak), or when the future has transitioned to the Finished or Canceled states.
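
The following is a small sketch of the two ideas in this paragraph: handing out aligned sub-ranges of a larger region, and refusing to reclaim a range while a guard objects. It is not the redox-buffer-pool API. `Pool`, `Slice`, `Guard`, and every method name are invented for this example, the pool only tracks offsets rather than real mmap'd memory, and the linear first-fit scan was picked for brevity.

```rust
use std::sync::Weak;

/// Hypothetical guard: decides whether a slice may be reclaimed yet.
pub trait Guard {
    fn can_release(&self) -> bool;
}

/// A `Weak` acts as a guard that lets go once the last `Arc` is dropped.
impl<T> Guard for Weak<T> {
    fn can_release(&self) -> bool {
        self.upgrade().is_none()
    }
}

/// A range handed out by the pool, optionally protected by a guard.
pub struct Slice {
    pub offset: usize,
    pub len: usize,
    guard: Option<Box<dyn Guard>>,
}

impl Slice {
    pub fn set_guard(&mut self, guard: impl Guard + 'static) {
        self.guard = Some(Box::new(guard));
    }
}

/// Bookkeeping for one large region (e.g. obtained from mmap) of `size` bytes.
pub struct Pool {
    size: usize,
    occupied: Vec<(usize, usize)>, // (offset, len), kept sorted by offset
}

impl Pool {
    pub fn new(size: usize) -> Self {
        Pool { size, occupied: Vec::new() }
    }

    /// First-fit allocation with a linear scan over the gaps.
    /// `align` must be a power of two.
    pub fn acquire(&mut self, len: usize, align: usize) -> Option<Slice> {
        assert!(align.is_power_of_two());
        if len == 0 {
            return None;
        }
        let mut cursor = 0; // end of the previous occupied range
        let mut index = 0; // occupied range that ends the current gap
        loop {
            let start = (cursor + align - 1) & !(align - 1); // round up
            let gap_end = self.occupied.get(index).map_or(self.size, |r| r.0);
            if start + len <= gap_end {
                self.occupied.insert(index, (start, len));
                return Some(Slice { offset: start, len, guard: None });
            }
            match self.occupied.get(index) {
                Some(&(offset, used)) => {
                    cursor = offset + used;
                    index += 1;
                }
                None => return None, // no gap was large enough
            }
        }
    }

    /// Give a slice back. If its guard still objects, the slice is returned
    /// to the caller, who has to keep it around (or leak it).
    pub fn release(&mut self, slice: Slice) -> Result<(), Slice> {
        let blocked = slice.guard.as_ref().map_or(false, |g| !g.can_release());
        if blocked {
            return Err(slice);
        }
        self.occupied.retain(|&(offset, _)| offset != slice.offset);
        Ok(())
    }
}
```

In this model a future would call `set_guard(Arc::downgrade(&state))` on its buffer, so that `release` keeps failing while the state `Arc` is alive and succeeds once it has been dropped. The Finished and Canceled transitions mentioned above are not modeled.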

This crate will probably also be used by drivers to manage physical memory shared between drivers and hardware, which is conceptually similar but requires some extra functionality (like making sure a single buffer slice doesn’t overlap the underlying physical memory allocations).

It’s not completely finished yet; in fact, the allocator is currently O(n) in the worst case, but this is something that I will fix soon.

TODO

Now that I have gotten producers and secondary consumers to work with redox-iou (“secondary” means that it’s a userspace-to-userspace ring, controlled by a userspace-to-kernel ring), I also need to get the buffer pool working, so that the pcid interface can transfer data between drivers safely and quickly.

Unfortunately, I also need to get the new compiler submodule merged, with all the patches required for that.