mjasny's comments

mjasny · 2026-01-07T09:30:20 1767778220

Yes, for file systems these statements are true.

However, in our experiments (including Figure 9), we bypass the page cache and issue writes using O_DIRECT to the block device. In this configuration, write completion reflects device-level persistence. For consumer SSDs without PLP, completions do not imply durability and a flush is still required.

> "When an SSD has Power-Loss Protection (PLP) -- for example, a supercapacitor that backs the write-back cache -- then the device's internal write-back cache contents are guaranteed durable even if power is lost. Because of this guarantee, the storage controller does not need to flush the cache to media in a strict ordering or slow way just to make data persistent." (Won et al., FAST 2018) https://www.usenix.org/system/files/conference/fast18/fast18...

We will make this more explicit in the next revision. Thanks.

mjasny · 2026-01-06T21:58:22 1767736702

Thanks a lot for the kind words, we really appreciate it!

Regarding your questions:

1) Yes. If you want to take advantage of IOPOLL while still handling network I/O, you typically need two rings per thread: an IOPOLL-enabled ring for storage and a regular ring for sockets and other non-polled I/O.

2) They are not mutually exclusive. SQPOLL was enabled in addition to IOPOLL in the experiments (+SQPoll). SQPOLL affects submission, while IOPOLL changes how completions are retrieved.

3) The main trade-off is CPU usage vs. latency. SQPOLL spawns an additional kernel thread that busy spins to issue I/O requests from the ring. With IOPOLL interrupts are not used and instead the device queues are polled (this does not necessarily result in 100% CPU usage on the core).

4) Yes. For a modern DBMS, a thread-per-core model is the natural fit. Rings should not be shared between threads; each thread should have its own io_uring instance(s) to avoid synchronization and for locality.