LLQ: A wait-free SPSC linked-list queue with recyclable nodes

2022-12-04

Last year, I published a Rust library called basedrop, which implements a memory reclamation system tailored to the constraints of real-time audio scenarios. The purpose of basedrop is to make it easy to share dynamically allocated memory with a real-time audio thread while ensuring that no allocations or deallocations happen on that thread. This is accomplished by providing a set of smart pointers (analogous to Box and Arc from the Rust standard library) which do not directly free their associated allocation when dropped, but instead automatically push it onto a lock-free queue to be collected later on another thread.

Basedrop's design has some compelling benefits: it frees you from having to write code by hand every time you want to transfer an object to another thread to be freed, and if you restrict yourself to its vocabulary of smart pointers, it eliminates the possibility of accidentally dropping an allocation on the real-time thread (a mistake which can easily remain invisible if you don't have something like assert_no_alloc to catch it). However, after talking with some developers trying to make use of basedrop in real projects, it became clear to me that these benefits come at the cost of a somewhat opinionated API, making it difficult to integrate with certain program architectures. I decided that a stripped-down version of the core linked-list queue would probably have some value, and the end result of that was the llq crate.

A central piece of basedrop's design is the Node<T> type, which represents a node that can potentially be added to the collector queue's linked list. Each of basedrop's smart pointers allocates a Node<T> on the heap at creation time, so when the time comes to mark the contained object as ready for deallocation, that node already exists, stored inline as part of the original allocation, and can simply be linked into the queue. This makes it possible to send an object back from the real-time thread to be reclaimed without performing any allocator operations.

The llq crate extracts just that core functionality, of a wait-free linked-list queue with preallocated nodes, and presents it in an unopinionated way. With llq, you can create some nodes:

use llq::{Node, Queue};

let x = Node::new(0);
let y = Node::new(1);
let z = Node::new(2);

push them onto a queue:

let (mut tx, mut rx) = Queue::<usize>::new().split();

tx.push(x);
tx.push(y);
tx.push(z);

pull them off the other end:

let x = rx.pop().unwrap();
let y = rx.pop().unwrap();
let z = rx.pop().unwrap();

and even reuse them with a separate queue:

let (mut tx2, mut rx2) = Queue::<usize>::new().split();

tx2.push(x);
tx2.push(y);
tx2.push(z);

and none of the above push or pop operations will ever allocate or free memory, lock a mutex, or even enter an unbounded compare-exchange loop.

It's worth noting that essentially the only synchronization operations in the entire source of llq are a single acquire load in the body of pop and a release store in the body of push. I consider it a pretty compelling demonstration of Rust's type system and safety guarantees that a concurrent data structure with such minimal synchronization overhead can still have a data race-free public API (assuming llq's implementation is bug-free, of course!).

For reference, the queue design in llq is based on a particular unbounded SPSC queue design from 1024cores. There's also an implementation of a similar queue design in the internals of the Rust standard library.

It's important to note that llq is designed for a specific, uncommon set of requirements, and several aspects of its design are more or less the opposite of what one would want out of a general-purpose channel for communicating between threads. llq is designed for the scenario where

A general-purpose queue should probably

So, if you're just looking for a generic SPSC queue, there's good chance llq is not what you want. But if you're implementing real-time audio software, or are otherwise facing a situation where some threads in your program have much stricter latency requirements than others, llq might be worth a look.

You can check it out over on GitHub or on crates.io.

Discuss on: Twitter · Mastodon · r/rust · r/programming · HN