GPU RENDERING

Lock-Free SPSC Buffer

One reader thread, one parser thread, one lock-free ring buffer. Zero contention.

Questions this answers

  • How to build a lock-free buffer for terminal PTY data
  • Terminal performance bottleneck between PTY reader and parser
  • Single producer single consumer ring buffer for real-time applications
  • Reducing thread contention in terminal emulator architecture

How it works

The PTY reader thread and the ANSI parser thread communicate through a lock-free single-producer single-consumer (SPSC) ring buffer. The reader writes raw bytes from the PTY file descriptor into the ring buffer. The parser reads bytes out of the ring buffer and processes escape sequences. No mutex, spinlock, or kernel synchronization primitive is used at any point in this path.

The ring buffer uses two atomic cursors: a write cursor advanced by the reader and a read cursor advanced by the parser. Each cursor has exactly one writer. A thread stores only to its own cursor and merely loads the other's, so the two threads never compete to update the same variable. The buffer size is a power of two (256 KB by default), allowing index wrapping with a simple bitwise AND instead of a modulo operation.
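The two-cursor scheme can be sketched in C++ as follows. This is a minimal illustration, not Chau7's actual code: the class name SpscRing, its write/read methods, and the choice of free-running cursors (which grow without bound and are masked only when indexing) are assumptions made for the example.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SPSC byte ring. Names are illustrative, not Chau7's API.
class SpscRing {
public:
    // capacity must be a power of two so that (index & mask_) wraps correctly.
    explicit SpscRing(std::size_t capacity)
        : buf_(capacity), mask_(capacity - 1) {
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    }

    // Producer side (PTY reader thread). Copies up to n bytes into the ring
    // and returns how many were actually written (0 when the ring is full).
    std::size_t write(const std::uint8_t* src, std::size_t n) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t r = read_.load(std::memory_order_acquire);
        const std::size_t free_space = buf_.size() - (w - r);
        n = std::min(n, free_space);
        for (std::size_t i = 0; i < n; ++i)
            buf_[(w + i) & mask_] = src[i];
        // Release: the bytes above become visible before the new cursor value.
        write_.store(w + n, std::memory_order_release);
        return n;
    }

    // Consumer side (parser thread). Copies up to n bytes out of the ring
    // and returns how many were actually read (0 when the ring is empty).
    std::size_t read(std::uint8_t* dst, std::size_t n) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        const std::size_t w = write_.load(std::memory_order_acquire);
        n = std::min(n, w - r);
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = buf_[(r + i) & mask_];
        // Release: the slots are handed back to the producer only after copying.
        read_.store(r + n, std::memory_order_release);
        return n;
    }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t mask_;
    std::atomic<std::size_t> write_{0};  // stored only by the producer
    std::atomic<std::size_t> read_{0};   // stored only by the consumer
};
```

Because the cursors are unsigned and the capacity is a power of two, the subtraction w - r stays correct even after the counters overflow.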

Cache-line padding separates the two cursors to prevent false sharing on ARM and x86 architectures. Because each cursor sits on its own cache line, a store to one cursor never invalidates the line holding the other. The only cross-core traffic left is the unavoidable one: a thread re-reading the other side's cursor after it has actually changed.
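One way to express this padding in C++ is with alignas, as in the sketch below. The struct name Cursors and the constant kCacheLine are hypothetical. The value 128 is an assumption chosen to cover both the 64-byte lines common on x86-64 and the 128-byte line size reported on Apple Silicon, and the real figure should be checked per target.

```cpp
#include <atomic>
#include <cstddef>

// Assumed upper bound on the cache-line size across supported targets.
constexpr std::size_t kCacheLine = 128;

// Hypothetical cursor pair: each atomic starts on its own cache line,
// so a store to one never touches the line that holds the other.
struct Cursors {
    alignas(kCacheLine) std::atomic<std::size_t> write{0};  // producer-owned
    alignas(kCacheLine) std::atomic<std::size_t> read{0};   // consumer-owned
};

static_assert(alignof(Cursors) == kCacheLine,
              "struct must start on a cache-line boundary");
static_assert(sizeof(Cursors) == 2 * kCacheLine,
              "each cursor must occupy a full cache line");
```

The cost is a couple of hundred bytes of padding, which is negligible next to the buffer itself.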

Why it matters

The PTY-to-parser pipeline is the hottest data path in a terminal emulator. Every byte of program output flows through it. A mutex on this path would create contention between the PTY reader and the parser, adding latency on every I/O cycle. Chau7 uses a single-producer single-consumer ring buffer with no locks, no kernel transitions for synchronization, and no contention on the cursors. The reader and parser proceed independently.

Frequently asked questions

What happens if the ring buffer fills up?

If the parser falls behind the reader, the reader detects a full buffer by comparing the cursor positions and applies backpressure by pausing the PTY read. Once the kernel's own PTY buffer fills in turn, the child process's writes block until the parser catches up. This is the intended behavior: output is throttled through ordinary blocking I/O, which prevents unbounded memory growth.
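The full-buffer check reduces to cursor arithmetic. The helper below is a hypothetical sketch (the name bytes_to_request and its parameters are not from Chau7) that assumes free-running cursors and a power-of-two capacity. It tells the reader how many bytes it may request from the PTY, and a result of 0 means "pause reading until the parser frees space".

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: how many bytes the reader thread may pull from the
// PTY right now without overwriting data the parser has not consumed yet.
// write_cursor and read_cursor are free-running counters; chunk is the
// largest single read the reader would like to issue.
inline std::size_t bytes_to_request(std::size_t write_cursor,
                                    std::size_t read_cursor,
                                    std::size_t capacity,
                                    std::size_t chunk) {
    // Unsigned subtraction stays correct across counter overflow.
    const std::size_t used = write_cursor - read_cursor;
    const std::size_t free_space = capacity - used;
    return std::min(free_space, chunk);  // 0 => buffer full, skip the read
}
```

When the helper returns 0, the reader simply does not read from the PTY file descriptor on that iteration, and the kernel takes care of slowing the child down.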

Why SPSC instead of a multi-producer or multi-consumer buffer?

The PTY data path has exactly one producer (the reader thread) and one consumer (the parser thread). An SPSC buffer exploits this fixed topology to eliminate all atomic read-modify-write operations, using only atomic loads and stores. Multi-producer or multi-consumer queues typically need read-modify-write operations such as compare-and-swap loops or fetch-and-add, which cost more and can retry under contention.
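The difference shows up in how a cursor is advanced. The two functions below are an illustrative contrast only: spsc_publish and mp_claim are hypothetical names, and the multi-producer variant is a generic CAS loop rather than any particular queue's algorithm.

```cpp
#include <atomic>
#include <cstddef>

// SPSC: exactly one thread ever stores to this cursor, so a plain load
// followed by a plain store is enough. No read-modify-write is needed.
inline void spsc_publish(std::atomic<std::size_t>& cursor, std::size_t n) {
    const std::size_t cur = cursor.load(std::memory_order_relaxed);
    cursor.store(cur + n, std::memory_order_release);
}

// Multi-producer: several threads may try to claim space at once, so the
// update has to be an atomic read-modify-write. This sketch uses a CAS loop
// and refuses the claim if it would move the cursor past `limit`.
inline bool mp_claim(std::atomic<std::size_t>& cursor, std::size_t n,
                     std::size_t limit, std::size_t& start) {
    std::size_t cur = cursor.load(std::memory_order_relaxed);
    do {
        if (cur + n > limit) return false;  // not enough room
        // On failure, compare_exchange_weak reloads `cur` and we retry.
    } while (!cursor.compare_exchange_weak(cur, cur + n,
                                           std::memory_order_acq_rel,
                                           std::memory_order_relaxed));
    start = cur;  // first index of the range this thread now owns
    return true;
}
```

With a single producer the loop in mp_claim would never retry, but it would still pay for the read-modify-write instruction on every call.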

How large is the ring buffer?

The default size is 256 KB, which provides roughly 3 ms of buffering at 80 MB/s throughput. This is large enough to absorb scheduling jitter between the reader and parser threads, yet small enough to remain resident in L2 cache on Apple Silicon.
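The buffering figure follows from dividing the buffer size by the throughput. The snippet below reproduces the arithmetic under two assumptions: that 256 KB means 256 × 1024 bytes and that 80 MB/s means 80 × 10^6 bytes per second. The names are illustrative.

```cpp
#include <cstddef>

// Time, in milliseconds, that a buffer of `bytes` can absorb at a given
// sustained throughput before it fills.
constexpr double buffer_ms(std::size_t bytes, double bytes_per_second) {
    return static_cast<double>(bytes) / bytes_per_second * 1000.0;
}

constexpr std::size_t kDefaultBytes = 256 * 1024;      // assumed 256 KiB
constexpr double kThroughput = 80e6;                   // assumed 80 * 10^6 B/s
constexpr double kHeadroomMs = buffer_ms(kDefaultBytes, kThroughput);

// 262144 / 80e6 s is about 3.28 ms, matching the "roughly 3 ms" figure.
static_assert(kHeadroomMs > 3.2 && kHeadroomMs < 3.3, "roughly 3 ms");

// The default size is also a power of two, as the index masking requires.
static_assert((kDefaultBytes & (kDefaultBytes - 1)) == 0, "power of two");
```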