A complete trading system simulation consuming UDP market data, processing orders via a Lock-Free Ring Buffer, and executing trades with deterministic latency.
The application uses a Thread-Per-Core architecture with Kernel Isolation to eliminate OS jitter and context switching:
- Network Thread (Producer - Core 4):
  - Uses `recvmmsg` for batch packet reception (reducing syscall overhead).
  - Parses raw binary packets (Wire Format).
  - Writes to a Lock-Free Ring Buffer using a Zero-Copy Claim/Publish pattern.
- Engine Thread (Consumer - Core 5):
  - Isolated Core: Running on a dedicated CPU core (`isolcpus=5`) with `nohz_full` and `rcu_nocbs` to prevent scheduler ticks and interrupts.
  - Polls the Ring Buffer.
  - Updates the Limit Order Book (LOB).
  - Logs execution stats (p50, p99, Max).
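
The Claim/Publish hand-off between the two threads can be sketched as a single-producer/single-consumer ring. This is a minimal illustration under stated assumptions, not the project's actual buffer: `SpscRing`, `Order`, and the method names `claim`, `publish`, `peek`, and `pop` are hypothetical, and the capacity is assumed to be a power of two. The producer writes directly into the claimed slot, which is what makes the pattern zero-copy.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical slot type; the real project's in-memory order layout may differ.
struct Order {
    std::uint64_t id;
    std::uint32_t price;
    std::uint32_t qty;
    char side;
};

// Single-producer/single-consumer ring buffer (illustrative sketch).
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert((Capacity & (Capacity - 1)) == 0, "Capacity must be a power of two");

public:
    // Producer: returns the next free slot to fill in place, or nullptr if the ring is full.
    T* claim() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return nullptr;
        return &slots_[head & (Capacity - 1)];
    }

    // Producer: makes the slot returned by the last claim() visible to the consumer.
    void publish() {
        head_.store(head_.load(std::memory_order_relaxed) + 1, std::memory_order_release);
    }

    // Consumer: returns the oldest unread slot, or nullptr if the ring is empty.
    const T* peek() const {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return nullptr;
        return &slots_[tail & (Capacity - 1)];
    }

    // Consumer: hands the slot returned by peek() back to the producer.
    void pop() {
        tail_.store(tail_.load(std::memory_order_relaxed) + 1, std::memory_order_release);
    }

private:
    // Separate cache lines so producer and consumer counters do not false-share.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) T slots_[Capacity];
};
```

In this sketch the network thread would call `claim()`, parse the packet straight into the returned slot, then call `publish()`; the engine thread would spin on `peek()` and call `pop()` once the order book has been updated.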
End-to-end latency (Wire-to-Trade) measured on AMD Ryzen 5 5600X (Rocky Linux 9.7).
| Configuration | Median (p50) | Tail Latency (p99) | Worst Case (Max) | Verdict |
|---|---|---|---|---|
| 1. Standard (STL) | 70 ns | ~300 ns | 792,000 ns | Unusable for HFT |
| 2. Optimized Software | 30 ns | ~100 ns | 2,200 ns | Fast, but noisy |
| 3. Thread Pinning | 30 ns | ~100 ns | 418,000 ns | Pinning != Isolation |
| 4. Full Kernel Isolation | 20 ns | 60 ns | 670 ns | Production Ready |
Key Findings:
- Median: Software optimization (Object Pool/Vector) cut the median from 70 ns to 30 ns; full kernel isolation brought it down further to 20 ns.
- Tail Latency (p99): Kernel Isolation stabilized the p99 at 60 ns, meaning 99% of orders are processed in roughly 220 CPU cycles (assuming the 5600X's 3.7 GHz base clock).
- Jitter (Max): Only full isolation eliminated the scheduler spikes, reducing the worst case from >400 µs to under 0.7 µs (670 ns).
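
For reference, the p50/p99/Max columns can be derived from raw latency samples as in the sketch below. It assumes the nearest-rank percentile definition and hypothetical names (`LatencyStats`, `summarize`, `ns_to_cycles`); the project's own stats logger may compute these differently. The cycle conversion assumes a fixed clock, for example the 5600X's 3.7 GHz base frequency, which is an approximation on a CPU that boosts.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary record matching the table's columns.
struct LatencyStats {
    std::uint64_t p50;
    std::uint64_t p99;
    std::uint64_t max;
};

// Nearest-rank percentile: the smallest sample such that at least pct% of all
// samples are less than or equal to it. `sorted` must be non-empty and ascending.
inline std::uint64_t percentile(const std::vector<std::uint64_t>& sorted, std::size_t pct) {
    const std::size_t rank = (pct * sorted.size() + 99) / 100;  // ceil(pct * n / 100)
    return sorted[rank == 0 ? 0 : rank - 1];
}

// Sorts a copy of the samples and extracts p50, p99, and the maximum.
inline LatencyStats summarize(std::vector<std::uint64_t> samples_ns) {
    if (samples_ns.empty()) return LatencyStats{0, 0, 0};
    std::sort(samples_ns.begin(), samples_ns.end());
    return LatencyStats{percentile(samples_ns, 50), percentile(samples_ns, 99), samples_ns.back()};
}

// Converts nanoseconds to CPU cycles at an assumed constant clock (in GHz).
inline double ns_to_cycles(double ns, double clock_ghz) {
    return ns * clock_ghz;
}
```

With these helpers, `ns_to_cycles(60.0, 3.7)` gives about 222 cycles, which is where the cycle figure for the p99 comes from. Sorting belongs outside the hot path; a real engine would more likely record samples into a preallocated buffer and summarize them after the run.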
To reproduce the "Production Ready" results, the Linux kernel must be booted with isolation parameters to silence the OS on specific cores:

```bash
# Example for isolating cores 4 and 5
grubby --update-kernel=ALL --args="isolcpus=4,5 nohz_full=4,5 rcu_nocbs=4,5"
```

- `isolcpus`: Removes cores from the general SMP balancing and scheduler algorithms.
- `nohz_full`: Stops the scheduling-clock tick on the isolated cores (adaptive ticks).
- `rcu_nocbs`: Offloads RCU callbacks to other CPUs.
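
After rebooting, it can be worth checking from inside the process that the isolation actually took effect. The sketch below is one possible check, not part of the project: it reads the kernel's `/sys/devices/system/cpu/isolated` file, which lists the CPUs removed by `isolcpus` in cpulist form (for example `4-5`). The function names `parse_cpu_list` and `read_isolated_cpus` are hypothetical.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Parses a kernel cpulist string such as "4-5" or "2,4-5" into individual CPU ids.
inline std::vector<int> parse_cpu_list(const std::string& text) {
    std::vector<int> cpus;
    std::stringstream ss(text);
    std::string item;
    while (std::getline(ss, item, ',')) {
        // Skip leading whitespace; ignore entries without digits (e.g. an empty file).
        const std::string::size_type first = item.find_first_of("0123456789");
        if (first == std::string::npos) continue;
        item = item.substr(first);
        const std::string::size_type dash = item.find('-');
        const int lo = std::stoi(item.substr(0, dash));
        const int hi = (dash == std::string::npos) ? lo : std::stoi(item.substr(dash + 1));
        for (int cpu = lo; cpu <= hi; ++cpu) cpus.push_back(cpu);
    }
    return cpus;
}

// Reads the isolated-CPU list from sysfs; returns an empty vector if the file
// is missing or no CPUs are isolated.
inline std::vector<int> read_isolated_cpus(
    const std::string& path = "/sys/devices/system/cpu/isolated") {
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return parse_cpu_list(line);
}
```

A startup check could call `read_isolated_cpus()` and refuse to run (or log a warning) if cores 4 and 5 are not in the result, since the "Thread Pinning" row shows how much worse the tail gets without isolation.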
- LOB Core: Automatically fetched via CMake from limit-order-book.
- Google Test: For unit testing.

```bash
mkdir build && cd build
cmake ..
make -j$(nproc)
```

- Start the Engine:

  ```bash
  ./feed_handler
  ```

- Start the Market Simulator (in another terminal):

  ```bash
  python3 market_sim.py
  ```

The system accepts binary messages (Little Endian, Packed):
- Add Order ('A'): `[Type:1][ID:8][Price:4][Qty:4][Side:1]`
- Cancel Order ('C'): `[Type:1][ID:8]`
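
Reading the bracketed numbers as field widths in bytes, the two messages are 18 and 9 bytes long. A decoder for this layout might look like the sketch below. The struct and function names are hypothetical, the integer fields are assumed to be unsigned, the `Side` byte is passed through uninterpreted because its encoding is not specified here, and the `memcpy` approach assumes a little-endian host such as x86-64.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)  // no padding, so the structs mirror the wire layout
struct AddOrderMsg {
    char type;            // 'A'
    std::uint64_t id;
    std::uint32_t price;  // assumed unsigned
    std::uint32_t qty;    // assumed unsigned
    char side;            // raw byte; encoding not specified in this document
};
struct CancelOrderMsg {
    char type;            // 'C'
    std::uint64_t id;
};
#pragma pack(pop)

static_assert(sizeof(AddOrderMsg) == 18, "Add Order must be 18 bytes on the wire");
static_assert(sizeof(CancelOrderMsg) == 9, "Cancel Order must be 9 bytes on the wire");

// Copies an Add Order out of a raw buffer. Returns false if the buffer is too
// short or the type byte is not 'A'. Assumes a little-endian host.
inline bool decode_add(const std::uint8_t* buf, std::size_t len, AddOrderMsg& out) {
    if (len < sizeof(AddOrderMsg) || buf[0] != 'A') return false;
    std::memcpy(&out, buf, sizeof(AddOrderMsg));
    return true;
}

// Copies a Cancel Order out of a raw buffer. Returns false if the buffer is too
// short or the type byte is not 'C'. Assumes a little-endian host.
inline bool decode_cancel(const std::uint8_t* buf, std::size_t len, CancelOrderMsg& out) {
    if (len < sizeof(CancelOrderMsg) || buf[0] != 'C') return false;
    std::memcpy(&out, buf, sizeof(CancelOrderMsg));
    return true;
}
```

Using `memcpy` rather than casting the buffer pointer avoids unaligned-access and strict-aliasing problems, and compilers typically reduce it to plain loads for small fixed sizes.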