Introduction
After exploring Node4, Node16, and Node48 in our Adaptive Radix Tree (ART) journey, the final step in the node family is Node256. This node type uses a fully expanded direct index for maximum fan-out, eliminating any intermediate lookups at this level.
To verify the performance impact of our complete ART implementation, we need a realistic, high-throughput workload that stresses all critical operations: inserts, removals, range traversals, and best-price lookups. Rather than micro-benchmarking the tree in isolation, we’ll embed it inside a practical, domain-relevant structure: a simple order book.
The order book will act as our test ground — a controlled, repeatable environment where the ART must handle real-world access patterns, from matching orders to maintaining best bid/ask, in microseconds.
Inside Node256
Node256 is promoted from Node48 once the child count exceeds 48. It uses a fixed array of 256 slots indexed directly by the routing byte.
| Design Choice | Problem Solved | Trade-off |
|---|---|---|
| 256-slot fixed array | Removes the need to search among children; direct addressing in O(1) | Higher memory footprint |
| Direct addressing | Predictable lookup and cache-friendly iteration | Wasted slots when sparse |
| No index indirection | Eliminates extra branching and array hops | Larger node size in memory |
The choice here is a deliberate memory-for-speed trade-off: hot paths benefit from predictable, branchless lookups and contiguous scans.
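To make the trade-off concrete, here is a minimal, hypothetical sketch of a Node256 (not the article's actual implementation): the routing byte itself is the array index, so lookup, insert, and remove are all a single bounds-free array access.

```java
// Hypothetical Node256 sketch: a fixed 256-slot array indexed directly by
// the routing byte. Direct addressing means no search among children.
final class Node256 {
    private final Object[] children = new Object[256]; // one slot per possible byte value
    private int childCount;

    // O(1) lookup: mask to treat the byte as an unsigned index 0..255.
    Object findChild(byte key) {
        return children[key & 0xFF];
    }

    // O(1) insert: write straight into the slot for this byte.
    void addChild(byte key, Object child) {
        int idx = key & 0xFF;
        if (children[idx] == null) childCount++;
        children[idx] = child;
    }

    // O(1) remove: clear the slot.
    void removeChild(byte key) {
        int idx = key & 0xFF;
        if (children[idx] != null) childCount--;
        children[idx] = null;
    }

    int childCount() { return childCount; }
}
```

The `& 0xFF` mask matters in Java because `byte` is signed; without it, bytes above 127 would produce negative indices.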
The Order Book as a Test Ground
The order book we implement here isn’t a full production exchange matching engine. It’s a stripped-down design focused on:
Immediate matching of compatible orders.
Efficient insertion of remaining quantities.
Fast removal when orders are fully filled or canceled.
Instant access to best bid and best ask.
This makes it ideal for performance testing the underlying tree because it exercises:
Ascending traversal (find next higher price).
Descending traversal (find next lower price).
Random insert/remove patterns.
Hotspot access to best price levels.
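These four access patterns map directly onto an ordered-map interface. A hypothetical sketch using the JDK's TreeMap (the benchmark's baseline structure; all names here are illustrative, not from the article's code):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative view of the four access patterns, keyed by price level.
// Bid side: best = highest price; ask side: best = lowest price.
final class SideViews {
    // price -> resting quantity at that level (illustrative payload)
    final NavigableMap<Long, Long> bids = new TreeMap<>();
    final NavigableMap<Long, Long> asks = new TreeMap<>();

    long bestBid() { return bids.lastKey(); }   // hotspot access: highest bid
    long bestAsk() { return asks.firstKey(); }  // hotspot access: lowest ask

    // Descending traversal: match an incoming ask against the highest bids first.
    Iterable<Long> bidPricesBestFirst() { return bids.descendingKeySet(); }

    // Ascending traversal: match an incoming bid against the lowest asks first.
    Iterable<Long> askPricesBestFirst() { return asks.navigableKeySet(); }
}
```

An ART-backed order book has to offer the same seek-then-step traversal contract; the benchmark later in this article measures how much faster it can serve it.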
Pseudocode for newOrder
```
function newOrder(command):
    filled = tryInstantMatch(command)
    if filled == command.size:
        return // fully matched
    if orderId already exists:
        return // duplicate order
    order = createOrder(command, filled)
    bucket = getOrCreateBucket(order.price, order.side)
    bucket.add(order)
    orderIndex[order.id] = order
    updateBest(order)

function tryInstantMatch(command):
    remaining = command.size
    matched = 0
    // an incoming ask matches the highest bids first; an incoming bid, the lowest asks
    orders = (command.side == ASK) ? descend(bidTree) : ascend(askTree)
    for order in orders:
        if remaining == 0 or !canMatch(command, order):
            break
        tradeSize = min(remaining, order.remainingSize)
        order.fill(tradeSize)
        remaining -= tradeSize
        matched += tradeSize
        if order.isFilled():
            removeOrder(order)
    return matched
```

Problems and Solutions in Order Book Design
| # | Problem | Solution |
|---|---|---|
| 1 | Maintaining Price Order Efficiently | Price-keyed ordered map with O(log n) seek and O(1) per-step iteration. |
| 2 | Fast Removal When Orders Are Filled | Order-ID index (see `orderIndex` above) to locate the order, then O(1) unlink from its bucket. |
| 3 | Tracking Best Bid and Best Ask | Cache pointers; on an empty best level, advance to the next in O(log n) seek + O(1) step. |
| 4 | Grouping Orders at the Same Price | Bucket per price with a doubly linked list for O(1) add/remove. |
| 5 | Removing Empty Buckets | On last order removal, unlink from bucket O(1), delete the price level (tree delete, O(log n)). |
| 6 | Traversal for Matching | Data structure must support efficient branching + subset iteration with O(log n) seek, O(1) per-step, zero allocations. |
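Problems 2, 4, and 5 in the table hinge on O(1) unlinking. A hypothetical sketch of a price-level bucket (illustrative names, not the article's implementation):

```java
// Hypothetical price-level bucket: a doubly linked list of orders so a
// filled or canceled order can be unlinked in O(1), plus detection of the
// bucket becoming empty so the caller can delete the price level.
final class PriceBucket {
    static final class OrderNode {
        final long id;
        long remaining;
        OrderNode prev, next;
        OrderNode(long id, long remaining) { this.id = id; this.remaining = remaining; }
    }

    private OrderNode head, tail;
    private int size;

    // O(1): append at the tail, preserving time priority within the level.
    OrderNode add(long id, long remaining) {
        OrderNode n = new OrderNode(id, remaining);
        if (tail == null) { head = tail = n; }
        else { tail.next = n; n.prev = tail; tail = n; }
        size++;
        return n; // caller stores this handle in its order-ID index
    }

    // O(1): unlink a node located via the order-ID index.
    // Returns true when the bucket is now empty, so the caller can delete
    // the price level from the tree (O(log n) tree delete).
    boolean remove(OrderNode n) {
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
        n.prev = n.next = null;
        size--;
        return size == 0;
    }

    int size() { return size; }
}
```

The key design point is that `add` returns the node handle: storing it in the order-ID index is what turns "remove order X" into a pointer swap instead of a list scan.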
Benchmark
This report presents a performance analysis comparing the ART (Adaptive Radix Tree) and TreeSet implementations for OrderBook operations, based on JMH benchmarking with statistical controls. The analysis covers multiple scenarios and two dataset sizes (10K and 100K orders), with confidence intervals reported throughout.
Key Findings:
ART dominates insert-heavy scenarios with a 180-227% throughput advantage
ART excels in matching scenarios with a 49-52% throughput advantage
Performance advantages scale consistently across dataset sizes
Statistical significance confirmed with non-overlapping confidence intervals
Methodology
JMH Configuration (Gold Standard)
JVM: OpenJDK 64-Bit Server VM, 11.0.23+9-LTS
JMH Version: 1.37
Forks: 5 (statistical validity across JVMs)
Warmup: 5 iterations × 2 seconds (C1→C2 compilation settling)
Measurement: 10 iterations × 2 seconds (stable results)
JVM Args: -Xmx8G -Xms8G -XX:+UseG1GC -XX:+AlwaysPreTouch
Blackhole Mode: full + dont-inline hint
Threads: 1 (matches order book threading model)
Dataset Sizes: 10K and 100K orders
Statistical Controls Implemented
Multiple JVM instances: 5 forks eliminate profile bias
Fixed heap allocation: 8GB prevents GC variability
Memory pre-touch: AlwaysPreTouch eliminates page fault jitter
Deterministic data generation: Fixed RNG seed (42) for reproducibility
DCE prevention: Return values + Blackhole consumption confirmed
Environmental controls: macOS App Nap disabled, stable thermals
Data Quality Indicators
Confidence intervals: 99.9% CI reported for all measurements
Sample sizes: 50 measurements per configuration (5 forks × 10 iterations)
Error margins: All measurements include ±confidence intervals
Coefficient of variation: Low variance indicates stable measurements
Throughput Results
Scenario 1: Pure Insert
Dataset: 10K Orders
| Impl | Ops/s | Error ± | CI 99.9% | Rel Perf |
|---|---|---|---|---|
| ART | 1,364.011 | 112.974 | [1,251.037, 1,476.985] | Baseline |
| TreeSet | 486.809 | 12.852 | [473.957, 499.661] | -64.3% |

Gap: +180.2% ART advantage
Dataset: 100K Orders
| Impl | Ops/s | Error ± | CI 99.9% | Rel Perf |
|---|---|---|---|---|
| ART | 155.118 | 3.543 | [151.575, 158.661] | Baseline |
| TreeSet | 47.445 | 1.598 | [45.847, 49.043] | -69.4% |

Gap: +227.0% ART advantage
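For clarity on how to read these tables, the Rel Perf and Gap columns follow directly from the mean ops/s values (an assumption that holds for the numbers above; the helper names here are illustrative):

```java
// How the "Gap" and "Rel Perf" columns are derived, assuming they are
// simple ratios of the mean ops/s values before rounding.
final class Gaps {
    // ART's advantage over TreeSet, in percent.
    static double gapPercent(double artOps, double treeSetOps) {
        return (artOps / treeSetOps - 1.0) * 100.0;
    }

    // TreeSet relative to the ART baseline, in percent (negative = slower).
    static double relPerfPercent(double treeSetOps, double artOps) {
        return (treeSetOps / artOps - 1.0) * 100.0;
    }
}
```

For the 10K insert run, `gapPercent(1364.011, 486.809)` gives roughly +180.2 and `relPerfPercent(486.809, 1364.011)` roughly -64.3, matching the table.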
Scenario 2: Pure Match
Dataset: 10K Orders
| Impl | Ops/s | Error ± | CI 99.9% | Rel Perf |
|---|---|---|---|---|
| ART | 2,040.202 | 33.832 | [2,006.369, 2,074.034] | Baseline |
| TreeSet | 1,365.038 | 54.959 | [1,310.078, 1,419.997] | -33.1% |

Gap: +49.5% ART advantage
Dataset: 100K Orders
| Impl | Ops/s | Error ± | CI 99.9% | Rel Perf |
|---|---|---|---|---|
| ART | 223.693 | 4.706 | [218.987, 228.399] | Baseline |
| TreeSet | 147.361 | 2.201 | [145.160, 149.561] | -34.1% |

Gap: +51.8% ART advantage
Tail Latency Analysis
| Scenario | Impl | Mean (µs) | p99 (µs) | p99.9 (µs) | Insight |
|---|---|---|---|---|---|
| Random Mix | ART | 7.13 | 19.6 | 85.6 | Better tail latency |
| Random Mix | TreeSet | 9.79 | 39.9 | 438.7 | Slower, higher jitter |
| Partial Match | ART | 1.15 | 4.28 | 63.3 | Slower in this case |
| Partial Match | TreeSet | 0.144 | 0.375 | 6.79 | Dominates at low latency |
| Hotspot | ART | 6.04 | 22.7 | 207.7 | Comparable, slight edge at high percentiles |
| Hotspot | TreeSet | 6.27 | 27.0 | 244.9 | Comparable |
GC Profiling (Random Mix)
| Impl | Alloc/op | gc.count | gc.time (ms) | Insight |
|---|---|---|---|---|
| ART | 8.47 MB | 2 | 3 | ~39% less allocation per op |
| TreeSet | 13.81 MB | 1 | 2 | Allocates ~63% more per op |
Conclusion
ART offers clear throughput and tail-latency wins in most scenarios, with strong statistical validity. TreeSet retains niche advantage in ultra-low-latency partial matches. Selection should be workload-driven.
You can browse the full implementation of Node256, the OrderBook, and the benchmark code, as well as the final comprehensive report results with GC logs, in this GitHub repo:
