Introduction

After exploring Node4, Node16, and Node48 in our Adaptive Radix Tree (ART) journey, the final step in the node family is Node256. This node type uses a fully expanded direct index for maximum fan-out, eliminating any intermediate lookups at this level.

To verify the performance impact of our complete ART implementation, we need a realistic, high-throughput workload that stresses all critical operations: inserts, removals, range traversals, and best-price lookups. Rather than micro-benchmarking the tree in isolation, we’ll embed it inside a practical, domain-relevant structure: a simple order book.

The order book will act as our test ground — a controlled, repeatable environment where the ART must handle real-world access patterns, from matching orders to maintaining best bid/ask, in microseconds.

Inside Node256

Node256 is promoted from Node48 once the child count exceeds Node48's capacity of 48 children. It uses a fixed array of 256 slots indexed directly by the routing byte: no index array, no search.

| Design Choice | Problem Solved | Trade-off |
|---|---|---|
| 256-slot fixed array | Removes the need to search among children; direct addressing in O(1) | Higher memory footprint |
| Direct addressing | Predictable lookup and cache-friendly iteration | Wasted slots when sparse |
| No index indirection | Eliminates extra branching and the array hop | Larger node size in memory |

The choice here is a deliberate memory-for-speed trade-off: hot paths benefit from predictable, branchless lookups and contiguous scans.
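The direct-indexing idea can be sketched in a few lines of Java. This is a simplified, illustrative model only (the class shape, field names, and `Object` children are my assumptions, not the article's actual implementation, which stores typed child nodes and additional bookkeeping):

```java
// Illustrative sketch of a Node256: one slot per possible key byte.
// Names and types here are hypothetical, not the article's actual code.
final class Node256 {
    private final Object[] children = new Object[256]; // one slot per byte value
    private short childCount = 0;

    // Direct addressing: the routing byte IS the array index, O(1), no search.
    Object getChild(int keyByte) {
        return children[keyByte & 0xFF];
    }

    void setChild(int keyByte, Object child) {
        int idx = keyByte & 0xFF;
        if (children[idx] == null) childCount++;
        children[idx] = child;
    }

    void removeChild(int keyByte) {
        int idx = keyByte & 0xFF;
        if (children[idx] != null) {
            children[idx] = null;
            childCount--;
        }
    }

    int childCount() { return childCount; }
}
```

Note how lookup is a single masked array read: the cost of the 256-slot array buys the removal of every comparison and indirection on the hot path.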

The Order Book as a Test Ground

The order book we implement here isn’t a full production exchange matching engine. It’s a stripped-down design focused on:

  • Immediate matching of compatible orders.

  • Efficient insertion of remaining quantities.

  • Fast removal when orders are fully filled or canceled.

  • Instant access to best bid and best ask.

This makes it ideal for performance testing the underlying tree because it exercises:

  • Ascending traversal (find next higher price).

  • Descending traversal (find next lower price).

  • Random insert/remove patterns.

  • Hotspot access to best price levels.

Pseudocode for newOrder

function newOrder(command):
    filled = tryInstantMatch(command)
    if filled == command.size:
        return  // fully matched

    if orderId already exists:
        return  // duplicate order

    order = createOrder(command, filled)
    bucket = getOrCreateBucket(order.price, order.side)
    bucket.add(order)
    orderIndex[order.id] = order
    updateBest(order)

function tryInstantMatch(command):
    remaining = command.size
    matched = 0
    orders = (command.side == ASK) ? descend(bidTree) : ascend(askTree)

    for order in orders:
        if !canMatch(command, order):
            break

        tradeSize = min(remaining, order.remainingSize)
        order.fill(tradeSize)
        remaining -= tradeSize
        matched += tradeSize

        if order.isFilled():
            removeOrder(order)

    return matched
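The matching loop above can be made concrete. The following runnable sketch substitutes `java.util.TreeMap` for the ART and collapses each price level to a single resting quantity; the class and method names (`MatchSketch`, `matchAsk`, `addBid`) are illustrative inventions, not the article's code:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Minimal matching sketch: one ordered map per side, price -> total resting size.
// TreeMap stands in for the ART; all names here are illustrative.
final class MatchSketch {
    final TreeMap<Long, Long> bids = new TreeMap<>();
    final TreeMap<Long, Long> asks = new TreeMap<>();

    // Match an incoming ask against resting bids, best (highest) price first.
    // Returns the matched quantity; fully consumed levels are removed.
    long matchAsk(long limitPrice, long size) {
        long matched = 0;
        Iterator<Map.Entry<Long, Long>> it =
                bids.descendingMap().entrySet().iterator();
        while (it.hasNext() && matched < size) {
            Map.Entry<Long, Long> level = it.next();
            if (level.getKey() < limitPrice) break;        // no longer crosses
            long trade = Math.min(size - matched, level.getValue());
            matched += trade;
            if (trade == level.getValue()) it.remove();    // level fully consumed
            else level.setValue(level.getValue() - trade); // partial fill
        }
        return matched;
    }

    void addBid(long price, long size) { bids.merge(price, size, Long::sum); }
}
```

The descending iteration over bids mirrors the `descend(bidTree)` branch of the pseudocode; an incoming bid would use the mirror image over `asks` in ascending order.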

Problems and Solutions in Order Book Design

| # | Problem | Solution |
|---|---|---|
| 1 | Maintaining Price Order Efficiently | Price-keyed ordered map with O(log n) seek and O(1) per-step ascend/descend. |
| 2 | Fast Removal When Orders Are Filled | orderIndex hash map for O(1) lookup; orders store prev/next and a bucket pointer for O(1) unlink. |
| 3 | Tracking Best Bid and Best Ask | Cache pointers; when the best level empties, advance to the next in an O(log n) seek plus O(1) steps. |
| 4 | Grouping Orders at the Same Price | One bucket per price with a doubly linked list; track head/tail, numOrders, totalVolume. |
| 5 | Removing Empty Buckets | On last order removal, unlink from the bucket in O(1), then delete the price level (tree delete, O(log n)). |
| 6 | Traversal for Matching | The data structure must support efficient branching and subset iteration: O(log n) seek, O(1) per step, zero allocations. |
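Problems 2, 4, and 5 all hinge on the intrusive doubly linked list inside each price bucket. Here is a runnable sketch of that piece; the field and class names (`Order`, `Bucket`, `remainingSize`) are my illustrative choices, not necessarily those of the article's implementation:

```java
// Sketch of a per-price bucket: intrusive doubly linked list of orders.
// Names are illustrative; a real order also carries side, timestamps, etc.
final class Order {
    final long id;
    long remainingSize;
    Order prev, next;   // intrusive links: O(1) unlink, no wrapper-node allocation
    Bucket bucket;      // back-pointer, so removal needs no tree lookup
    Order(long id, long size) { this.id = id; this.remainingSize = size; }
}

final class Bucket {
    Order head, tail;
    int numOrders;
    long totalVolume;

    void add(Order o) {          // append: FIFO priority within a price level
        o.bucket = this;
        if (tail == null) head = o;
        else { tail.next = o; o.prev = tail; }
        tail = o;
        numOrders++;
        totalVolume += o.remainingSize;
    }

    void remove(Order o) {       // O(1): relink the neighbours, fix counters
        if (o.prev != null) o.prev.next = o.next; else head = o.next;
        if (o.next != null) o.next.prev = o.prev; else tail = o.prev;
        o.prev = o.next = null;
        o.bucket = null;
        numOrders--;
        totalVolume -= o.remainingSize;
    }

    boolean isEmpty() { return numOrders == 0; } // empty: delete the price level
}
```

Because each order carries its own links and a bucket back-pointer, a fill or cancel found via the orderIndex hash map never touches the tree unless the bucket becomes empty.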

Benchmark

This report presents a performance analysis comparing the ART (Adaptive Radix Tree) and TreeSet implementations of the OrderBook's price index, based on JMH benchmarking with a rigorous statistical methodology. The analysis covers multiple scenarios and two dataset sizes (10K and 100K orders) with comprehensive statistical validation.

Key Findings:

  • ART dominates insert-heavy scenarios with 180-227% performance advantages

  • ART excels in matching scenarios with 49-52% performance advantages

  • Performance advantages scale consistently across dataset sizes

  • Statistical significance confirmed with non-overlapping confidence intervals

Methodology

JMH Configuration (Gold Standard)

  • JVM: OpenJDK 64-Bit Server VM, 11.0.23+9-LTS

  • JMH Version: 1.37

  • Forks: 5 (statistical validity across JVMs)

  • Warmup: 5 iterations × 2 seconds (C1→C2 compilation settling)

  • Measurement: 10 iterations × 2 seconds (stable results)

  • JVM Args: -Xmx8G -Xms8G -XX:+UseG1GC -XX:+AlwaysPreTouch

  • Blackhole Mode: Full + dont-inline hint

  • Threads: 1 (matches order book threading model)

  • Dataset Sizes: 10K and 100K orders

Statistical Controls Implemented

  • Multiple JVM instances: 5 forks eliminate profile bias

  • Fixed heap allocation: 8GB prevents GC variability

  • Memory pre-touch: AlwaysPreTouch eliminates page fault jitter

  • Deterministic data generation: Fixed RNG seed (42) for reproducibility

  • DCE prevention: Return values + Blackhole consumption confirmed

  • Environmental controls: macOS App Nap disabled, stable thermals
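The deterministic-generation control deserves a concrete illustration: with a fixed RNG seed, every fork replays an identical order stream, so run-to-run differences reflect the data structure, not the data. The generator below is a hypothetical sketch (names, price range, and the `{id, price, size}` layout are my assumptions, not the benchmark's actual code):

```java
import java.util.Random;

// Illustrative order-stream generator: a fixed seed makes every benchmark
// run replay the identical sequence. Ranges and layout are hypothetical.
final class OrderGen {
    static long[][] generate(int count, long seed) {
        Random rng = new Random(seed);            // fixed seed, e.g. 42
        long[][] orders = new long[count][3];     // {id, price, size}
        for (int i = 0; i < count; i++) {
            orders[i][0] = i;
            orders[i][1] = 10_000 + rng.nextInt(1_000); // price in ticks
            orders[i][2] = 1 + rng.nextInt(100);        // order size
        }
        return orders;
    }
}
```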

Data Quality Indicators

  • Confidence intervals: 99.9% CI reported for all measurements

  • Sample sizes: 50 measurements per configuration (5 forks × 10 iterations)

  • Error margins: All measurements include ±confidence intervals

  • Coefficient of variation: Low variance indicates stable measurements

Throughput Results

Scenario 1: Pure Insert

Dataset: 10K Orders

| Impl | Ops/s | Error ± | CI 99.9% | Rel Perf |
|---|---|---|---|---|
| ART | 1,364.011 | 112.974 | [1,251.037, 1,476.985] | Baseline |
| TreeSet | 486.809 | 12.852 | [473.957, 499.661] | -64.3% |

Gap: +180.2% ART advantage

Dataset: 100K Orders

| Impl | Ops/s | Error ± | CI 99.9% | Rel Perf |
|---|---|---|---|---|
| ART | 155.118 | 3.543 | [151.575, 158.661] | Baseline |
| TreeSet | 47.445 | 1.598 | [45.847, 49.043] | -69.4% |

Gap: +227.0% ART advantage

Scenario 2: Pure Match

Dataset: 10K Orders

| Impl | Ops/s | Error ± | CI 99.9% | Rel Perf |
|---|---|---|---|---|
| ART | 2,040.202 | 33.832 | [2,006.369, 2,074.034] | Baseline |
| TreeSet | 1,365.038 | 54.959 | [1,310.078, 1,419.997] | -33.1% |

Gap: +49.5% ART advantage

Dataset: 100K Orders

| Impl | Ops/s | Error ± | CI 99.9% | Rel Perf |
|---|---|---|---|---|
| ART | 223.693 | 4.706 | [218.987, 228.399] | Baseline |
| TreeSet | 147.361 | 2.201 | [145.160, 149.561] | -34.1% |

Gap: +51.8% ART advantage

Tail Latency Analysis

| Scenario | Impl | Mean (µs) | p99 (µs) | p99.9 (µs) | Insight |
|---|---|---|---|---|---|
| Random Mix | ART | 7.13 | 19.6 | 85.6 | Better tail latency |
| Random Mix | TreeSet | 9.79 | 39.9 | 438.7 | Slower, higher jitter |
| Partial Match | ART | 1.15 | 4.28 | 63.3 | Slower in this case |
| Partial Match | TreeSet | 0.144 | 0.375 | 6.79 | Dominates in low-latency |
| Hotspot | ART | 6.04 | 22.7 | 207.7 | Comparable |
| Hotspot | TreeSet | 6.27 | 27.0 | 244.9 | ART slightly ahead at high percentiles |

GC Profiling (Random Mix)

| Impl | Alloc (MB/op) | gc.count | gc.time (ms) | Insight |
|---|---|---|---|---|
| ART | 8.47 | 2 | 3 | ~39% less allocation than TreeSet |
| TreeSet | 13.81 | 1 | 2 | Higher allocation per op |

Conclusion

ART offers clear throughput and tail-latency wins in most scenarios, with strong statistical validity. TreeSet retains niche advantage in ultra-low-latency partial matches. Selection should be workload-driven.

You can browse the full implementation of Node256, the OrderBook, and the benchmark code, as well as the final comprehensive report with GC logs, in this GitHub repo:
