Table of Contents
Introduction
Adaptive Radix Tree (ART) is a fast, space-efficient data structure designed to combine the cache-friendliness of tries with the compactness of binary search trees. Originally introduced in the paper “The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases” (VLDB 2013), ART dynamically adapts the node type depending on the number of children at each level.
Node types
Node4: Stores up to 4 children (linear search, compact)
Node16: Up to 16 children (binary search or SIMD scan)
Node48: Up to 48 children using an index array
Node256: Fully expanded, using direct indexing
Each node type is tuned for different key fan-out scenarios, allowing the tree to remain both shallow and efficient.
What We’re Building
In this post, we begin with the Node4 implementation — the smallest and most cache-friendly variant — as the foundational step for a complete ART. We will:
Use 64-bit
longkeys, byte-wise decomposedApply bit-level tuning and prefix compression
Build a clean structure called
LongAdaptiveRadixMaparound it
This is not just an academic exercise — our LongAdaptiveRadixMap will serve as the core in-memory data structure powering a high-performance matching engine (like those used in HFT exchanges or order books). From here, we’ll extend the work to support:
Other node types (Node16 → Node256)
SIMD optimizations with Rust implementation
Concurrency-safe design for production use
The Running Example
Value | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | hex literal |
|---|---|---|---|---|---|---|---|---|---|
V₁ | AB | CD | EF | 12 | 34 | 56 | 78 | 90 |
|
V₂ | AB | CD | EF | 12 | 34 | 56 | 78 | 91 |
|
V₃ | AB | CD | EF | 12 | 34 | 56 | 78 | 92 |
|
V₄ | AB | CD | EF | CD | AB | 56 | 78 | CD |
|
Byte 0 = right‑most; byte 7 = left‑most.
Step‑by‑Step Construction
Insertion 1
Call | Action & Reason |
|---|---|
| Tree is empty → create |
ArtNode4[level=0, key=0xabcdef1234567890, children=1]
├─ key[0]=0x90 → V1
Insertion 2
Call | Action & Reason |
|---|---|
| Same 7-byte prefix, differs at byte 0 (0x91) → append new child, keep keys sorted |
ArtNode4[level=0, key=0xabcdef1234567890, children=2]
├─ key[0]=0x90 → V1
├─ key[1]=0x91 → V2
Insertion 3
Call | Action & Reason |
|---|---|
| Same logic → append third child at byte 0 (0x92) |
ArtNode4[level=0, key=0xabcdef1234567890, children=3]
├─ key[0]=0x90 → V1
├─ key[1]=0x91 → V2
├─ key[2]=0x92 → V3
Insertion 4
Call | Action & Reason |
|---|---|
| Prefix diverges at byte 4 and byte 0 → create |
ArtNode4[level=32, key=0xabcdef1234567890, children=2]
├─ key[0]=0x12
│ ArtNode4[level=0, key=0xabcdef1234567890, children=3]
│ ├─ key[0]=0x90 → V1
│ ├─ key[1]=0x91 → V2
│ ├─ key[2]=0x92 → V3
├─ key[1]=0xCD
│ ArtNode4[level=0, key=0xabcdefcdab5678cd, children=1]
│ ├─ key[0]=0xCD → V4
Domain Model Design: Understanding ArtNode4
To deeply understand how insertion and retrieval work in an Adaptive Radix Tree (ART), we must first understand the structure of a node — in this case, ArtNode4, which supports up to 4 children.
Each node maintains three core pieces of information:
1. nodeLevel: The Routing Level
Indicates which byte of the key this node routes on.
Each key is 8 bytes (for
long), andnodeLevelis always a multiple of 8.Example:
IfnodeLevel = 16, it means we’re matching on the 3rd byte of the key (byte index =16 / 8 = 2).
Every child node must share a common prefix up to the
nodeLevel.The
prefixis used for path compression, so the tree stays compact.Example:
If a node is atnodeLevel = 24and its prefix is0x123456, all children must have keys starting with0x123456.
3. keys[] and nodes[]: Routing Table
To map the next byte to a child, we store:
short[] keys = new short[4]; // Routing byte at this level
Object[] nodes = new Object[4]; // Either value or another ArtNode
int numChildren = 0; // Tracks how many slots are filled
Why not use a dynamic list? Because:
Lists are slow to search
Inserting into a sorted list is expensive
Fixed arrays are more cache-friendly and zero-GC
Instead, we use:
A fixed-size array of 4 slots
An explicit
numChildrencountChildren are always kept in sorted order, so search and insertion are efficient
Insertion Behavior (Sorted Insert)
When inserting a new child, the keyByte is compared to existing keys[] from left to right:
Case 1: keyByte < existingKey
→ Shift all entries right and insert here
Case 2: keyByte == existingKey
→ Already exists, traverse deeper or update value
Case 3: keyByte > all existing
→ Insert at the end (if space allows)
Example:
Existing keys: [0x12, 0x34]
Insert keyByte: 0x20
Result: [0x12, 0x20, 0x34] // inserted in sorted order
Pseudocode: Inserting into ArtNode4
function insert(key, level, value):
if level != node.level:
// The key doesn't belong in this node
branchedNode = tryBranching(key, value, this)
if branchedNode exists:
return branchedNode
routingByte = extractByteAtLevel(key, node.level)
i = findInsertionPoint(key, value, routingByte)
if node.numChildren < 4:
shiftAndInsert(i, key, value, routingByte)
return null
else:
return growToNode16(key, value, routingByte)
Bitwise Optimizations
extractByteAtLevel: Routing Byte Extraction
Purpose: Get the byte at a specific level of the key for routing.
(short) ((key >>> nodeLevel) & 0xFF)
Why it’s fast:
No slicing or intermediate object creation.
Single shift and mask extracts exact byte needed in O(1).
Example:
key = 0xABCDEF1234567890
nodeLevel = 16
key >>> 16 = 0xABCDEF123456
& 0xFF = 0x56 (the byte we need)
keyDiff = key ^ nodeKey
if ((keyDiff & (-1L << nodeLevel)) == 0):
return null // Same prefix, no split needed
newLevel = (63 - numberOfLeadingZeros(keyDiff)) & 0xF8
Explain:
^quickly finds differing bits.numberOfLeadingZerosfinds the first differing bit.& 0xF8aligns to the nearest lower byte boundary (routing happens at byte granularity).No need to compare bytes manually.
Example:
keyA = 0xABCDEF1234567890
keyB = 0xABCDEF123456ABCD
→ keyDiff = 0x000000000000D35D
→ numberOfLeadingZeros = 52
→ newLevel = (63 - 52) & 0xF8 = 8
→ Split at byte 1 (level = 8)
Pseudocode: Retrieving from ArtNode4
function get(key, level):
if level != node.level and key doesn't match node's prefix:
return null
routingByte = extractByteAtLevel(key, node.level)
for i in 0..numChildren:
if routingByte == keys[i]:
return (node.level == 0)
? value
: childNode[i].get(key, node.level - 8)
if routingByte < keys[i]:
break // early exit
return null
Bitwise Optimizations
Prefix Mismatch Shortcut
if (level != nodeLevel && ((key ^ nodeKey) & (-1L << (nodeLevel + 8))) != 0)
return null;
What's happening:
(key ^ nodeKey)isolates differing bits(-1L << (nodeLevel + 8))creates a mask of bits at higher levels than this nodeIf masked result is non-zero → means prefix differs at some higher level
Example:
key = 0xABCDEF1234567890
nodeKey = 0xABCDEF123456ABCD
nodeLevel = 8
mask = (-1L << 16) = 0xFFFFFFFFFFFF0000
(key ^ nodeKey) = 0x000000000000D35D
→ & mask = 0x000000000000D000 ≠ 0 → return null
Why it’s fast:
No loop or byte-by-byte prefix scan
One XOR + shift + mask = prefix check in constant time
Full Implementation on GitHub
You can browse the full implementation, including optimized insert and get methods, in this GitHub repo:
