Table of Contents

Introduction

Adaptive Radix Tree (ART) is a fast, space-efficient data structure designed to combine the cache-friendliness of tries with the compactness of binary search trees. Originally introduced in the paper “The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases” (VLDB 2013), ART dynamically adapts the node type depending on the number of children at each level.

Node types

  • Node4: Stores up to 4 children (linear search, compact)

  • Node16: Up to 16 children (binary search or SIMD scan)

  • Node48: Up to 48 children using an index array

  • Node256: Fully expanded, using direct indexing

Each node type is tuned for different key fan-out scenarios, allowing the tree to remain both shallow and efficient.

What We’re Building

In this post, we begin with the Node4 implementation — the smallest and most cache-friendly variant — as the foundational step for a complete ART. We will:

  • Use 64-bit long keys, byte-wise decomposed

  • Apply bit-level tuning and prefix compression

  • Build a clean structure called LongAdaptiveRadixMap around it

This is not just an academic exercise — our LongAdaptiveRadixMap will serve as the core in-memory data structure powering a high-performance matching engine (like those used in HFT exchanges or order books). From here, we’ll extend the work to support:

  • Other node types (Node16 → Node256)

  • SIMD optimizations with Rust implementation

  • Concurrency-safe design for production use

The Running Example

Value

7

6

5

4

3

2

1

0

hex literal

V₁

AB

CD

EF

12

34

56

78

90

0xABCDEF1234567890

V₂

AB

CD

EF

12

34

56

78

91

0xABCDEF1234567891

V₃

AB

CD

EF

12

34

56

78

92

0xABCDEF1234567892

V₄

AB

CD

EF

CD

AB

56

78

CD

0xABCDEFCDAB5678CD

Byte 0 = right‑most; byte 7 = left‑most.

Step‑by‑Step Construction

Insertion 1

Call

Action & Reason

put(V₁)

Tree is empty → create ArtNode4 at level 0 with one child (0x90)

ArtNode4[level=0, key=0xabcdef1234567890, children=1]
├─ key[0]=0x90 → V1

Insertion 2

Call

Action & Reason

put(V₂)

Same 7-byte prefix, differs at byte 0 (0x91) → append new child, keep keys sorted

ArtNode4[level=0, key=0xabcdef1234567890, children=2]
├─ key[0]=0x90 → V1
├─ key[1]=0x91 → V2

Insertion 3

Call

Action & Reason

put(V₃)

Same logic → append third child at byte 0 (0x92)

ArtNode4[level=0, key=0xabcdef1234567890, children=3]
├─ key[0]=0x90 → V1
├─ key[1]=0x91 → V2
├─ key[2]=0x92 → V3

Insertion 4

Call

Action & Reason

put(V₄)

Prefix diverges at byte 4 and byte 0 → create ArtNode4 at level 32, split path, reinsert subtree & new branch

ArtNode4[level=32, key=0xabcdef1234567890, children=2]
├─ key[0]=0x12
│  ArtNode4[level=0, key=0xabcdef1234567890, children=3]
│  ├─ key[0]=0x90 → V1
│  ├─ key[1]=0x91 → V2
│  ├─ key[2]=0x92 → V3
├─ key[1]=0xCD
│  ArtNode4[level=0, key=0xabcdefcdab5678cd, children=1]
│  ├─ key[0]=0xCD → V4

Domain Model Design: Understanding ArtNode4

To deeply understand how insertion and retrieval work in an Adaptive Radix Tree (ART), we must first understand the structure of a node — in this case, ArtNode4, which supports up to 4 children.

Each node maintains three core pieces of information:

1. nodeLevel: The Routing Level

  • Indicates which byte of the key this node routes on.

  • Each key is 8 bytes (for long), and nodeLevel is always a multiple of 8.

  • Example:
    If nodeLevel = 16, it means we’re matching on the 3rd byte of the key (byte index = 16 / 8 = 2).

2. prefix: Shared Path for All Children

  • Every child node must share a common prefix up to the nodeLevel.

  • The prefix is used for path compression, so the tree stays compact.

  • Example:
    If a node is at nodeLevel = 24 and its prefix is 0x123456, all children must have keys starting with 0x123456.

3. keys[] and nodes[]: Routing Table

To map the next byte to a child, we store:

short[] keys = new short[4];     // Routing byte at this level
Object[] nodes = new Object[4];  // Either value or another ArtNode
int numChildren = 0;             // Tracks how many slots are filled

Why not use a dynamic list? Because:

  • Lists are slow to search

  • Inserting into a sorted list is expensive

  • Fixed arrays are more cache-friendly and zero-GC

Instead, we use:

  • A fixed-size array of 4 slots

  • An explicit numChildren count

  • Children are always kept in sorted order, so search and insertion are efficient

Insertion Behavior (Sorted Insert)

When inserting a new child, the keyByte is compared to existing keys[] from left to right:

Case 1: keyByte < existingKey

→ Shift all entries right and insert here

Case 2: keyByte == existingKey

→ Already exists, traverse deeper or update value

Case 3: keyByte > all existing

→ Insert at the end (if space allows)

Example:

Existing keys:  [0x12, 0x34]
Insert keyByte: 0x20

Result: [0x12, 0x20, 0x34] // inserted in sorted order

Pseudocode: Inserting into ArtNode4

function insert(key, level, value):
    if level != node.level:
        // The key doesn't belong in this node
        branchedNode = tryBranching(key, value, this)
        if branchedNode exists:
            return branchedNode

    routingByte = extractByteAtLevel(key, node.level)

    i = findInsertionPoint(key, value, routingByte)

    if node.numChildren < 4:
        shiftAndInsert(i, key, value, routingByte)
        return null
    else:
        return growToNode16(key, value, routingByte)

Bitwise Optimizations

extractByteAtLevel: Routing Byte Extraction

Purpose: Get the byte at a specific level of the key for routing.

(short) ((key >>> nodeLevel) & 0xFF)

Why it’s fast:

  • No slicing or intermediate object creation.

  • Single shift and mask extracts exact byte needed in O(1).

Example:

key        = 0xABCDEF1234567890
nodeLevel  = 16
key >>> 16 = 0xABCDEF123456
& 0xFF     = 0x56 (the byte we need)

tryBranching: Shared Prefix Detection and Node Split

keyDiff = key ^ nodeKey

if ((keyDiff & (-1L << nodeLevel)) == 0):
    return null  // Same prefix, no split needed

newLevel = (63 - numberOfLeadingZeros(keyDiff)) & 0xF8

Explain:

  • ^ quickly finds differing bits.

  • numberOfLeadingZeros finds the first differing bit.

  • & 0xF8 aligns to the nearest lower byte boundary (routing happens at byte granularity).

  • No need to compare bytes manually.

Example:

keyA = 0xABCDEF1234567890
keyB = 0xABCDEF123456ABCD
→ keyDiff = 0x000000000000D35D
→ numberOfLeadingZeros = 52
→ newLevel = (63 - 52) & 0xF8 = 8
→ Split at byte 1 (level = 8)

Pseudocode: Retrieving from ArtNode4

function get(key, level):
    if level != node.level and key doesn't match node's prefix:
        return null

    routingByte = extractByteAtLevel(key, node.level)

    for i in 0..numChildren:
        if routingByte == keys[i]:
            return (node.level == 0)
                ? value
                : childNode[i].get(key, node.level - 8)
        if routingByte < keys[i]:
            break  // early exit
    return null

Bitwise Optimizations

Prefix Mismatch Shortcut

if (level != nodeLevel && ((key ^ nodeKey) & (-1L << (nodeLevel + 8))) != 0)
    return null;

What's happening:

  • (key ^ nodeKey) isolates differing bits

  • (-1L << (nodeLevel + 8)) creates a mask of bits at higher levels than this node

  • If masked result is non-zero → means prefix differs at some higher level

Example:

key     = 0xABCDEF1234567890
nodeKey = 0xABCDEF123456ABCD
nodeLevel = 8

mask = (-1L << 16) = 0xFFFFFFFFFFFF0000

(key ^ nodeKey) = 0x000000000000D35D
 & mask        = 0x000000000000D000  0  return null

Why it’s fast:

  • No loop or byte-by-byte prefix scan

  • One XOR + shift + mask = prefix check in constant time

Full Implementation on GitHub

You can browse the full implementation, including optimized insert and get methods, in this GitHub repo:

Keep Reading