Building Adaptive Radix Tree (ART) from Scratch: Node4 Internals and Bitwise Optimizations

Introduction
Node types
Domain Model Design: Understanding ArtNode4
Pseudocode: Inserting into ArtNode4
- Bitwise Optimizations
  - extractByteAtLevel: Routing Byte Extraction
  - tryBranching: Shared Prefix Detection and Node Spl …
Pseudocode: Retrieving from ArtNode4
- Bitwise Optimizations
  - Prefix Mismatch Shortcut
Full Implementation on GitHub

Introduction

Adaptive Radix Tree (ART) is a fast, space-efficient data structure designed to combine the cache-friendliness of tries with the compactness of binary search trees. Originally introduced in the paper “The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases” (VLDB 2013), ART dynamically adapts the node type depending on the number of children at each level.

Node types

Node4: Stores up to 4 children (linear search, compact)
Node16: Up to 16 children (binary search or SIMD scan)
Node48: Up to 48 children using an index array
Node256: Fully expanded, using direct indexing

Each node type is tuned for different key fan-out scenarios, allowing the tree to remain both shallow and efficient.

What We’re Building

In this post, we begin with the Node4 implementation — the smallest and most cache-friendly variant — as the foundational step for a complete ART. We will:

Use 64-bit long keys, byte-wise decomposed
Apply bit-level tuning and prefix compression
Build a clean structure called LongAdaptiveRadixMap around it

This is not just an academic exercise — our LongAdaptiveRadixMap will serve as the core in-memory data structure powering a high-performance matching engine (like those used in HFT exchanges or order books). From here, we’ll extend the work to support:

Other node types (Node16 → Node256)
SIMD optimizations with Rust implementation
Concurrency-safe design for production use

The Running Example

Value	7	6	5	4	3	2	1	0	hex literal
V₁	AB	CD	EF	12	34	56	78	90	`0xABCDEF1234567890`
V₂	AB	CD	EF	12	34	56	78	91	`0xABCDEF1234567891`
V₃	AB	CD	EF	12	34	56	78	92	`0xABCDEF1234567892`
V₄	AB	CD	EF	CD	AB	56	78	CD	`0xABCDEFCDAB5678CD`

Byte 0 = right‑most; byte 7 = left‑most.

Step‑by‑Step Construction

Insertion 1

Call	Action & Reason
`put(V₁)`	Tree is empty → create `ArtNode4` at level 0 with one child (0x90)

ArtNode4[level=0, key=0xabcdef1234567890, children=1]
├─ key[0]=0x90 → V1

Insertion 2

Call	Action & Reason
`put(V₂)`	Same 7-byte prefix, differs at byte 0 (0x91) → append new child, keep keys sorted

ArtNode4[level=0, key=0xabcdef1234567890, children=2]
├─ key[0]=0x90 → V1
├─ key[1]=0x91 → V2

Insertion 3

Call	Action & Reason
`put(V₃)`	Same logic → append third child at byte 0 (0x92)

ArtNode4[level=0, key=0xabcdef1234567890, children=3]
├─ key[0]=0x90 → V1
├─ key[1]=0x91 → V2
├─ key[2]=0x92 → V3

Insertion 4

Call	Action & Reason
`put(V₄)`	Prefix diverges at byte 4 and byte 0 → create `ArtNode4` at level 32, split path, reinsert subtree & new branch

ArtNode4[level=32, key=0xabcdef1234567890, children=2]
├─ key[0]=0x12
│  ArtNode4[level=0, key=0xabcdef1234567890, children=3]
│  ├─ key[0]=0x90 → V1
│  ├─ key[1]=0x91 → V2
│  ├─ key[2]=0x92 → V3
├─ key[1]=0xCD
│  ArtNode4[level=0, key=0xabcdefcdab5678cd, children=1]
│  ├─ key[0]=0xCD → V4

Domain Model Design: Understanding `ArtNode4`

To deeply understand how insertion and retrieval work in an Adaptive Radix Tree (ART), we must first understand the structure of a node — in this case, ArtNode4, which supports up to 4 children.

Each node maintains three core pieces of information:

1. `nodeLevel`: The Routing Level

Indicates which byte of the key this node routes on.
Each key is 8 bytes (for long), and nodeLevel is always a multiple of 8.
Example:
If nodeLevel = 16, it means we’re matching on the 3rd byte of the key (byte index = 16 / 8 = 2).

2. `prefix`: Shared Path for All Children

Every child node must share a common prefix up to the nodeLevel.
The prefix is used for path compression, so the tree stays compact.
Example:
If a node is at nodeLevel = 24 and its prefix is 0x123456, all children must have keys starting with 0x123456.

3. `keys[]` and `nodes[]`: Routing Table

To map the next byte to a child, we store:

short[] keys = new short[4];     // Routing byte at this level
Object[] nodes = new Object[4];  // Either value or another ArtNode
int numChildren = 0;             // Tracks how many slots are filled

Why not use a dynamic list? Because:

Lists are slow to search
Inserting into a sorted list is expensive
Fixed arrays are more cache-friendly and zero-GC

Instead, we use:

A fixed-size array of 4 slots
An explicit numChildren count
Children are always kept in sorted order, so search and insertion are efficient

Insertion Behavior (Sorted Insert)

When inserting a new child, the keyByte is compared to existing keys[] from left to right:

Case 1: keyByte < existingKey

→ Shift all entries right and insert here

Case 2: keyByte == existingKey

→ Already exists, traverse deeper or update value

Case 3: keyByte > all existing

→ Insert at the end (if space allows)

Example:

Existing keys:  [0x12, 0x34]
Insert keyByte: 0x20

Result: [0x12, 0x20, 0x34] // inserted in sorted order

Pseudocode: Inserting into ArtNode4

function insert(key, level, value):
    if level != node.level:
        // The key doesn't belong in this node
        branchedNode = tryBranching(key, value, this)
        if branchedNode exists:
            return branchedNode

    routingByte = extractByteAtLevel(key, node.level)

    i = findInsertionPoint(key, value, routingByte)

    if node.numChildren < 4:
        shiftAndInsert(i, key, value, routingByte)
        return null
    else:
        return growToNode16(key, value, routingByte)

Bitwise Optimizations

`extractByteAtLevel`: Routing Byte Extraction

Purpose: Get the byte at a specific level of the key for routing.

(short) ((key >>> nodeLevel) & 0xFF)

Why it’s fast:

No slicing or intermediate object creation.
Single shift and mask extracts exact byte needed in O(1).

Example:

key        = 0xABCDEF1234567890
nodeLevel  = 16
key >>> 16 = 0xABCDEF123456
& 0xFF     = 0x56 (the byte we need)

`tryBranching`: Shared Prefix Detection and Node Split

keyDiff = key ^ nodeKey

if ((keyDiff & (-1L << nodeLevel)) == 0):
    return null  // Same prefix, no split needed

newLevel = (63 - numberOfLeadingZeros(keyDiff)) & 0xF8

Explain:

^ quickly finds differing bits.
numberOfLeadingZeros finds the first differing bit.
& 0xF8 aligns to the nearest lower byte boundary (routing happens at byte granularity).
No need to compare bytes manually.

Example:

keyA = 0xABCDEF1234567890
keyB = 0xABCDEF123456ABCD
→ keyDiff = 0x000000000000D35D
→ numberOfLeadingZeros = 52
→ newLevel = (63 - 52) & 0xF8 = 8
→ Split at byte 1 (level = 8)

Pseudocode: Retrieving from ArtNode4

function get(key, level):
    if level != node.level and key doesn't match node's prefix:
        return null

    routingByte = extractByteAtLevel(key, node.level)

    for i in 0..numChildren:
        if routingByte == keys[i]:
            return (node.level == 0)
                ? value
                : childNode[i].get(key, node.level - 8)
        if routingByte < keys[i]:
            break  // early exit
    return null

Bitwise Optimizations

Prefix Mismatch Shortcut

if (level != nodeLevel && ((key ^ nodeKey) & (-1L << (nodeLevel + 8))) != 0)
    return null;

What's happening:

(key ^ nodeKey) isolates differing bits
(-1L << (nodeLevel + 8)) creates a mask of bits at higher levels than this node
If masked result is non-zero → means prefix differs at some higher level

Example:

key     = 0xABCDEF1234567890
nodeKey = 0xABCDEF123456ABCD
nodeLevel = 8

mask = (-1L << 16) = 0xFFFFFFFFFFFF0000

(key ^ nodeKey) = 0x000000000000D35D
→ & mask        = 0x000000000000D000 ≠ 0 → return null

Why it’s fast:

No loop or byte-by-byte prefix scan
One XOR + shift + mask = prefix check in constant time

Full Implementation on GitHub

You can browse the full implementation, including optimized insert and get methods, in this GitHub repo:

View Source on GitHub

Building Adaptive Radix Tree (ART) from Scratch: Node4 Internals and Bitwise Optimizations

Table of Contents

Introduction

What We’re Building

The Running Example

Step‑by‑Step Construction

Insertion 1

Insertion 2

Insertion 3

Insertion 4

Domain Model Design: Understanding `ArtNode4`

1. `nodeLevel`: The Routing Level

2. `prefix`: Shared Path for All Children

3. `keys[]` and `nodes[]`: Routing Table

Pseudocode: Inserting into ArtNode4

Bitwise Optimizations

`extractByteAtLevel`: Routing Byte Extraction

`tryBranching`: Shared Prefix Detection and Node Split

Pseudocode: Retrieving from ArtNode4

Bitwise Optimizations

Prefix Mismatch Shortcut

Full Implementation on GitHub

Keep Reading

Sub-Microseconds

Home

Building Adaptive Radix Tree (ART) from Scratch: Node4 Internals and Bitwise Optimizations

Table of Contents

Introduction

What We’re Building

The Running Example

Step‑by‑Step Construction

Insertion 1

Insertion 2

Insertion 3

Insertion 4

Domain Model Design: Understanding ArtNode4

1. nodeLevel: The Routing Level

2. prefix: Shared Path for All Children

3. keys[] and nodes[]: Routing Table

Pseudocode: Inserting into ArtNode4

Bitwise Optimizations

extractByteAtLevel: Routing Byte Extraction

tryBranching: Shared Prefix Detection and Node Split

Pseudocode: Retrieving from ArtNode4

Bitwise Optimizations

Prefix Mismatch Shortcut

Full Implementation on GitHub

Keep Reading

Sub-Microseconds

Home

Domain Model Design: Understanding `ArtNode4`

1. `nodeLevel`: The Routing Level

2. `prefix`: Shared Path for All Children

3. `keys[]` and `nodes[]`: Routing Table

`extractByteAtLevel`: Routing Byte Extraction

`tryBranching`: Shared Prefix Detection and Node Split