The Problem Statement
Our authorization model from Posts 1-10 is powerful, deterministic, and fail-safe. But there's a critical requirement we haven't addressed: consistency across distributed storage. In distributed systems, data is replicated across multiple nodes, and reads might see different versions of the data. Without snapshot isolation, we risk:
Inconsistent decisions: Check sees partial writes, grants access incorrectly
EXCLUSION anomalies: Exclusion sees different snapshots, produces wrong results
Non-reproducible audits: Can't reproduce historical authorization decisions
Race conditions: Concurrent writes interfere with reads
Real-world examples from production systems
Looking at how existing distributed authorization systems ensure consistency reveals common requirements:
Google Zanzibar: Snapshot reads with Zookie tokens for consistency
AWS IAM: Eventually consistent with strong consistency option
Auth0 FGA: Consistency tokens for snapshot isolation
Ory Keto: Revision-based reads for time-travel queries
Common patterns that require consistency:
Snapshot isolation: All reads in a Check must see the same snapshot
Time-travel queries: Reproduce historical authorization decisions
EXCLUSION correctness: Exclusion must see consistent state
Audit compliance: Must prove what data was visible at decision time
What we can't guarantee today:
"All tuple reads during a Check evaluation must come from the same consistent snapshot of the data."
Our current model has gaps:
❌ No snapshot isolation — reads might see partial writes
❌ No revision tracking — can't reproduce historical decisions
❌ No consistency levels — can't trade latency for consistency
❌ EXCLUSION sees inconsistent state — produces wrong results
The core problem: We need revision-based snapshots and consistency levels that ensure correct evaluation across distributed storage.
Human Rule
"All reads in a Check must see the same snapshot."
Every tuple read during authorization evaluation must come from the same revision, ensuring consistent results even when concurrent writes occur.
Why Current Models Don't Work
Let's try to ensure consistency using our existing tools from Posts 1-10.
What We Have So Far (Posts 1-10 Recap)
From previous posts, we have:
From Post 1: Tuple storage:
func WriteTuple(tuple *RelationTuple) {
storage.Write(tuple) // ← No revision tracking!
}
func ReadTuples(resource, subject) []*RelationTuple {
return storage.Read(resource, subject) // ← Which version?
}From Post 7: EXCLUSION operation:
// EXCLUSION: base - exclusion
func EvaluateExclusion(base, exclusion) {
baseResult := Evaluate(base) // ← Read 1
exclusionResult := Evaluate(exclusion) // ← Read 2
// What if data changed between Read 1 and Read 2? ❌
}
From Post 4: Graph traversal:
func Expand(resource, relation) {
tuples := ReadTuples(resource, relation) // ← Which snapshot?
for _, tuple := range tuples {
Expand(tuple.Subject, relation) // ← Same snapshot?
}
}
What's missing: No revision tracking, no snapshot isolation, no consistency guarantees!
Approach 1: Read Latest (Inconsistent)
// Attempt: Always read latest data
func Check(resource, subject, context) bool {
// Read 1: Check direct tuples
tuples1 := storage.ReadLatest(resource, subject)
// Concurrent write happens here! ✍️
// Read 2: Expand graph
tuples2 := storage.ReadLatest(resource, "*")
// Reads see different snapshots! ❌
}
Problems:
❌ Inconsistent: Reads see different snapshots
❌ EXCLUSION broken: Base and exclusion see different data
❌ Non-reproducible: Can't reproduce historical decisions
❌ Race conditions: Concurrent writes interfere
Example failure:
// Initial state: user:alice → document:1#viewer
// Check starts
Check(document:1#viewer, user:alice, ctx)
// → Read 1: Find user:alice → document:1#viewer ✅
// Concurrent write: Remove alice, add bob
WriteTuple(user:bob → document:1#viewer)
DeleteTuple(user:alice → document:1#viewer)
// Check continues
// → Read 2: Expand graph (sees bob, not alice)
// → Inconsistent state! ❌
Approach 2: Locks (Slow and Fragile)
// Attempt: Lock data during Check
func Check(resource, subject, context) bool {
lock := storage.AcquireLock(resource)
defer lock.Release()
// All reads while holding lock
result := EvaluateWithLock(resource, subject, context)
return result
}
Problems:
❌ Slow: Locks block concurrent reads and writes
❌ Deadlocks: Multiple Checks can deadlock
❌ Scalability: Doesn't scale to distributed systems
❌ Availability: Lock failures break the system
Conclusion: We need revision-based snapshots that provide isolation without locks.
The Solution: Revision-Based Snapshots
We introduce a revision model that tracks the version of the data and enables snapshot isolation.
Part 1: Revision Model
Every write operation increments a global revision counter, and every tuple is tagged with the revision at which it was written.
type Revision struct {
Counter int64 // Monotonically increasing counter
}
type RelationTuple struct {
Resource ObjectAndRelation
Subject Subject
Revision Revision // ← Revision at which tuple was written
}
type TupleStorage interface {
// Write tuple and return new revision
WriteTuple(tuple *RelationTuple) (Revision, error)
// Read tuples as of specific revision
ReadTuplesAtRevision(resource, subject ObjectAndRelation, revision Revision) ([]*RelationTuple, error)
// Get current revision (latest)
GetCurrentRevision() (Revision, error)
}
Key properties:
✅ Monotonic: Revisions always increase
✅ Global: Single revision counter across all tuples
✅ Immutable: Tuples are immutable once written
✅ Snapshot: Can read all tuples as of specific revision
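The storage interface above can be made concrete with a minimal in-memory sketch. This is illustrative only: MemStore, its tombstone-based deletes, and the method names are assumptions of this sketch, standing in for a real distributed, durable store.

```go
package main

import (
	"fmt"
	"sync"
)

type Revision struct{ Counter int64 }

type Tuple struct {
	Resource string // e.g. "document:1#viewer"
	Subject  string // e.g. "user:alice"
	Revision Revision
	Deleted  bool // tombstone: a delete is also a revisioned write
}

// MemStore is a hypothetical single-process stand-in for distributed storage:
// a mutex, a counter, and an append-only write log.
type MemStore struct {
	mu      sync.Mutex
	current int64
	log     []Tuple
}

// WriteTuple appends a tuple tagged with a freshly incremented revision.
func (s *MemStore) WriteTuple(resource, subject string) Revision {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.current++
	rev := Revision{s.current}
	s.log = append(s.log, Tuple{resource, subject, rev, false})
	return rev
}

// DeleteTuple records a tombstone at a new revision.
func (s *MemStore) DeleteTuple(resource, subject string) Revision {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.current++
	rev := Revision{s.current}
	s.log = append(s.log, Tuple{resource, subject, rev, true})
	return rev
}

// ReadAtRevision returns the subjects visible for a resource as of rev:
// the latest write per (resource, subject) with Counter <= rev.Counter wins.
func (s *MemStore) ReadAtRevision(resource string, rev Revision) []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	visible := map[string]bool{}
	for _, t := range s.log { // log is ordered by revision
		if t.Resource != resource || t.Revision.Counter > rev.Counter {
			continue
		}
		visible[t.Subject] = !t.Deleted // later entries overwrite earlier ones
	}
	var out []string
	for subj, ok := range visible {
		if ok {
			out = append(out, subj)
		}
	}
	return out
}

func (s *MemStore) CurrentRevision() Revision {
	s.mu.Lock()
	defer s.mu.Unlock()
	return Revision{s.current}
}

func main() {
	s := &MemStore{}
	r1 := s.WriteTuple("document:1#viewer", "user:alice")
	s.WriteTuple("document:1#viewer", "user:bob")
	fmt.Println(s.ReadAtRevision("document:1#viewer", r1)) // only alice at r1
	fmt.Println(s.ReadAtRevision("document:1#viewer", s.CurrentRevision()))
}
```

Because the log is append-only and every entry carries its revision, reading at an old revision simply ignores newer entries, which is what makes time-travel reads cheap in this model.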
Example:
// Initial state: revision 100
// Tuples: []
// Write 1: Add alice
WriteTuple(user:alice → document:1#viewer)
// → Revision 101
// → Tuples: [(user:alice → document:1#viewer, rev=101)]
// Write 2: Add bob
WriteTuple(user:bob → document:1#viewer)
// → Revision 102
// → Tuples: [(user:alice, rev=101), (user:bob, rev=102)]
// Read at revision 101
ReadTuplesAtRevision(document:1#viewer, *, revision=101)
// → Returns: [(user:alice, rev=101)]
// → bob not visible (written at rev=102)
// Read at revision 102
ReadTuplesAtRevision(document:1#viewer, *, revision=102)
// → Returns: [(user:alice, rev=101), (user:bob, rev=102)]
// → Both visible
Part 2: Snapshot Isolation
All reads during a Check evaluation must use the same revision.
func Check(resource, subject, context) bool {
// Step 1: Choose snapshot revision
revision := storage.GetCurrentRevision()
// Step 2: Evaluate using snapshot
return CheckAtRevision(resource, subject, context, revision)
}
func CheckAtRevision(resource, subject, context, revision) bool {
// All reads use the same revision
tuples := storage.ReadTuplesAtRevision(resource, subject, revision)
for _, tuple := range tuples {
// Recursive calls also use same revision
result := CheckAtRevision(tuple.Subject, ..., revision)
// ...
}
// ...
}
Guarantees:
✅ Consistent reads: All reads see same snapshot
✅ EXCLUSION correctness: Base and exclusion see same data
✅ Reproducible: Can reproduce decision by replaying at same revision
✅ Isolation: Concurrent writes don't interfere
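As a toy illustration of the pinning step, the sketch below hardcodes which viewers are visible at each revision (the membership map is a stand-in for revisioned storage, not a real API) and shows that two reads in the same Check agree because both use the pinned revision:

```go
package main

import "fmt"

// membership maps a revision counter to the viewers visible at that
// revision. Hypothetical: a real system reads from revisioned storage.
var membership = map[int64]map[string]bool{
	100: {"user:alice": true},
	101: {"user:bob": true}, // alice removed, bob added at rev 101
}

// checkAtRevision answers "is subject a viewer?" using only data
// visible at the pinned revision. Every read in the evaluation,
// including recursive expansion, would receive the same rev.
func checkAtRevision(subject string, rev int64) bool {
	return membership[rev][subject]
}

func main() {
	rev := int64(100) // pin the snapshot once, at the start of the Check

	// Read 1: direct membership at the pinned revision.
	first := checkAtRevision("user:alice", rev)

	// A concurrent write advances the store to revision 101 here,
	// but this Check keeps reading at revision 100.

	// Read 2: a later read in the same Check still sees revision 100.
	second := checkAtRevision("user:alice", rev)

	fmt.Println(first, second) // both reads agree
}
```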
Example:
// Initial state at revision 100:
// user:alice → document:1#viewer
// Check starts
Check(document:1#viewer, user:alice, ctx)
// → Choose snapshot: revision 100
// → Read at revision 100: [user:alice]
// Concurrent write at revision 101
WriteTuple(user:bob → document:1#viewer)
DeleteTuple(user:alice → document:1#viewer)
// Check continues at revision 100
// → Read at revision 100: [user:alice]
// → Write at revision 101 not visible
// → Consistent snapshot ✅
Part 3: Consistency Levels
Different use cases require different consistency guarantees. We define four consistency levels:
type ConsistencyRequirement int32
const (
// Read from any replica (fastest, may be stale)
MINIMIZE_LATENCY ConsistencyRequirement = iota
// Read from replica with revision >= specified revision
AT_LEAST_AS_FRESH
// Read from exact revision (time-travel)
AT_EXACT_SNAPSHOT
// Read from leader (strongest consistency)
FULLY_CONSISTENT
)
Level 1: MINIMIZE_LATENCY (Fastest)
// Read from any replica, accept stale data
func Check(resource, subject, context) bool {
revision := anyReplica.GetCurrentRevision() // ← May be stale
return CheckAtRevision(resource, subject, context, revision)
}
Use cases:
Non-critical checks where latency matters more than freshness
Read-heavy workloads where eventual consistency is acceptable
Caching scenarios where stale data is tolerable
Guarantees:
✅ Snapshot isolation: All reads in single Check use same revision
❌ Freshness: May read stale data (old revision)
Level 2: AT_LEAST_AS_FRESH (Bounded Staleness)
// Read from replica with revision >= specified revision
func CheckAtLeastAsFresh(resource, subject, context, minRevision) bool {
// Wait for replica to catch up to minRevision
replica := findReplicaWithRevision(minRevision)
revision := replica.GetCurrentRevision() // ← revision >= minRevision
return CheckAtRevision(resource, subject, context, revision)
}
Use cases:
Read-after-write consistency: Check sees own writes
Bounded staleness: Tolerate some lag but not too much
Multi-region deployments with regional replicas
Guarantees:
✅ Snapshot isolation: All reads use same revision
✅ Bounded staleness: Revision >= minRevision
❌ Exact snapshot: May read newer data than minRevision
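The replica-selection step can be sketched as follows; Replica and pickAtLeastAsFresh are hypothetical names, and the sketch assumes each replica exposes the highest revision it has applied:

```go
package main

import (
	"errors"
	"fmt"
)

type Replica struct {
	Name     string
	Revision int64 // highest revision this replica has applied
	IsLeader bool
}

// pickAtLeastAsFresh returns a replica with Revision >= minRevision,
// preferring followers to spread read load and falling back to the
// leader only when no follower has caught up.
func pickAtLeastAsFresh(replicas []Replica, minRevision int64) (Replica, error) {
	var leader *Replica
	for i := range replicas {
		if replicas[i].IsLeader {
			leader = &replicas[i]
			continue
		}
		if replicas[i].Revision >= minRevision {
			return replicas[i], nil // fresh-enough follower: cheapest valid choice
		}
	}
	if leader != nil && leader.Revision >= minRevision {
		return *leader, nil // pay the leader-read cost
	}
	return Replica{}, errors.New("no replica has caught up to minRevision")
}

func main() {
	replicas := []Replica{
		{Name: "asia", Revision: 95},
		{Name: "eu", Revision: 100},
		{Name: "us", Revision: 101, IsLeader: true},
	}
	r, _ := pickAtLeastAsFresh(replicas, 101)
	fmt.Println(r.Name) // only the leader has revision >= 101
}
```

A production system would typically wait and retry rather than fail immediately when no replica has caught up, but the selection rule is the same.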
Example:
// User writes tuple
revision := WriteTuple(user:alice → document:1#viewer)
// → revision = 100
// User immediately checks access
CheckAtLeastAsFresh(document:1#viewer, user:alice, ctx, revision=100)
// → Waits for replica with revision >= 100
// → Reads from revision 100 (or newer)
// → Sees own write ✅
Level 3: AT_EXACT_SNAPSHOT (Time-Travel)
// Read from exact revision (historical query)
func CheckAtExactSnapshot(resource, subject, context, exactRevision) bool {
return CheckAtRevision(resource, subject, context, exactRevision)
}
Use cases:
Audit compliance: Reproduce historical authorization decisions
Debugging: Investigate what data was visible at specific time
Time-travel queries: "Who had access to document:1 at revision 50?"
Guarantees:
✅ Snapshot isolation: All reads use same revision
✅ Exact snapshot: Reads from exactRevision, not newer
✅ Reproducible: Same revision always produces same result
Example:
// Audit: "Did user:alice have access to document:1 at revision 50?"
result := CheckAtExactSnapshot(document:1#viewer, user:alice, ctx, revision=50)
// → Reads all tuples as of revision 50
// → Evaluates using only data visible at revision 50
// → Reproducible: same result every time ✅
Level 4: FULLY_CONSISTENT (Strongest)
// Read from leader (linearizable)
func CheckFullyConsistent(resource, subject, context) bool {
revision := leader.GetCurrentRevision() // ← Read from leader
return CheckAtRevision(resource, subject, context, revision)
}
Use cases:
Critical checks where stale data is unacceptable
Compliance requirements for strong consistency
Scenarios where correctness matters more than latency
Guarantees:
✅ Snapshot isolation: All reads use same revision
✅ Linearizable: Reads from leader, sees all committed writes
✅ Strongest consistency: No stale reads
Trade-off: Higher latency (must contact leader), lower availability (leader failure blocks reads).
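Tying the four levels together, a single dispatch function might choose the snapshot revision per request. This is a sketch under the simplifying assumption of one leader and one nearest replica; the names are illustrative:

```go
package main

import "fmt"

type ConsistencyRequirement int

const (
	MinimizeLatency ConsistencyRequirement = iota
	AtLeastAsFresh
	AtExactSnapshot
	FullyConsistent
)

// Cluster is a hypothetical simplified view: one leader revision and
// the revision of the nearest replica, which may lag the leader.
type Cluster struct {
	LeaderRevision  int64
	ReplicaRevision int64
}

// chooseRevision selects the single revision that every read in the
// Check will use; wanted is the caller-supplied revision for the
// AT_LEAST_AS_FRESH and AT_EXACT_SNAPSHOT levels.
func chooseRevision(c Cluster, req ConsistencyRequirement, wanted int64) int64 {
	switch req {
	case MinimizeLatency:
		return c.ReplicaRevision // nearest replica, possibly stale
	case AtLeastAsFresh:
		if c.ReplicaRevision >= wanted {
			return c.ReplicaRevision // replica has caught up
		}
		return c.LeaderRevision // otherwise pay for a leader read
	case AtExactSnapshot:
		return wanted // time-travel: exactly the requested revision
	default: // FullyConsistent
		return c.LeaderRevision // linearizable: always the leader
	}
}

func main() {
	c := Cluster{LeaderRevision: 101, ReplicaRevision: 95}
	fmt.Println(chooseRevision(c, MinimizeLatency, 0))  // 95
	fmt.Println(chooseRevision(c, AtLeastAsFresh, 101)) // 101
	fmt.Println(chooseRevision(c, AtExactSnapshot, 50)) // 50
	fmt.Println(chooseRevision(c, FullyConsistent, 0))  // 101
}
```

Whichever branch runs, the result is a single pinned revision, so snapshot isolation within the Check holds at every level; the levels differ only in how fresh that revision is guaranteed to be.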
Part 4: Revision Semantics for EXCLUSION
EXCLUSION operations require special care to ensure correctness across revisions.
Rule: Base and exclusion must be evaluated at the same revision.
func EvaluateExclusion(base, exclusion, revision) bool {
// Both use same revision
baseResult := EvaluateAtRevision(base, revision)
exclusionResult := EvaluateAtRevision(exclusion, revision)
// EXCLUSION: base AND NOT exclusion
return baseResult && !exclusionResult
}
Why this matters:
// Initial state at revision 100:
// - user:alice → document:1#viewer (granted)
// - user:alice → document:1#banned (not present)
// Write at revision 101:
// - user:alice → document:1#banned (banned)
// Check at revision 100 (before ban):
EvaluateExclusion(viewer, banned, revision=100)
// → viewer: TRUE (user:alice is viewer)
// → banned: FALSE (user:alice not banned at revision 100)
// → EXCLUSION: TRUE AND NOT FALSE = TRUE
// → Access granted ✅
// Check at revision 101 (after ban):
EvaluateExclusion(viewer, banned, revision=101)
// → viewer: TRUE (user:alice is viewer)
// → banned: TRUE (user:alice banned at revision 101)
// → EXCLUSION: TRUE AND NOT TRUE = FALSE
// → Access denied ✅
// Both correct! ✅
Without snapshot isolation:
// Check starts at revision 100
// → Read viewer tuples at revision 100: user:alice (granted)
// → Write happens: revision 101 (user:alice banned)
// → Read banned tuples at revision 101: user:alice (banned)
// → EXCLUSION: TRUE AND NOT TRUE = FALSE
// → Access denied ❌
// But at revision 100, user:alice was NOT banned!
// Should have granted access ✅
// → INCONSISTENT! ❌
Real-World Example: Multi-Region Document Sharing
Let's walk through a realistic scenario with distributed replicas and different consistency levels.
Initial State (Revision 100)
// US region (leader): revision 100
// - user:alice → document:1#viewer
// EU region (replica): revision 100 (fully caught up)
// - user:alice → document:1#viewer
// Asia region (replica): revision 95 (5 revisions behind)
// - (no tuples yet)
Write Operation (Revision 101)
// User in US writes new tuple
WriteTuple(user:bob → document:1#viewer)
// → Written to leader at revision 101
// → Replication starts to EU and Asia
// US region (leader): revision 101
// - user:alice → document:1#viewer (rev 100)
// - user:bob → document:1#viewer (rev 101)
// EU region (replica): revision 100 (replicating...)
// - user:alice → document:1#viewer (rev 100)
// Asia region (replica): revision 95 (still behind)
// - (no tuples yet)
Check with MINIMIZE_LATENCY
// User in Asia checks access
Check(document:1#viewer, user:bob, ctx)
// → Consistency: MINIMIZE_LATENCY
// → Read from nearest replica (Asia)
// → Asia replica at revision 95
// → user:bob not found (written at rev 101)
// → Access denied ❌ (stale read)
// This is EXPECTED with MINIMIZE_LATENCY!
// Latency optimized, freshness sacrificed.
Check with AT_LEAST_AS_FRESH
// User writes tuple and gets revision
revision := WriteTuple(user:bob → document:1#viewer)
// → revision = 101
// User immediately checks access
CheckAtLeastAsFresh(document:1#viewer, user:bob, ctx, minRevision=101)
// → Consistency: AT_LEAST_AS_FRESH
// → Wait for replica with revision >= 101
// → Asia replica at revision 95 (too old)
// → EU replica at revision 100 (too old)
// → US leader at revision 101 (OK!)
// → Read from US leader
// → user:bob found ✅
// → Access granted ✅ (read-after-write consistency)
Check with AT_EXACT_SNAPSHOT
// Audit query: "Who had access at revision 100?"
CheckAtExactSnapshot(document:1#viewer, user:bob, ctx, exactRevision=100)
// → Read from any replica with revision >= 100
// → EU replica at revision 100 (OK!)
// → Read tuples as of revision 100
// → user:bob not found (written at rev 101)
// → Access denied ✅ (correct historical answer)
CheckAtExactSnapshot(document:1#viewer, user:alice, ctx, exactRevision=100)
// → Read tuples as of revision 100
// → user:alice found (written at rev 100)
// → Access granted ✅ (correct historical answer)
Check with FULLY_CONSISTENT
// Critical check requiring strongest consistency
CheckFullyConsistent(document:1#viewer, user:bob, ctx)
// → Consistency: FULLY_CONSISTENT
// → Read from leader (US)
// → US leader at revision 101
// → user:bob found ✅
// → Access granted ✅ (highest latency, but strongest consistency)
Takeaways
Revision-based snapshots ensure consistency — All reads during a Check use the same revision, preventing inconsistent decisions even when concurrent writes occur.
Four consistency levels enable trade-offs — MINIMIZE_LATENCY (fastest), AT_LEAST_AS_FRESH (bounded staleness), AT_EXACT_SNAPSHOT (time-travel), FULLY_CONSISTENT (strongest) allow applications to choose the right balance between latency and consistency.
EXCLUSION requires snapshot isolation — Base and exclusion must be evaluated at the same revision to ensure correctness, preventing anomalies from concurrent writes.
Why it matters: In distributed systems, consistency is not free—there are fundamental trade-offs between latency, availability, and consistency (CAP theorem). Revision-based snapshots provide a principled way to reason about these trade-offs, allowing applications to choose the right consistency level for each use case while maintaining correctness guarantees. Combined with snapshot isolation, this ensures that authorization decisions are always based on a consistent view of the data, even in the face of concurrent writes and distributed replicas.
Next → Post 11B: Production Deployment
We now have revision-based snapshots and consistency levels, but there are important production considerations we haven't addressed:
Design decisions: Why logical revisions instead of timestamps? Why four consistency levels?
Edge cases: What happens when revisions are unavailable? How do we handle replica lag?
Operational concerns: Revision retention, garbage collection, monitoring
Performance: Caching strategies, index design, query optimization
Post 11B tackles these production deployment concerns, providing the operational knowledge needed to run a revision-based authorization system at scale.
Preview:
// Revision retention policy
type RetentionPolicy struct {
MinRevisions int64 // Keep at least N revisions
MaxAge time.Duration // Keep revisions for at least T time
}
// Garbage collection
func GarbageCollect(policy *RetentionPolicy) {
oldestRevision := currentRevision - policy.MinRevisions
cutoffTime := time.Now().Add(-policy.MaxAge)
// Delete tuple versions older than both thresholds (revision and age)
}