The Problem Statement

In Post 11A, we established revision-based snapshots and four consistency levels. But deploying this to production requires addressing critical operational concerns:

  • Revision retention: How long do we keep old revisions?

  • Garbage collection: How do we clean up old data without breaking audits?

  • Replica lag: How do we monitor and handle slow replicas?

  • Performance: How do we optimize for high-throughput, low-latency checks?

  • Edge cases: What happens when revisions are unavailable or replicas fail?

Real-world operational challenges:

Looking at how production systems handle revisions reveals critical requirements:

  • Google Zanzibar: Garbage collection with retention policies

  • Auth0 FGA: Replica lag monitoring and alerting

  • Ory Keto: Configurable retention for compliance

  • SpiceDB: Performance optimization with caching

Common operational patterns:

  1. Retention policies: Keep revisions for audit compliance (HIPAA, SOC2, GDPR)

  2. Garbage collection: Delete old revisions to save storage

  3. Replica monitoring: Track lag and health of replicas

  4. Performance tuning: Optimize indexes, caching, and query patterns

What we need to address:

"Production deployment requires operational excellence: retention policies, garbage collection, monitoring, and performance optimization."

Our current model has gaps:

  • No retention policy — revisions accumulate forever

  • No garbage collection — storage grows unbounded

  • No replica monitoring — can't detect lag or failures

  • No performance optimization — queries may be slow

The core problem: We need operational tools and design decisions to run revision-based authorization at scale.

Human Rule

"Production systems need operational excellence, not just correctness."

Retention policies, garbage collection, monitoring, and performance optimization are essential for running authorization at scale.

Design Decisions

Decision 1: Logical Revisions vs Timestamps

Question: Should we use logical revisions (counters) or physical timestamps?

Answer: Logical revisions (counters), because they are monotonic and precise.

Rationale:

Logical revisions (chosen):

type Revision struct {
    Counter int64  // Monotonically increasing counter
}

Benefits:

  • Monotonic: Always increasing, never goes backwards

  • Precise: No ambiguity about ordering

  • No clock skew: Independent of physical clocks

  • Deterministic: Same revision always means same state

Physical timestamps (rejected):

type Revision struct {
    Timestamp time.Time  // Physical clock time
}

Problems:

  • Clock skew: Distributed clocks are not synchronized

  • Non-monotonic: Clocks can go backwards (NTP adjustments)

  • Precision: Timestamp precision varies across systems

  • Ambiguity: Multiple writes at same timestamp

Example failure with timestamps:

// Node A clock: 10:00:00.000
// Node B clock: 10:00:00.100 (100ms ahead due to clock skew)

// Write 1 on Node B → timestamped 10:00:00.100 (Node B time)
// Write 2 on Node A, 50ms later in real time → timestamped 10:00:00.050 (Node A time)

// Snapshot read at timestamp 10:00:00.080:
// → Write 2 (10:00:00.050) is included
// → Write 1 (10:00:00.100) is excluded
// → The earlier write is invisible while the later write is visible
// → INCONSISTENT! ❌

Decision 2: Four Consistency Levels vs Two

Question: Why four consistency levels instead of just "strong" and "eventual"?

Answer: Four levels give granular control over latency/consistency trade-offs.

Rationale:

Four levels (chosen):

MINIMIZE_LATENCY: Read from any replica (fastest)
AT_LEAST_AS_FRESH: Read from replica with revision >= R (bounded staleness)
AT_EXACT_SNAPSHOT: Read from exact revision R (time-travel)
FULLY_CONSISTENT: Read from leader (strongest)

Benefits:

  • Granular control: Choose right trade-off for each use case

  • Read-after-write: AT_LEAST_AS_FRESH enables this pattern

  • Time-travel: AT_EXACT_SNAPSHOT enables audit compliance

  • Performance: MINIMIZE_LATENCY optimizes for latency

Two levels only (rejected):

STRONG: Read from leader
EVENTUAL: Read from any replica

Problems:

  • No read-after-write: Can't guarantee seeing own writes

  • No time-travel: Can't reproduce historical decisions

  • Coarse-grained: Can't fine-tune latency/consistency

  • All-or-nothing: Either slow (leader) or stale (replica)

Decision 3: Global Revision Counter vs Per-Namespace

Question: Should revisions be global or per-namespace?

Answer: A global revision counter is simpler and more correct.

Rationale:

Global counter (chosen):

// Single global revision counter
var globalRevision int64

func WriteTuple(tuple *RelationTuple) Revision {
    counter := atomic.AddInt64(&globalRevision, 1)
    tuple.Revision = Revision{Counter: counter}
    return tuple.Revision
}

Benefits:

  • Total ordering: All writes have global order

  • Cross-namespace consistency: Can read multiple namespaces at same revision

  • Simpler: One counter to manage

  • EXCLUSION correctness: Base and exclusion always comparable

Per-namespace (rejected):

// Separate revision counter per namespace
// (a mutex is needed: map elements are not addressable for atomic operations)
var mu sync.Mutex
var revisions = map[string]int64{}

func WriteTuple(tuple *RelationTuple) Revision {
    mu.Lock()
    revisions[tuple.Namespace]++
    counter := revisions[tuple.Namespace]
    mu.Unlock()
    tuple.Revision = Revision{Counter: counter}
    return tuple.Revision
}

Problems:

  • No cross-namespace ordering: Can't compare revisions across namespaces

  • EXCLUSION broken: Base and exclusion in different namespaces not comparable

  • Complex: Multiple counters to manage

  • Audit complexity: Can't reproduce global state at single revision

Edge Cases and Safety

Edge Case 1: Revision Not Available

Problem: What if requested revision has been garbage collected?

Solution: Return an error, since we can't guarantee correctness.

func CheckAtExactSnapshot(resource, subject, context, revision) (bool, error) {
    minAvailableRevision := storage.GetMinAvailableRevision()

    if revision < minAvailableRevision {
        return false, fmt.Errorf("revision %d not available (min: %d)", revision, minAvailableRevision)
    }

    return CheckAtRevision(resource, subject, context, revision), nil
}

Rationale:

  • Fail-safe: Can't guarantee correctness → return error

  • Explicit: Caller knows revision is unavailable

  • Detectable: Can log and alert on unavailable revisions

Mitigation: Configure retention policy to keep revisions for required audit period.

Edge Case 2: Replica Lag Too High

Problem: What if replica can't catch up to requested revision in reasonable time?

Solution: Timeout and return error, or fallback to leader.

func CheckAtLeastAsFresh(resource, subject, context, minRevision) (bool, error) {
    replica, err := findReplicaWithRevision(minRevision, 5*time.Second)
    if err != nil {
        // Option 1: Return an error
        return false, fmt.Errorf("replica lag too high: %w", err)

        // Option 2 (alternative): Fall back to the leader instead
        // return CheckFullyConsistent(resource, subject, context), nil
    }

    revision := replica.GetCurrentRevision()
    return CheckAtRevision(resource, subject, context, revision), nil
}

Rationale:

  • Bounded latency: Don't wait forever

  • Graceful degradation: Fallback to leader if needed

  • Detectable: Can log and alert on high replica lag

Edge Case 3: Concurrent Writes During Check

Problem: What if writes happen during Check evaluation?

Solution: Snapshot isolation ensures writes don't interfere.

// Check starts at revision 100
Check(document:1#viewer, user:alice, ctx)
// → Choose snapshot: revision 100
// → All reads use revision 100

// Concurrent write at revision 101
WriteTuple(user:alice → document:1#banned)
// → Written at revision 101

// Check continues at revision 100
// → Read banned tuples at revision 100: (not present)
// → Write at revision 101 not visible
// → Check completes with consistent snapshot ✅

Rationale:

  • Isolation: Concurrent writes don't interfere

  • Consistency: All reads see same snapshot

  • Correctness: Result reflects state at chosen revision

Edge Case 4: Stale Reads with MINIMIZE_LATENCY

Problem: What if MINIMIZE_LATENCY reads very stale data?

Solution: This is expected behavior; the caller chose latency over freshness.

// Write at revision 100
WriteTuple(user:alice → document:1#viewer)

// Read from stale replica at revision 50 (50 revisions behind)
result := Check(document:1#viewer, user:alice, ctx)
// → Read from replica at revision 50
// → user:alice not found (write not yet replicated)
// → Access denied ❌

// This is EXPECTED with MINIMIZE_LATENCY!
// Caller chose latency over freshness.

Mitigation:

  • Use AT_LEAST_AS_FRESH for read-after-write consistency

  • Monitor replica lag and alert if too high

  • Configure replication to minimize lag

Edge Case 5: Leader Failover

Problem: What happens when leader fails during Check evaluation?

Solution: Continue at the chosen revision; snapshot isolation ensures correctness.

func Check(resource, subject, context) (bool, error) {
    // Step 1: Choose snapshot revision
    revision, err := storage.GetCurrentRevision()
    if err != nil {
        return false, fmt.Errorf("failed to get revision: %w", err)
    }

    // Step 2: Evaluate using snapshot (even if leader fails)
    result, err := CheckAtRevision(resource, subject, context, revision)
    if err != nil {
        // If reads fail (leader down, replica unavailable), retry
        return false, fmt.Errorf("check failed: %w", err)
    }

    return result, nil
}

func CheckAtRevision(resource, subject, context, revision) (bool, error) {
    // All reads use the same revision
    // If leader fails, reads can continue from replicas (at same revision)
    tuples, err := storage.ReadTuplesAtRevision(resource, subject, revision)
    if err != nil {
        // Replica unavailable or revision not available
        return false, err
    }

    // Continue evaluation...
    return evaluateTuples(tuples, revision), nil
}

Rationale:

  • Resilient: Can continue reading from replicas

  • Consistent: Snapshot isolation ensures correctness

  • Retryable: If all replicas fail, can retry

Operational Concerns

Garbage Collection and Retention

Problem: Old revisions consume storage. How to garbage collect while preserving audit trail?

Strategy: Retention policy based on time and revision count.

type GCPolicy struct {
    RetentionPeriod time.Duration  // Keep revisions for 90 days
    MinRevisions    int64          // Keep at least 1000 revisions
    MaxRevisions    int64          // Keep at most 1M revisions
}

type RevisionMetadata struct {
    Revision  Revision
    Timestamp time.Time
    TupleCount int64
}

func GarbageCollect(policy *GCPolicy) error {
    currentRevision := storage.GetCurrentRevision()

    // Step 1: Count-based cutoff (keep at least MinRevisions)
    minRevision := currentRevision - policy.MinRevisions
    cutoffTime := time.Now().Add(-policy.RetentionPeriod)

    // Step 2: Time-based cutoff (keep everything inside the retention period).
    // Take the smaller of the two cutoffs so both constraints are honored.
    revisions := storage.GetRevisionMetadata()
    for _, rev := range revisions {
        if rev.Timestamp.After(cutoffTime) {
            // First revision within the retention period
            if rev.Revision < minRevision {
                minRevision = rev.Revision
            }
            break
        }
    }

    // Step 3: Delete tuples with revision < minRevision
    log.Printf("Garbage collecting revisions < %d", minRevision)
    deletedCount := storage.DeleteTuplesBeforeRevision(minRevision)
    log.Printf("Deleted %d tuples", deletedCount)

    return nil
}

Compliance considerations:

// HIPAA: 6 years retention
hipaaPolicy := &GCPolicy{
    RetentionPeriod: 6 * 365 * 24 * time.Hour,
    MinRevisions:    1000,
}

// SOC2: 1 year retention
soc2Policy := &GCPolicy{
    RetentionPeriod: 365 * 24 * time.Hour,
    MinRevisions:    1000,
}

// GDPR: Right to be forgotten (delete immediately)
gdprPolicy := &GCPolicy{
    RetentionPeriod: 0,  // Delete immediately
    MinRevisions:    0,  // No minimum
}

Rationale:

  • Storage efficiency: Delete old revisions to save space

  • Compliance: Meet retention requirements (HIPAA, SOC2, GDPR)

  • Audit trail: Keep revisions for required period

  • Configurable: Different policies for different use cases

Replica Lag Monitoring

Problem: How to detect and alert on replica lag?

Solution: Monitor replica revision and compare to leader.

type ReplicaMonitor struct {
    replicas map[string]*Replica
    leader   *Replica
}

type ReplicaHealth struct {
    Name         string
    Revision     Revision
    Lag          int64  // Revisions behind leader
    LastUpdate   time.Time
    Healthy      bool
}

func (m *ReplicaMonitor) CheckHealth() map[string]*ReplicaHealth {
    leaderRevision := m.leader.GetCurrentRevision()
    health := make(map[string]*ReplicaHealth)

    for name, replica := range m.replicas {
        replicaRevision := replica.GetCurrentRevision()
        lag := leaderRevision - replicaRevision

        healthy := lag < 100  // Healthy if < 100 revisions behind

        health[name] = &ReplicaHealth{
            Name:       name,
            Revision:   replicaRevision,
            Lag:        lag,
            LastUpdate: time.Now(),
            Healthy:    healthy,
        }

        // Alert if unhealthy
        if !healthy {
            alert("Replica %s is unhealthy: %d revisions behind", name, lag)
        }
    }

    return health
}

Metrics to track:

// Prometheus metrics
var (
    replicaLag = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "replica_lag_revisions",
            Help: "Number of revisions replica is behind leader",
        },
        []string{"replica"},
    )

    replicaHealth = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "replica_healthy",
            Help: "Whether replica is healthy (1) or unhealthy (0)",
        },
        []string{"replica"},
    )
)

func init() {
    // Metrics must be registered before they are exported
    prometheus.MustRegister(replicaLag, replicaHealth)
}

func (m *ReplicaMonitor) UpdateMetrics() {
    health := m.CheckHealth()

    for name, h := range health {
        replicaLag.WithLabelValues(name).Set(float64(h.Lag))

        if h.Healthy {
            replicaHealth.WithLabelValues(name).Set(1)
        } else {
            replicaHealth.WithLabelValues(name).Set(0)
        }
    }
}

Alerting rules:

# Prometheus alerting rules
groups:
  - name: replica_health
    rules:
      - alert: ReplicaLagHigh
        expr: replica_lag_revisions > 100
        for: 5m
        annotations:
          summary: "Replica {{ $labels.replica }} is lagging"
          description: "Replica is {{ $value }} revisions behind leader"

      - alert: ReplicaUnhealthy
        expr: replica_healthy == 0
        for: 5m
        annotations:
          summary: "Replica {{ $labels.replica }} is unhealthy"
          description: "Replica has been unhealthy for 5 minutes"

Rationale:

  • Visibility: Know which replicas are lagging

  • Alerting: Get notified when replicas fall behind

  • Debugging: Investigate replication issues

  • Capacity planning: Track replication throughput

Performance Optimization

1. Revision Index

Index tuples by revision for efficient snapshot reads:

// Index: (namespace, object_id, relation, subject_sig, revision) → tuple
// Allows efficient queries: "Find all tuples for resource at revision R"

type TupleIndex struct {
    // Primary index: (resource, subject) → [(revision, tuple)]
    primary map[string][]RevisionedTuple

    // Revision index: revision → [tuple]
    byRevision map[int64][]*RelationTuple
}

func (idx *TupleIndex) ReadAtRevision(resource, subject, revision) []*RelationTuple {
    key := fmt.Sprintf("%s:%s#%s:%s", resource.Namespace, resource.ObjectID, resource.Relation, subject.Signature())
    tuples := idx.primary[key]

    // Collect tuples written at or before the requested revision
    // (if the slice is kept sorted by revision, this becomes a binary search)
    result := []*RelationTuple{}
    for _, rt := range tuples {
        if rt.Revision <= revision {
            result = append(result, rt.Tuple)
        }
    }

    return result
}

2. Revision Caching

Cache current revision to avoid repeated lookups:

type RevisionCache struct {
    current    Revision
    lastUpdate time.Time
    ttl        time.Duration
}

func (c *RevisionCache) GetCurrentRevision() Revision {
    if time.Since(c.lastUpdate) > c.ttl {
        c.current = storage.GetCurrentRevision()
        c.lastUpdate = time.Now()
    }
    return c.current
}

3. Replica Selection

Choose replica based on consistency level and latency:

func selectReplica(consistencyLevel ConsistencyRequirement, minRevision Revision) *Replica {
    switch consistencyLevel {
    case MINIMIZE_LATENCY:
        return selectClosestReplica()  // Lowest latency
    case AT_LEAST_AS_FRESH:
        return selectReplicaWithRevision(minRevision)  // Has required revision
    case AT_EXACT_SNAPSHOT:
        return selectReplicaWithRevision(minRevision)  // Must still hold the exact revision
    default: // FULLY_CONSISTENT
        return selectLeader()  // Leader only
    }
}

Rationale:

  • Fast reads: Indexed lookups are O(log n)

  • Reduced load: Caching reduces database queries

  • Latency optimization: Choose closest replica

  • Scalability: Distribute reads across replicas

Model Extension

To support production deployment, we extend our model with:

1. Revision Metadata

type Revision struct {
    Counter   int64      // Monotonically increasing counter
    Timestamp time.Time  // When revision was created (for GC)
}

type RevisionMetadata struct {
    Revision   Revision
    TupleCount int64      // Number of tuples at this revision
    Size       int64      // Storage size in bytes
}

2. Garbage Collection Policy

type GCPolicy struct {
    RetentionPeriod time.Duration  // Keep revisions for N days
    MinRevisions    int64          // Keep at least N revisions
    MaxRevisions    int64          // Keep at most N revisions
}

type GarbageCollector interface {
    Collect(policy *GCPolicy) error
    GetMinAvailableRevision() Revision
}

3. Replica Health Monitoring

type ReplicaHealth struct {
    Name       string
    Revision   Revision
    Lag        int64      // Revisions behind leader
    Healthy    bool       // Whether replica is healthy
    LastUpdate time.Time
}

type ReplicaMonitor interface {
    CheckHealth() map[string]*ReplicaHealth
    GetHealthyReplicas() []*Replica
}

Takeaways

  1. Logical revisions are superior to timestamps — Monotonic counters avoid clock skew, precision issues, and non-monotonicity, providing deterministic versioning across distributed systems.

  2. Four consistency levels enable granular trade-offs — MINIMIZE_LATENCY (fast), AT_LEAST_AS_FRESH (read-after-write), AT_EXACT_SNAPSHOT (time-travel), FULLY_CONSISTENT (strongest) let applications choose the right balance for each use case.

  3. Operational excellence is essential — Retention policies, garbage collection, replica monitoring, and performance optimization transform a theoretically correct system into a production-ready service.

Why it matters: Distributed authorization systems must handle millions of requests per second globally while maintaining consistency, compliance, and performance. Revision-based snapshots provide the foundation, but production deployment requires operational tools: garbage collection for storage efficiency and compliance (HIPAA, SOC2, GDPR), replica monitoring for visibility and alerting, and performance optimization for low-latency, high-throughput checks. Combined with edge case handling (unavailable revisions, replica lag, leader failover), these mechanisms ensure the system runs reliably at scale.

Real-World Context

Which Companies Face This Problem?

1. Distributed Authorization

  • Google Zanzibar: Snapshot reads with Zookie tokens, garbage collection with retention policies

  • Auth0 FGA: Consistency tokens for snapshot isolation, replica lag monitoring

  • Ory Keto: Revision-based reads, configurable retention for compliance

2. Audit Compliance

  • Stripe: Reproduce historical authorization decisions for audit trail

  • Square: Time-travel queries for compliance (HIPAA, SOC2)

  • Plaid: Snapshot isolation for regulatory requirements

3. Multi-Region Deployment

  • GitHub: Multi-region replicas with consistency levels, replica health monitoring

  • GitLab: Bounded staleness for read-after-write, garbage collection policies

  • Bitbucket: Regional replicas with eventual consistency, performance optimization

4. Time-Travel Queries

  • AWS IAM: Policy simulator with historical queries at specific revisions

  • Azure RBAC: Time-travel for debugging permission issues

  • GCP IAM: Historical permission checks for audit compliance

We now have a complete authorization model with determinism, fail-safe defaults, snapshot isolation, and operational excellence. But before we ship to production, we need to prove correctness.

Post 12 is the final post in the series: we'll consolidate all the concepts from Posts 1-11, define the complete model, enumerate all invariants, and prove that the system is correct by construction.

Preview:

// Core invariants:
// INV-1: Relation Identity (namespace, relation)
// INV-2: No Resource Wildcards
// INV-3: Schema Closure (all references exist)
// INV-4: Allowed Kinds Mask (ignore invalid tuples)
// INV-5: Single Snapshot Evaluation
// ... and 11 more
