The Problem Statement
In Post 11A, we established revision-based snapshots and four consistency levels. But deploying this to production requires addressing critical operational concerns:
Revision retention: How long do we keep old revisions?
Garbage collection: How do we clean up old data without breaking audits?
Replica lag: How do we monitor and handle slow replicas?
Performance: How do we optimize for high-throughput, low-latency checks?
Edge cases: What happens when revisions are unavailable or replicas fail?
Real-world operational challenges:
Looking at how production systems handle revisions reveals critical requirements:
Google Zanzibar: Garbage collection with retention policies
Auth0 FGA: Replica lag monitoring and alerting
Ory Keto: Configurable retention for compliance
SpiceDB: Performance optimization with caching
Common operational patterns:
Retention policies: Keep revisions for audit compliance (HIPAA, SOC2, GDPR)
Garbage collection: Delete old revisions to save storage
Replica monitoring: Track lag and health of replicas
Performance tuning: Optimize indexes, caching, and query patterns
What we need to address:
"Production deployment requires operational excellence: retention policies, garbage collection, monitoring, and performance optimization."
Our current model has gaps:
❌ No retention policy — revisions accumulate forever
❌ No garbage collection — storage grows unbounded
❌ No replica monitoring — can't detect lag or failures
❌ No performance optimization — queries may be slow
The core problem: We need operational tools and design decisions to run revision-based authorization at scale.
Human Rule
"Production systems need operational excellence, not just correctness."
Retention policies, garbage collection, monitoring, and performance optimization are essential for running authorization at scale.
Design Decisions
Decision 1: Logical Revisions vs Timestamps
Question: Should we use logical revisions (counters) or physical timestamps?
Answer: Logical revisions (counters) — monotonic and precise.
Rationale:
Logical revisions (chosen):
type Revision struct {
Counter int64 // Monotonically increasing counter
}
Benefits:
✅ Monotonic: Always increasing, never goes backwards
✅ Precise: No ambiguity about ordering
✅ No clock skew: Independent of physical clocks
✅ Deterministic: Same revision always means same state
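The monotonicity and uniqueness claims are easy to check concretely. Here is a minimal, purely illustrative sketch using Go's sync/atomic (the nextRevision helper and the 100-writer demo are ours, not part of the model):

```go
// Illustrative sketch: a logical revision counter backed by sync/atomic.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var globalRevision int64 // monotonically increasing logical clock

// nextRevision atomically allocates the next revision number.
func nextRevision() int64 {
	return atomic.AddInt64(&globalRevision, 1)
}

func main() {
	var wg sync.WaitGroup
	// 100 concurrent writers each grab a revision.
	seen := make([]int64, 100)
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			seen[i] = nextRevision()
		}(i)
	}
	wg.Wait()
	// Every writer got a unique revision: the counter never repeats
	// and never goes backwards, regardless of wall-clock behavior.
	unique := make(map[int64]bool)
	for _, r := range seen {
		unique[r] = true
	}
	fmt.Println(len(unique)) // 100
}
```

No NTP adjustment or clock skew can perturb this ordering, because it never consults a physical clock.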
Physical timestamps (rejected):
type Revision struct {
Timestamp time.Time // Physical clock time
}
Problems:
❌ Clock skew: Distributed clocks are not synchronized
❌ Non-monotonic: Clocks can go backwards (NTP adjustments)
❌ Precision: Timestamp precision varies across systems
❌ Ambiguity: Multiple writes at same timestamp
Example failure with timestamps:
// Node A clock: 10:00:00.000
// Node B clock: 10:00:00.100 (100ms ahead due to clock skew)
// Write on Node A at 10:00:00.050 (Node A time)
// → Timestamp: 10:00:00.050
// Read on Node B at 10:00:00.080 (Node B time)
// → Request snapshot at 10:00:00.080
// → Node A thinks 10:00:00.080 > 10:00:00.050 → include write
// → Node B thinks 10:00:00.080 < 10:00:00.100 → exclude write
// → INCONSISTENT! ❌
Decision 2: Four Consistency Levels vs Two
Question: Why four consistency levels instead of just "strong" and "eventual"?
Answer: Four levels — granular control over latency/consistency trade-offs.
Rationale:
Four levels (chosen):
MINIMIZE_LATENCY: Read from any replica (fastest)
AT_LEAST_AS_FRESH: Read from replica with revision >= R (bounded staleness)
AT_EXACT_SNAPSHOT: Read from exact revision R (time-travel)
FULLY_CONSISTENT: Read from leader (strongest)
Benefits:
✅ Granular control: Choose right trade-off for each use case
✅ Read-after-write: AT_LEAST_AS_FRESH enables this pattern
✅ Time-travel: AT_EXACT_SNAPSHOT enables audit compliance
✅ Performance: MINIMIZE_LATENCY optimizes for latency
Two levels only (rejected):
STRONG: Read from leader
EVENTUAL: Read from any replica
Problems:
❌ No read-after-write: Can't guarantee seeing own writes
❌ No time-travel: Can't reproduce historical decisions
❌ Coarse-grained: Can't fine-tune latency/consistency
❌ All-or-nothing: Either slow (leader) or stale (replica)
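The read-after-write benefit of AT_LEAST_AS_FRESH can be sketched as a token handshake: the write returns a revision token, and the caller passes it back so the check refuses to answer from older state. All names here (store, write, checkAtLeastAsFresh) are illustrative stand-ins, not the model's actual API:

```go
// Illustrative sketch of the read-after-write pattern enabled by AT_LEAST_AS_FRESH.
package main

import "fmt"

type store struct {
	revision int64
	tuples   map[string]int64 // tuple -> revision it was written at
}

// write records a tuple and returns the revision token the client should hold.
func (s *store) write(tuple string) int64 {
	s.revision++
	s.tuples[tuple] = s.revision
	return s.revision
}

// checkAtLeastAsFresh refuses to answer from a replica older than minRevision.
func (s *store) checkAtLeastAsFresh(tuple string, atRevision, minRevision int64) (bool, error) {
	if atRevision < minRevision {
		return false, fmt.Errorf("replica at revision %d is older than required %d", atRevision, minRevision)
	}
	rev, ok := s.tuples[tuple]
	return ok && rev <= atRevision, nil
}

func main() {
	s := &store{tuples: map[string]int64{}}
	token := s.write("document:1#viewer@user:alice")
	// A stale replica (revision 0) is rejected rather than silently missing the write:
	_, err := s.checkAtLeastAsFresh("document:1#viewer@user:alice", 0, token)
	fmt.Println(err != nil) // true
	// A replica at or past the token is guaranteed to see the write:
	ok, _ := s.checkAtLeastAsFresh("document:1#viewer@user:alice", token, token)
	fmt.Println(ok) // true
}
```

A two-level STRONG/EVENTUAL model cannot express this middle ground: the caller would have to pay full leader latency just to see its own write.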
Decision 3: Global Revision Counter vs Per-Namespace
Question: Should revisions be global or per-namespace?
Answer: Global revision counter — simpler and more correct.
Rationale:
Global counter (chosen):
// Single global revision counter
var globalRevision int64
func WriteTuple(tuple *RelationTuple) Revision {
revision := atomic.AddInt64(&globalRevision, 1)
tuple.Revision = revision
return revision
}
Benefits:
✅ Total ordering: All writes have global order
✅ Cross-namespace consistency: Can read multiple namespaces at same revision
✅ Simpler: One counter to manage
✅ EXCLUSION correctness: Base and exclusion always comparable
Per-namespace (rejected):
// Separate revision counter per namespace, guarded by a mutex
// (Go cannot take the address of a map value for atomic operations)
var mu sync.Mutex
var revisions = map[string]int64{}
func WriteTuple(tuple *RelationTuple) Revision {
	mu.Lock()
	revisions[tuple.Namespace]++
	revision := revisions[tuple.Namespace]
	mu.Unlock()
	tuple.Revision = revision
	return revision
}
Problems:
❌ No cross-namespace ordering: Can't compare revisions across namespaces
❌ EXCLUSION broken: Base and exclusion in different namespaces not comparable
❌ Complex: Multiple counters to manage
❌ Audit complexity: Can't reproduce global state at single revision
Edge Cases and Safety
Edge Case 1: Revision Not Available
Problem: What if requested revision has been garbage collected?
Solution: Return error — can't guarantee correctness.
func CheckAtExactSnapshot(resource, subject, context, revision) (bool, error) {
minAvailableRevision := storage.GetMinAvailableRevision()
if revision < minAvailableRevision {
return false, fmt.Errorf("revision %d not available (min: %d)", revision, minAvailableRevision)
}
return CheckAtRevision(resource, subject, context, revision), nil
}
Rationale:
✅ Fail-safe: Can't guarantee correctness → return error
✅ Explicit: Caller knows revision is unavailable
✅ Detectable: Can log and alert on unavailable revisions
Mitigation: Configure retention policy to keep revisions for the required audit period.
Edge Case 2: Replica Lag Too High
Problem: What if replica can't catch up to requested revision in reasonable time?
Solution: Timeout and return error, or fallback to leader.
func CheckAtLeastAsFresh(resource, subject, context, minRevision) (bool, error) {
	replica, err := findReplicaWithRevision(minRevision, 5*time.Second)
	if err != nil {
		// Option 1: Return an error
		return false, fmt.Errorf("replica lag too high: %w", err)
		// Option 2 (alternative): fall back to the leader instead:
		// return CheckFullyConsistent(resource, subject, context), nil
	}
	revision := replica.GetCurrentRevision()
	return CheckAtRevision(resource, subject, context, revision), nil
}
Rationale:
✅ Bounded latency: Don't wait forever
✅ Graceful degradation: Fallback to leader if needed
✅ Detectable: Can log and alert on high replica lag
Edge Case 3: Concurrent Writes During Check
Problem: What if writes happen during Check evaluation?
Solution: Snapshot isolation ensures writes don't interfere.
// Check starts at revision 100
Check(document:1#viewer, user:alice, ctx)
// → Choose snapshot: revision 100
// → All reads use revision 100
// Concurrent write at revision 101
WriteTuple(user:alice → document:1#banned)
// → Written at revision 101
// Check continues at revision 100
// → Read banned tuples at revision 100: (not present)
// → Write at revision 101 not visible
// → Check completes with consistent snapshot ✅
Rationale:
✅ Isolation: Concurrent writes don't interfere
✅ Consistency: All reads see same snapshot
✅ Correctness: Result reflects state at chosen revision
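The walkthrough above can be sketched with a tiny multi-version store: each tuple carries the revision it was written at, and a check pinned to revision 100 never sees the concurrent write at 101. The mvccStore type is a hypothetical stand-in for the real storage layer:

```go
// Illustrative sketch of snapshot isolation via multi-version tuples.
package main

import "fmt"

type versionedTuple struct {
	key      string
	revision int64
}

type mvccStore struct{ log []versionedTuple }

// readAtRevision reports whether the tuple is visible at the chosen snapshot.
func (s *mvccStore) readAtRevision(key string, snapshot int64) bool {
	for _, t := range s.log {
		if t.key == key && t.revision <= snapshot {
			return true
		}
	}
	return false
}

func main() {
	s := &mvccStore{}
	s.log = append(s.log, versionedTuple{"document:1#viewer@user:alice", 100})
	snapshot := int64(100) // the Check pins its snapshot here
	// A concurrent write lands at revision 101 while the check is running:
	s.log = append(s.log, versionedTuple{"document:1#banned@user:alice", 101})
	// The viewer tuple (revision 100) is visible; the ban (revision 101) is not:
	fmt.Println(s.readAtRevision("document:1#viewer@user:alice", snapshot)) // true
	fmt.Println(s.readAtRevision("document:1#banned@user:alice", snapshot)) // false
}
```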
Edge Case 4: Stale Reads with MINIMIZE_LATENCY
Problem: What if MINIMIZE_LATENCY reads very stale data?
Solution: This is expected behavior — caller chose latency over freshness.
// Write at revision 100
WriteTuple(user:alice → document:1#viewer)
// Read from stale replica at revision 50 (50 revisions behind)
result := Check(document:1#viewer, user:alice, ctx)
// → Read from replica at revision 50
// → user:alice not found (write not yet replicated)
// → Access denied ❌
// This is EXPECTED with MINIMIZE_LATENCY!
// Caller chose latency over freshness.
Mitigation:
Use AT_LEAST_AS_FRESH for read-after-write consistency
Monitor replica lag and alert if too high
Configure replication to minimize lag
Edge Case 5: Leader Failover
Problem: What happens when leader fails during Check evaluation?
Solution: Continue at chosen revision — snapshot isolation ensures correctness.
func Check(resource, subject, context) (bool, error) {
// Step 1: Choose snapshot revision
revision, err := storage.GetCurrentRevision()
if err != nil {
return false, fmt.Errorf("failed to get revision: %w", err)
}
// Step 2: Evaluate using snapshot (even if leader fails)
result, err := CheckAtRevision(resource, subject, context, revision)
if err != nil {
// If reads fail (leader down, replica unavailable), retry
return false, fmt.Errorf("check failed: %w", err)
}
return result, nil
}
func CheckAtRevision(resource, subject, context, revision) (bool, error) {
// All reads use the same revision
// If leader fails, reads can continue from replicas (at same revision)
tuples, err := storage.ReadTuplesAtRevision(resource, subject, revision)
if err != nil {
// Replica unavailable or revision not available
return false, err
}
// Continue evaluation...
return evaluateTuples(tuples, revision), nil
}
Rationale:
✅ Resilient: Can continue reading from replicas
✅ Consistent: Snapshot isolation ensures correctness
✅ Retryable: If all replicas fail, can retry
Operational Concerns
Garbage Collection and Retention
Problem: Old revisions consume storage. How to garbage collect while preserving audit trail?
Strategy: Retention policy based on time and revision count.
type GCPolicy struct {
RetentionPeriod time.Duration // Keep revisions for 90 days
MinRevisions int64 // Keep at least 1000 revisions
MaxRevisions int64 // Keep at most 1M revisions
}
type RevisionMetadata struct {
Revision Revision
Timestamp time.Time
TupleCount int64
}
func GarbageCollect(policy *GCPolicy) error {
	currentRevision := storage.GetCurrentRevision()
	// Step 1: Keep at least the most recent MinRevisions by count
	minRevision := currentRevision - policy.MinRevisions
	cutoffTime := time.Now().Add(-policy.RetentionPeriod)
	// Step 2: Also keep every revision inside the retention period
	// (revision metadata is assumed sorted by revision, ascending)
	for _, rev := range storage.GetRevisionMetadata() {
		if rev.Timestamp.After(cutoffTime) {
			// First revision within the retention period
			if rev.Revision < minRevision {
				minRevision = rev.Revision
			}
			break
		}
	}
	// Step 3: Delete tuples with revision < minRevision.
	// Both bounds above only ever lower minRevision, so the count
	// and retention constraints are honored simultaneously.
	log.Printf("Garbage collecting revisions < %d", minRevision)
	deletedCount := storage.DeleteTuplesBeforeRevision(minRevision)
	log.Printf("Deleted %d tuples", deletedCount)
	return nil
}
Compliance considerations:
// HIPAA: 6 years retention
hipaaPolicy := &GCPolicy{
RetentionPeriod: 6 * 365 * 24 * time.Hour,
MinRevisions: 1000,
}
// SOC2: 1 year retention
soc2Policy := &GCPolicy{
RetentionPeriod: 365 * 24 * time.Hour,
MinRevisions: 1000,
}
// GDPR: Right to be forgotten (delete immediately)
gdprPolicy := &GCPolicy{
RetentionPeriod: 0, // Delete immediately
MinRevisions: 0, // No minimum
}
Rationale:
✅ Storage efficiency: Delete old revisions to save space
✅ Compliance: Meet retention requirements (HIPAA, SOC2, GDPR)
✅ Audit trail: Keep revisions for required period
✅ Configurable: Different policies for different use cases
Replica Lag Monitoring
Problem: How to detect and alert on replica lag?
Solution: Monitor replica revision and compare to leader.
type ReplicaMonitor struct {
replicas map[string]*Replica
leader *Replica
}
type ReplicaHealth struct {
Name string
Revision Revision
Lag int64 // Revisions behind leader
LastUpdate time.Time
Healthy bool
}
func (m *ReplicaMonitor) CheckHealth() map[string]*ReplicaHealth {
leaderRevision := m.leader.GetCurrentRevision()
health := make(map[string]*ReplicaHealth)
for name, replica := range m.replicas {
replicaRevision := replica.GetCurrentRevision()
lag := leaderRevision - replicaRevision
healthy := lag < 100 // Healthy if < 100 revisions behind
health[name] = &ReplicaHealth{
Name: name,
Revision: replicaRevision,
Lag: lag,
LastUpdate: time.Now(),
Healthy: healthy,
}
// Alert if unhealthy
if !healthy {
alert("Replica %s is unhealthy: %d revisions behind", name, lag)
}
}
return health
}
Metrics to track:
// Prometheus metrics
var (
replicaLag = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "replica_lag_revisions",
Help: "Number of revisions replica is behind leader",
},
[]string{"replica"},
)
replicaHealth = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "replica_healthy",
Help: "Whether replica is healthy (1) or unhealthy (0)",
},
[]string{"replica"},
)
)
func (m *ReplicaMonitor) UpdateMetrics() {
health := m.CheckHealth()
for name, h := range health {
replicaLag.WithLabelValues(name).Set(float64(h.Lag))
if h.Healthy {
replicaHealth.WithLabelValues(name).Set(1)
} else {
replicaHealth.WithLabelValues(name).Set(0)
}
}
}
Alerting rules:
# Prometheus alerting rules
groups:
- name: replica_health
rules:
- alert: ReplicaLagHigh
expr: replica_lag_revisions > 100
for: 5m
annotations:
summary: "Replica {{ $labels.replica }} is lagging"
description: "Replica is {{ $value }} revisions behind leader"
- alert: ReplicaUnhealthy
expr: replica_healthy == 0
for: 5m
annotations:
summary: "Replica {{ $labels.replica }} is unhealthy"
description: "Replica has been unhealthy for 5 minutes"
Rationale:
✅ Visibility: Know which replicas are lagging
✅ Alerting: Get notified when replicas fall behind
✅ Debugging: Investigate replication issues
✅ Capacity planning: Track replication throughput
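The lag arithmetic the monitor relies on is just leaderRevision minus replicaRevision, with the health threshold a policy choice (100 here, matching the example above). A minimal sketch with an illustrative healthyReplicas helper:

```go
// Illustrative sketch: classifying replicas by revision lag.
package main

import "fmt"

// healthyReplicas returns the names of replicas within maxLag of the leader.
func healthyReplicas(leader int64, replicas map[string]int64, maxLag int64) []string {
	var out []string
	for name, rev := range replicas {
		if leader-rev < maxLag {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	replicas := map[string]int64{"us-east": 990, "eu-west": 850}
	// Leader is at revision 1000; only us-east is within 100 revisions.
	fmt.Println(healthyReplicas(1000, replicas, 100)) // [us-east]
}
```

Because the lag is measured in logical revisions rather than seconds, it is immune to clock skew between leader and replica, which is exactly the property Decision 1 bought us.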
Performance Optimization
1. Revision Index
Index tuples by revision for efficient snapshot reads:
// Index: (namespace, object_id, relation, subject_sig, revision) → tuple
// Allows efficient queries: "Find all tuples for resource at revision R"
type TupleIndex struct {
// Primary index: (resource, subject) → [(revision, tuple)]
primary map[string][]RevisionedTuple
// Revision index: revision → [tuple]
byRevision map[int64][]*RelationTuple
}
func (idx *TupleIndex) ReadAtRevision(resource, subject, revision) []*RelationTuple {
	key := fmt.Sprintf("%s:%s#%s:%s", resource.Namespace, resource.ObjectID, resource.Relation, subject.Signature())
	tuples := idx.primary[key] // kept sorted by revision, ascending
	// Binary search for the first tuple with revision > requested revision
	n := sort.Search(len(tuples), func(i int) bool {
		return tuples[i].Revision > revision
	})
	// Everything before that index is visible at the snapshot
	result := make([]*RelationTuple, 0, n)
	for _, rt := range tuples[:n] {
		result = append(result, rt.Tuple)
	}
	return result
}
2. Revision Caching
Cache current revision to avoid repeated lookups:
type RevisionCache struct {
	mu         sync.Mutex // guards concurrent refreshes
	current    Revision
	lastUpdate time.Time
	ttl        time.Duration
}
func (c *RevisionCache) GetCurrentRevision() Revision {
	c.mu.Lock()
	defer c.mu.Unlock()
	if time.Since(c.lastUpdate) > c.ttl {
		c.current = storage.GetCurrentRevision()
		c.lastUpdate = time.Now()
	}
	return c.current
}
3. Replica Selection
Choose replica based on consistency level and latency:
func selectReplica(consistencyLevel ConsistencyRequirement, minRevision Revision) *Replica {
	switch consistencyLevel {
	case MINIMIZE_LATENCY:
		return selectClosestReplica() // Lowest latency
	case AT_LEAST_AS_FRESH:
		return selectReplicaWithRevision(minRevision) // Has required revision
	case AT_EXACT_SNAPSHOT:
		return selectReplicaWithRevision(minRevision) // Must still hold the exact revision
	default: // FULLY_CONSISTENT
		return selectLeader() // Leader only
	}
}
Rationale:
✅ Fast reads: Indexed lookups are O(log n)
✅ Reduced load: Caching reduces database queries
✅ Latency optimization: Choose closest replica
✅ Scalability: Distribute reads across replicas
Model Extension
To support production deployment, we extend our model with:
1. Revision Metadata
type Revision struct {
Counter int64 // Monotonically increasing counter
Timestamp time.Time // When revision was created (for GC)
}
type RevisionMetadata struct {
Revision Revision
TupleCount int64 // Number of tuples at this revision
Size int64 // Storage size in bytes
}
2. Garbage Collection Policy
type GCPolicy struct {
RetentionPeriod time.Duration // Keep revisions for N days
MinRevisions int64 // Keep at least N revisions
MaxRevisions int64 // Keep at most N revisions
}
type GarbageCollector interface {
Collect(policy *GCPolicy) error
GetMinAvailableRevision() Revision
}
3. Replica Health Monitoring
type ReplicaHealth struct {
Name string
Revision Revision
Lag int64 // Revisions behind leader
Healthy bool // Whether replica is healthy
LastUpdate time.Time
}
type ReplicaMonitor interface {
CheckHealth() map[string]*ReplicaHealth
GetHealthyReplicas() []*Replica
}
Takeaways
Logical revisions are superior to timestamps — Monotonic counters avoid clock skew, precision issues, and non-monotonicity, providing deterministic versioning across distributed systems.
Four consistency levels enable granular trade-offs — MINIMIZE_LATENCY (fast), AT_LEAST_AS_FRESH (read-after-write), AT_EXACT_SNAPSHOT (time-travel), FULLY_CONSISTENT (strongest) let applications choose the right balance for each use case.
Operational excellence is essential — Retention policies, garbage collection, replica monitoring, and performance optimization transform a theoretically correct system into a production-ready service.
Why it matters: Distributed authorization systems must handle millions of requests per second globally while maintaining consistency, compliance, and performance. Revision-based snapshots provide the foundation, but production deployment requires operational tools: garbage collection for storage efficiency and compliance (HIPAA, SOC2, GDPR), replica monitoring for visibility and alerting, and performance optimization for low-latency, high-throughput checks. Combined with edge case handling (unavailable revisions, replica lag, leader failover), these mechanisms ensure the system runs reliably at scale.
Real-World Context
Which Companies Face This Problem?
1. Distributed Authorization
Google Zanzibar: Snapshot reads with Zookie tokens, garbage collection with retention policies
Auth0 FGA: Consistency tokens for snapshot isolation, replica lag monitoring
Ory Keto: Revision-based reads, configurable retention for compliance
2. Audit Compliance
Stripe: Reproduce historical authorization decisions for audit trail
Square: Time-travel queries for compliance (HIPAA, SOC2)
Plaid: Snapshot isolation for regulatory requirements
3. Multi-Region Deployment
GitHub: Multi-region replicas with consistency levels, replica health monitoring
GitLab: Bounded staleness for read-after-write, garbage collection policies
Bitbucket: Regional replicas with eventual consistency, performance optimization
4. Time-Travel Queries
AWS IAM: Policy simulator with historical queries at specific revisions
Azure RBAC: Time-travel for debugging permission issues
GCP IAM: Historical permission checks for audit compliance
Next → Post 12: Final Model and Invariants
We now have a complete authorization model with determinism, fail-safe defaults, snapshot isolation, and operational excellence. But before we ship to production, we need to prove correctness.
Post 12 is the final post in the series: we'll consolidate all the concepts from Posts 1-11, define the complete model, enumerate all invariants, and prove that the system is correct by construction.
Preview:
// Core invariants:
// INV-1: Relation Identity (namespace, relation)
// INV-2: No Resource Wildcards
// INV-3: Schema Closure (all references exist)
// INV-4: Allowed Kinds Mask (ignore invalid tuples)
// INV-5: Single Snapshot Evaluation
// ... and 11 more
