The Problem Statement

In Post 10A, we established fail-safe defaults for unknown caveats and duplicate tuples. But production systems face another critical threat: denial-of-service (DoS) attacks through unbounded evaluation.

Without budget limits and cycle detection, we risk:

  • Denial of Service: Deep graphs consume unbounded resources

  • Stack overflow: Deep recursion crashes the service

  • Memory exhaustion: Visiting too many nodes consumes all memory

  • Database overload: Reading too many tuples overwhelms storage

  • Infinite loops: Cycles cause hangs

Real-world examples from production systems:

Looking at how production authorization systems handle resource limits reveals critical requirements:

  • Google Zanzibar: Budget limits prevent DoS attacks on deep hierarchies

  • Auth0 FGA: Cycle detection prevents infinite loops

  • Ory Keto: Evaluation depth limits protect against resource exhaustion

  • SpiceDB: Configurable limits per namespace

Common patterns that require budget limits:

  1. DoS attacks: Malicious graphs designed to exhaust resources

  2. Cycles: Misconfigured schemas create infinite loops

  3. Deep hierarchies: Legitimate but deep org structures

  4. Fan-out explosions: EDGE operations with thousands of targets

What we can't handle today:

"When evaluation encounters deep graphs, cycles, or resource limits, it must fail safely without compromising availability."

Our current model has gaps:

  • Deep graphs → unbounded evaluation

  • Cycles → infinite loops

  • No resource limits → DoS vulnerable

The core problem: We need budget limits that bound evaluation cost and cycle detection that prevents infinite loops.

Human Rule

"Evaluation must have bounded cost."

Depth, nodes, and tuples must all be bounded to prevent resource exhaustion, ensuring availability even when facing malicious graphs or misconfigured schemas.

Why Current Models Don't Work

Let's try to handle resource limits using our existing tools from Posts 1-9.

What We Have So Far (Posts 1-9 Recap)

From previous posts, we have:

From Post 4: Graph traversal:

func Expand(resource, relation) {
    // What if graph has cycles? What if graph is too deep?
    // Current: infinite loop or stack overflow ❌
}

What's missing: No budget limits, no cycle detection

Approach 1: No Budget Limits (DoS Vulnerable)

// Attempt: Evaluate without limits

func Expand(resource, relation) {
    // No depth limit, no node limit, no cycle detection
    for child := range GetChildren(resource, relation) {
        Expand(child, relation)  // ← Unbounded recursion!
    }
}

Problems:

  • DoS attacks: Malicious graphs exhaust resources

  • Stack overflow: Deep graphs crash the service

  • Infinite loops: Cycles cause hangs

  • Resource exhaustion: Memory and CPU consumed

Example failure:

// Attacker creates deep hierarchy:
// folder:1 → folder:2 → folder:3 → ... → folder:10000

Expand(folder:1#viewer, user:alice)
// → Recursively expands 10,000 levels
// → Stack overflow or timeout
// → Service crashes! ❌

Approach 2: Fixed Limits (Inflexible)

// Attempt: Hard-coded limits

const MAX_DEPTH = 10  // ← Too restrictive!

func Expand(resource, relation, depth) {
    if depth > MAX_DEPTH {
        panic("depth limit exceeded")  // ← Crashes!
    }
    
    for child := range GetChildren(resource, relation) {
        Expand(child, relation, depth+1)
    }
}

Problems:

  • Too restrictive: Legitimate hierarchies exceed limit

  • Crashes: Panic instead of graceful degradation

  • Inflexible: Can't tune per namespace

  • No cycle detection: Still vulnerable to cycles

Example failure:

// Legitimate 15-level org hierarchy
Expand(folder:1#viewer, user:alice)
// → Depth exceeds MAX_DEPTH (10) → panic!
// → Service crashes! ❌

Conclusion: We need configurable budget limits with graceful degradation and cycle detection.

The Solution: Budget Limits and Cycle Detection

We define comprehensive rules for bounding evaluation cost.

Part 1: Budget Limits (DoS Protection)

Rule: Evaluation must respect budget limits for depth, nodes, and tuples.

type EvaluationState struct {
    Depth        int32  // Current recursion depth
    NodesVisited int32  // Total nodes visited
    TuplesRead   int32  // Total tuples read

    MaxDepth  int32  // Maximum allowed depth
    MaxNodes  int32  // Maximum allowed nodes
    MaxTuples int32  // Maximum allowed tuples

    HitLimit bool  // Whether any limit was hit
}

func (s *EvaluationState) CheckLimits() bool {
    if s.MaxDepth > 0 && s.Depth > s.MaxDepth {
        s.HitLimit = true
        return true
    }
    if s.MaxNodes > 0 && s.NodesVisited > s.MaxNodes {
        s.HitLimit = true
        return true
    }
    if s.MaxTuples > 0 && s.TuplesRead > s.MaxTuples {
        s.HitLimit = true
        return true
    }
    return false
}
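To make the budget check concrete, here is a self-contained, runnable sketch of the struct and method above, trimmed to the budget fields; the driver loop in main is purely illustrative:

```go
package main

import "fmt"

// EvaluationState mirrors the struct above, trimmed to the budget fields.
type EvaluationState struct {
	Depth, NodesVisited, TuplesRead int32
	MaxDepth, MaxNodes, MaxTuples   int32
	HitLimit                        bool
}

// CheckLimits reports whether any configured budget has been exceeded.
// A limit of 0 means "unbounded" for that dimension.
func (s *EvaluationState) CheckLimits() bool {
	if s.MaxDepth > 0 && s.Depth > s.MaxDepth {
		s.HitLimit = true
		return true
	}
	if s.MaxNodes > 0 && s.NodesVisited > s.MaxNodes {
		s.HitLimit = true
		return true
	}
	if s.MaxTuples > 0 && s.TuplesRead > s.MaxTuples {
		s.HitLimit = true
		return true
	}
	return false
}

func main() {
	// Tiny budget so the limit trips quickly.
	s := &EvaluationState{MaxDepth: 3, MaxNodes: 100, MaxTuples: 100}
	for i := 0; i < 5; i++ {
		s.Depth++
		s.NodesVisited++
		if s.CheckLimits() {
			fmt.Printf("budget exceeded at depth %d\n", s.Depth) // prints: budget exceeded at depth 4
			break
		}
	}
}
```

Note that the check uses strict inequality, so a limit of 50 permits depth 50 and trips at depth 51.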

Default limits:

const (
    DefaultMaxDepth  = 50   // Maximum recursion depth
    DefaultMaxNodes  = 1000 // Maximum nodes visited
    DefaultMaxTuples = 5000 // Maximum tuples read
)

Evaluation with budget:

func Check(resource, subject, context) bool {
    state := &EvaluationState{
        MaxDepth:  DefaultMaxDepth,
        MaxNodes:  DefaultMaxNodes,
        MaxTuples: DefaultMaxTuples,
    }

    result := CheckWithState(resource, subject, context, state)

    if state.HitLimit {
        // Budget exceeded → deny access (fail-safe)
        return FALSE
    }

    return result
}

Rationale:

  • DoS protection: Prevents resource exhaustion

  • Bounded cost: Evaluation has finite cost

  • Fail-safe: Budget exceeded → deny access

  • Configurable: Limits can be tuned per deployment

Part 2: Cycle Detection → FALSE (Safety)

Rule: If a cycle is detected during graph traversal, treat it as FALSE.

type VisitKey struct {
    Namespace  string
    ObjectID   string
    Relation   string
    SubjectSig string
}

type EvaluationState struct {
    Visited map[VisitKey]bool
    // ... other fields
}

func CheckWithState(resource, subject, context, state) bool {
    // Create visit key
    key := VisitKey{
        Namespace:  resource.Namespace,
        ObjectID:   resource.ObjectID,
        Relation:   resource.Relation,
        SubjectSig: subject.Signature(),
    }

    // Check for cycle
    if state.Visited[key] {
        // Cycle detected → deny access (fail-safe)
        return FALSE
    }

    // Mark as visited
    state.Visited[key] = true
    defer delete(state.Visited, key)  // Unmark after evaluation

    // ... continue evaluation
}
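The visited-set logic can be exercised standalone against a toy tuple graph with a deliberate cycle. Everything here is illustrative: the `edges` map stands in for tuple storage, and `VisitKey` is simplified to omit `SubjectSig` since only one subject is checked:

```go
package main

import "fmt"

// VisitKey identifies an (object, relation) node; SubjectSig is elided
// in this sketch because we evaluate a single subject.
type VisitKey struct{ Object, Relation string }

// edges is a toy tuple graph with a deliberate cycle:
// document:1#viewer → folder:1#viewer → document:1#viewer
var edges = map[VisitKey][]VisitKey{
	{"document:1", "viewer"}: {{"folder:1", "viewer"}},
	{"folder:1", "viewer"}:   {{"document:1", "viewer"}},
}

// check denies (returns false) when it revisits a node on the current path,
// guaranteeing termination even on cyclic graphs.
func check(node VisitKey, visited map[VisitKey]bool) bool {
	if visited[node] {
		return false // cycle detected → fail-safe deny
	}
	visited[node] = true
	defer delete(visited, node) // unmark when backtracking
	for _, child := range edges[node] {
		if check(child, visited) {
			return true
		}
	}
	return false // no path granted access
}

func main() {
	ok := check(VisitKey{"document:1", "viewer"}, map[VisitKey]bool{})
	fmt.Println(ok) // prints: false — the cycle is detected and evaluation terminates
}
```

The `defer delete` mirrors the unmarking above: a node is only "visited" while it is on the current path, so separate branches may legitimately reach the same node.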

Rationale:

  • Safety: Prevents infinite loops

  • Fail-safe: Cycle → deny access

  • Detectable: Can log cycles for debugging

  • Schema validation: Cycles should be caught at compile time

Example:

// Misconfigured schema creates cycle:
// document:1#viewer → folder:1#viewer → document:1#viewer (cycle!)

Check(document:1#viewer, user:alice, ctx)
// → Expand document:1#viewer
// → Expand folder:1#viewer
// → Expand document:1#viewer (already visited!)
// → Cycle detected → return FALSE
// → Access denied ✅

Part 3: Configurable Limits per Namespace

Rule: Different namespaces can have different budget limits.

type NamespaceDefinition struct {
    Name          string
    Relations     map[string]*NamespaceRelationDefinition
    BudgetLimits  *BudgetLimits  // ← Override default limits
}

type BudgetLimits struct {
    MaxDepth  int32  // Override default MaxDepth
    MaxNodes  int32  // Override default MaxNodes
    MaxTuples int32  // Override default MaxTuples
}

func Check(resource, subject, context) bool {
    // Get namespace-specific limits
    namespace := schema.GetNamespace(resource.Namespace)
    limits := namespace.BudgetLimits
    if limits == nil {
        // Use defaults
        limits = &BudgetLimits{
            MaxDepth:  DefaultMaxDepth,
            MaxNodes:  DefaultMaxNodes,
            MaxTuples: DefaultMaxTuples,
        }
    }

    state := &EvaluationState{
        MaxDepth:  limits.MaxDepth,
        MaxNodes:  limits.MaxNodes,
        MaxTuples: limits.MaxTuples,
    }

    return CheckWithState(resource, subject, context, state)
}

Example configuration:

// Folder namespace with deep hierarchies
folderNS := &NamespaceDefinition{
    Name: "folder",
    BudgetLimits: &BudgetLimits{
        MaxDepth:  100,  // Allow deeper hierarchies
        MaxNodes:  2000, // Allow more nodes
        MaxTuples: 10000, // Allow more tuples
    },
}

// Document namespace with default limits
documentNS := &NamespaceDefinition{
    Name: "document",
    BudgetLimits: nil,  // Use defaults (50, 1000, 5000)
}

Rationale:

  • Flexible: Different namespaces have different needs

  • Safe: Still bounded (no unlimited evaluation)

  • Configurable: Can tune per deployment

  • Fail-safe: If limits not set, use safe defaults

Real-World Example: DoS Attack Scenario

Let's walk through a realistic DoS attack scenario and how budget limits protect the system.

DoS Attack Attempt

// Attacker creates deep hierarchy
for i := 0; i < 10000; i++ {
    WriteTuple(&RelationTuple{
        Resource: &ObjectAndRelation{Namespace: "folder", ObjectID: fmt.Sprintf("%d", i), Relation: "viewer"},
        Subject:  &SubjectSet{Object: &ObjectRef{Namespace: "folder", ObjectID: fmt.Sprintf("%d", i+1)}, Relation: "viewer"},
    })
}

// Attacker queries top-level folder
Check(folder:0#viewer, user:attacker, ctx)
// → Start expanding folder:0#viewer
// → Depth: 1, Nodes: 1, Tuples: 1
// → Expand folder:1#viewer
// → Depth: 2, Nodes: 2, Tuples: 2
// → ... continue expanding
// → Depth: 50, Nodes: 50, Tuples: 50
// → Depth: 51 exceeds MaxDepth (50) → depth limit hit!
// → Return FALSE (fail-safe)
// → Access denied ✅ (DoS attack prevented)

What happened:

  1. Attacker created 10,000-level deep hierarchy

  2. Evaluation started expanding the graph

  3. When depth exceeded MaxDepth (50), the budget limit was hit

  4. Evaluation stopped immediately (fail-safe)

  5. Access denied (no resource exhaustion)

Monitoring:

// Log budget exceeded event
log.Warn("Budget exceeded: depth=%d, nodes=%d, tuples=%d",
    state.Depth, state.NodesVisited, state.TuplesRead)

// Increment metrics
metrics.BudgetExceededCount.Inc()
metrics.BudgetExceededDepth.Observe(state.Depth)

Edge Cases and Safety

Edge Case 1: Budget Exceeded Mid-Evaluation

Problem: What if budget is exceeded while evaluating multiple paths?

Solution: Return FALSE for the entire check (fail-safe).

func Check(resource, subject, context) bool {
    state := &EvaluationState{MaxDepth: 50, MaxNodes: 1000, MaxTuples: 5000}

    results := []EvaluationResult{}
    for path := range FindPaths(resource, subject) {
        result := EvaluatePath(path, context, state)

        if state.HitLimit {
            // Budget exceeded → deny access (fail-safe)
            return FALSE
        }

        results = append(results, result)
    }

    return SelectBestResult(results)
}

Rationale:

  • Fail-safe: Budget exceeded → deny access

  • Consistent: Don't return partial results

  • Detectable: Can log budget exceeded events

Edge Case 2: Cycle in Schema (Not Data)

Problem: What if schema itself has cycles (e.g., viewer = computed(editor), editor = computed(viewer))?

Solution: This is a schema validation error; catch it at compile time.

func ValidateSchema(schema *NamespaceDefinition) error {
    visited := make(map[string]bool)
    stack := make(map[string]bool)

    for relationName := range schema.Relations {
        if err := detectCycle(relationName, visited, stack, schema); err != nil {
            return err
        }
    }

    return nil
}

func detectCycle(relation string, visited, stack map[string]bool, schema *NamespaceDefinition) error {
    if stack[relation] {
        return fmt.Errorf("cycle detected: %s", relation)
    }

    if visited[relation] {
        return nil
    }

    visited[relation] = true
    stack[relation] = true

    // Check all referenced relations
    for _, ref := range getReferencedRelations(relation, schema) {
        if err := detectCycle(ref, visited, stack, schema); err != nil {
            return err
        }
    }

    stack[relation] = false
    return nil
}
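The same DFS can be run standalone against a toy relation graph. The `refs` map below is an assumption of this sketch, standing in for `getReferencedRelations`, and encodes the viewer ↔ editor misconfiguration described above:

```go
package main

import "fmt"

// refs maps each relation to the relations it references; viewer and
// editor deliberately form the cycle described above.
var refs = map[string][]string{
	"viewer": {"editor"},
	"editor": {"viewer"},
	"owner":  {},
}

// detectCycle runs a DFS with a recursion stack: seeing a relation that is
// already on the stack means the schema is cyclic.
func detectCycle(rel string, visited, stack map[string]bool) error {
	if stack[rel] {
		return fmt.Errorf("cycle detected: %s", rel)
	}
	if visited[rel] {
		return nil // already proven acyclic
	}
	visited[rel] = true
	stack[rel] = true
	for _, r := range refs[rel] {
		if err := detectCycle(r, visited, stack); err != nil {
			return err
		}
	}
	stack[rel] = false
	return nil
}

func main() {
	visited, stack := map[string]bool{}, map[string]bool{}
	for rel := range refs {
		if err := detectCycle(rel, visited, stack); err != nil {
			fmt.Println(err) // e.g. "cycle detected: viewer"
			return
		}
	}
	fmt.Println("schema is acyclic")
}
```

The `visited` set makes the pass linear in schema size: a relation proven acyclic in one DFS is never re-explored from another starting point.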

Rationale:

  • Fail-fast: Catch cycles at schema compilation time

  • Clear error: Tell user exactly where cycle is

  • Prevent deployment: Don't deploy schemas with cycles

Edge Case 3: Budget Limits Too Low

Problem: What if legitimate hierarchy exceeds default budget limits?

Example:

// Large organization with 60-level hierarchy
// folder:1 → folder:2 → ... → folder:60

// Default MaxDepth = 50
Check(folder:1#viewer, user:alice, ctx)
// → Depth limit hit beyond level 50
// → Return FALSE (fail-safe)
// → Legitimate access denied! ❌

Solution: Use namespace-specific limits (see Part 3 above).

// Configure folder namespace with higher limits
folderNS := &NamespaceDefinition{
    Name: "folder",
    BudgetLimits: &BudgetLimits{
        MaxDepth:  100,  // Allow deeper hierarchies
        MaxNodes:  2000,
        MaxTuples: 10000,
    },
}

Check(folder:1#viewer, user:alice, ctx)
// → Use namespace-specific limits (MaxDepth = 100)
// → Evaluation succeeds at depth 60
// → Access granted ✅

Rationale:

  • Flexible: Different namespaces have different needs

  • Safe: Still bounded (no unlimited evaluation)

  • Tunable: Can adjust based on monitoring data

Edge Case 4: Duplicate Tuples with Same Caveat

Problem: What if truly identical tuples exist (same resource, subject, relation, caveat)?

Solution: Deduplicate before evaluation so that identical tuples are evaluated once.

func FindValidTuples(resource, subject, schema) []*RelationTuple {
    allTuples := storage.FindTuples(resource, subject)

    // Deduplicate by signature
    seen := make(map[string]bool)
    uniqueTuples := []*RelationTuple{}
    duplicateCount := 0

    for _, tuple := range allTuples {
        sig := tuple.Signature()  // Includes resource, subject, caveat

        if seen[sig] {
            // Duplicate detected
            duplicateCount++
            log.Warn("Duplicate tuple detected: %s", sig)
            continue
        }

        seen[sig] = true

        if IsValidTuple(tuple, schema) {
            uniqueTuples = append(uniqueTuples, tuple)
        }
    }

    if duplicateCount > 0 {
        metrics.DuplicateTupleCount.Add(duplicateCount)
    }

    return uniqueTuples
}
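A minimal, runnable version of the signature-based deduplication: the `tuple` type and its `Signature` format here are simplified stand-ins for `RelationTuple.Signature()`:

```go
package main

import "fmt"

// tuple is a minimal stand-in for RelationTuple; Signature covers the
// fields that define identity (resource, relation, subject, caveat).
type tuple struct{ resource, relation, subject, caveat string }

func (t tuple) Signature() string {
	return t.resource + "#" + t.relation + "@" + t.subject + "?" + t.caveat
}

// dedupe keeps the first occurrence of each signature, so an exact
// duplicate is evaluated only once.
func dedupe(in []tuple) []tuple {
	seen := make(map[string]bool)
	out := []tuple{}
	for _, t := range in {
		if seen[t.Signature()] {
			continue // duplicate: skip (a real system would also log it)
		}
		seen[t.Signature()] = true
		out = append(out, t)
	}
	return out
}

func main() {
	tuples := []tuple{
		{"document:1", "viewer", "user:alice", "ip_allowlist"},
		{"document:1", "viewer", "user:alice", "ip_allowlist"}, // exact duplicate → dropped
		{"document:1", "viewer", "user:alice", ""},             // different caveat → kept
	}
	fmt.Println(len(dedupe(tuples))) // prints: 2
}
```

Because the signature includes the caveat, tuples that differ only in their caveat are not collapsed; only byte-identical tuples are.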

Rationale:

  • Efficient: Don't evaluate same tuple multiple times

  • Correct: Deduplication doesn't change semantics

  • Detectable: Log duplicates for data quality monitoring

Design Decisions

Decision 1: Budget Limits vs Unlimited Evaluation

Question: Should evaluation have budget limits or be unlimited?

Answer: Budget limits. DoS protection is essential.

Rationale:

Budget limits (chosen):

MaxDepth = 50, MaxNodes = 1000, MaxTuples = 5000

Benefits:

  • DoS protection: Prevents resource exhaustion

  • Bounded cost: Evaluation has finite cost

  • Predictable: Can estimate worst-case cost

  • Configurable: Limits can be tuned per deployment

Unlimited (rejected):

No limits on depth, nodes, or tuples

Problems:

  • DoS vulnerable: Malicious graphs exhaust resources

  • Unpredictable cost: Can't estimate evaluation cost

  • Stack overflow: Deep graphs crash the service

  • Production risk: Can't safely deploy

Default limits rationale:

  • MaxDepth = 50: Typical org hierarchies are < 10 levels; 50 is generous

  • MaxNodes = 1000: Typical checks visit < 100 nodes; 1000 is generous

  • MaxTuples = 5000: Typical checks read < 500 tuples; 5000 is generous

Decision 2: Cycle Detection vs Schema Validation Only

Question: Should we detect cycles at runtime or only at schema validation time?

Answer: Both. Schema validation catches most cycles; runtime detection is the fail-safe.

Rationale:

Both (chosen):

Schema validation: Catch cycles in schema definitions
Runtime detection: Catch cycles in data (fail-safe)

Benefits:

  • Fail-fast: Schema validation prevents deployment

  • Fail-safe: Runtime detection handles data cycles

  • Defense in depth: Multiple layers of protection

  • Detectable: Can log runtime cycles for debugging

Schema validation only (rejected):

Only validate schema, assume data is correct

Problems:

  • Data cycles: Can't catch cycles created by tuples

  • No fail-safe: Runtime cycles cause infinite loops

  • Fragile: Assumes perfect data quality

Runtime detection only (rejected):

Only detect cycles at runtime, allow cyclic schemas

Problems:

  • Late detection: Cycles discovered during evaluation

  • Performance: Runtime detection has overhead

  • Poor UX: Should catch schema errors early

Decision 3: Global vs Namespace-Specific Limits

Question: Should budget limits be global or configurable per namespace?

Answer: Namespace-specific with global defaults, combining flexibility with safety.

Rationale:

Namespace-specific (chosen):

Global defaults: MaxDepth=50, MaxNodes=1000, MaxTuples=5000
Namespace overrides: folder.MaxDepth=100

Benefits:

  • Flexible: Different namespaces have different needs

  • Safe: Still bounded (no unlimited evaluation)

  • Tunable: Can adjust based on monitoring

  • Fail-safe: Defaults ensure safety

Global only (rejected):

All namespaces use same limits

Problems:

  • Inflexible: Can't handle deep hierarchies

  • One-size-fits-all: Doesn't match real-world needs

  • Breaks legitimate use cases: Deep org structures fail

Model Extension

To support budget limits and cycle detection, we extend our model with:

1. EvaluationState with Budget Tracking

type EvaluationState struct {
    // Current state
    Depth        int32
    NodesVisited int32
    TuplesRead   int32

    // Budget limits
    MaxDepth  int32
    MaxNodes  int32
    MaxTuples int32

    // Cycle detection
    Visited map[VisitKey]bool

    // Status
    HitLimit bool
}

func (s *EvaluationState) EnterNode(key VisitKey) bool {
    // Check for cycle
    if s.Visited[key] {
        return false  // Cycle detected
    }

    // Update counters
    s.Depth++
    s.NodesVisited++

    // Check limits
    if s.CheckLimits() {
        return false  // Budget exceeded
    }

    // Mark as visited
    s.Visited[key] = true
    return true
}

func (s *EvaluationState) ExitNode(key VisitKey) {
    s.Depth--
    delete(s.Visited, key)  // Unmark for backtracking
}
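A runnable sketch of the enter/exit protocol, trimmed to depth and cycle tracking (node and tuple limits elided for brevity):

```go
package main

import "fmt"

type VisitKey struct{ Object, Relation string }

type EvaluationState struct {
	Depth, NodesVisited int32
	MaxDepth            int32
	Visited             map[VisitKey]bool
	HitLimit            bool
}

// EnterNode returns false on a cycle or an exceeded budget; counters are
// only advanced for genuinely new nodes.
func (s *EvaluationState) EnterNode(key VisitKey) bool {
	if s.Visited[key] {
		return false // cycle detected
	}
	s.Depth++
	s.NodesVisited++
	if s.MaxDepth > 0 && s.Depth > s.MaxDepth {
		s.HitLimit = true
		return false // budget exceeded
	}
	s.Visited[key] = true
	return true
}

// ExitNode restores state so sibling paths may revisit this node.
func (s *EvaluationState) ExitNode(key VisitKey) {
	s.Depth--
	delete(s.Visited, key)
}

func main() {
	s := &EvaluationState{MaxDepth: 50, Visited: map[VisitKey]bool{}}
	a := VisitKey{"folder:1", "viewer"}
	s.EnterNode(a)
	fmt.Println(s.EnterNode(a)) // prints: false — re-entering a node on the path is a cycle
	s.ExitNode(a)
	fmt.Println(s.Depth, s.NodesVisited) // prints: 0 1
}
```

The rejected re-entry does not increment the counters, so a denied cycle costs nothing against the node budget.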

2. Namespace-Specific Budget Configuration

type NamespaceDefinition struct {
    Name          string
    Relations     map[string]*NamespaceRelationDefinition
    BudgetLimits  *BudgetLimits  // Optional overrides
}

type BudgetLimits struct {
    MaxDepth  int32
    MaxNodes  int32
    MaxTuples int32
}

func GetBudgetLimits(namespace string, schema *CompiledSchema) *BudgetLimits {
    ns := schema.GetNamespace(namespace)
    if ns != nil && ns.BudgetLimits != nil {
        return ns.BudgetLimits
    }

    // Return defaults
    return &BudgetLimits{
        MaxDepth:  DefaultMaxDepth,
        MaxNodes:  DefaultMaxNodes,
        MaxTuples: DefaultMaxTuples,
    }
}

3. Schema Cycle Validation

type SchemaValidator interface {
    ValidateNoCycles(schema *NamespaceDefinition) error
}

func ValidateSchema(schema *NamespaceDefinition) error {
    // Validate no cycles in schema
    if err := ValidateNoCycles(schema); err != nil {
        return fmt.Errorf("schema validation failed: %w", err)
    }

    // ... other validations
    return nil
}

Takeaways

  1. Budget limits prevent DoS — Depth, node, and tuple limits bound evaluation cost, protecting against malicious graphs and resource exhaustion.

  2. Cycle detection ensures termination — Both schema validation (fail-fast) and runtime detection (fail-safe) prevent infinite loops.

  3. Configurable limits enable flexibility — Namespace-specific limits allow tuning for different use cases while maintaining safe defaults.

Why it matters: In production systems, DoS attacks and misconfigured schemas are real threats. Budget limits ensure that evaluation has bounded cost, preventing resource exhaustion even when facing malicious actors or deep hierarchies. Combined with cycle detection and monitoring, these mechanisms transform the authorization system into a production-ready service that can safely handle adversarial inputs while maintaining availability guarantees.

We now have fail-safe defaults and bounded evaluation, but there's one more critical requirement: consistency across distributed storage.

In distributed systems, data is replicated across multiple nodes. When we evaluate a Check request, we need to ensure:

  • All tuple reads come from the same snapshot

  • EXCLUSION operations see consistent state

  • Historical evaluations can be reproduced

Post 11 tackles the consistency problem: revision-based snapshots, consistency levels, and snapshot isolation that ensure correct evaluation across distributed storage.

Preview:

// Consistency levels:
// MINIMIZE_LATENCY: Read from any replica (fastest)
// AT_LEAST_AS_FRESH: Read from replica with revision >= R
// AT_EXACT_SNAPSHOT: Read from exact revision R (time travel)
// FULLY_CONSISTENT: Read from leader (strongest)
