The Problem Statement
In Post 10A, we established fail-safe defaults for unknown caveats and duplicate tuples. But production systems face another critical threat: denial-of-service (DoS) attacks through unbounded evaluation.
Without budget limits and cycle detection, we risk:
Denial of Service: Deep graphs consume unbounded resources
Stack overflow: Deep recursion crashes the service
Memory exhaustion: Visiting too many nodes consumes all memory
Database overload: Reading too many tuples overwhelms storage
Infinite loops: Cycles cause hangs
Looking at how production authorization systems handle resource limits reveals critical requirements:
Google Zanzibar: Budget limits prevent DoS attacks on deep hierarchies
Auth0 FGA: Cycle detection prevents infinite loops
Ory Keto: Evaluation depth limits protect against resource exhaustion
SpiceDB: Configurable limits per namespace
Common patterns that require budget limits:
DoS attacks: Malicious graphs designed to exhaust resources
Cycles: Misconfigured schemas create infinite loops
Deep hierarchies: Legitimate but deep org structures
Fan-out explosions: EDGE operations with thousands of targets
The requirement we can't yet meet:
"When evaluation encounters deep graphs, cycles, or resource limits, it must fail safely without compromising availability."
Our current model has gaps:
❌ Deep graphs → unbounded evaluation
❌ Cycles → infinite loops
❌ No resource limits → DoS vulnerable
The core problem: We need budget limits that bound evaluation cost and cycle detection that prevents infinite loops.
Human Rule
"Evaluation must have bounded cost."
Depth, nodes, and tuples must all be bounded to prevent resource exhaustion, ensuring availability even when facing malicious graphs or misconfigured schemas.
Why Current Models Don't Work
Let's try to handle resource limits using our existing tools from Posts 1-9.
What We Have So Far (Posts 1-9 Recap)
From previous posts, we have:
From Post 4: Graph traversal:
func Expand(resource, relation) {
// What if graph has cycles? What if graph is too deep?
// Current: infinite loop or stack overflow ❌
}
What's missing: No budget limits, no cycle detection.
Approach 1: No Budget Limits (DoS Vulnerable)
// Attempt: Evaluate without limits
func Expand(resource, relation) {
// No depth limit, no node limit, no cycle detection
for child := range GetChildren(resource, relation) {
Expand(child, relation) // ← Unbounded recursion!
}
}
Problems:
❌ DoS attacks: Malicious graphs exhaust resources
❌ Stack overflow: Deep graphs crash the service
❌ Infinite loops: Cycles cause hangs
❌ Resource exhaustion: Memory and CPU consumed
Example failure:
// Attacker creates deep hierarchy:
// folder:1 → folder:2 → folder:3 → ... → folder:10000
Expand(folder:1#viewer)
// → Recursively expands 10,000 levels
// → Stack overflow or timeout
// → Service crashes! ❌
Approach 2: Fixed Limits (Inflexible)
// Attempt: Hard-coded limits
const MAX_DEPTH = 10 // ← Too restrictive!
func Expand(resource, relation, depth) {
if depth > MAX_DEPTH {
panic("depth limit exceeded") // ← Crashes!
}
for child := range GetChildren(resource, relation) {
Expand(child, relation, depth+1)
}
}
Problems:
❌ Too restrictive: Legitimate hierarchies exceed limit
❌ Crashes: Panic instead of graceful degradation
❌ Inflexible: Can't tune per namespace
❌ No cycle detection: Still vulnerable to cycles
Example failure:
// Legitimate 15-level org hierarchy
Expand(folder:1#viewer)
// → Depth 11 exceeds MAX_DEPTH (10) → panic!
// → Service crashes! ❌
Conclusion: We need configurable budget limits with graceful degradation and cycle detection.
The Solution: Budget Limits and Cycle Detection
We define comprehensive rules for bounding evaluation cost.
Part 1: Budget Limits (DoS Protection)
Rule: Evaluation must respect budget limits for depth, nodes, and tuples.
type EvaluationState struct {
Depth int32 // Current recursion depth
NodesVisited int32 // Total nodes visited
TuplesRead int32 // Total tuples read
MaxDepth int32 // Maximum allowed depth
MaxNodes int32 // Maximum allowed nodes
MaxTuples int32 // Maximum allowed tuples
HitLimit bool // Whether any limit was hit
}
func (s *EvaluationState) CheckLimits() bool {
if s.MaxDepth > 0 && s.Depth > s.MaxDepth {
s.HitLimit = true
return true
}
if s.MaxNodes > 0 && s.NodesVisited > s.MaxNodes {
s.HitLimit = true
return true
}
if s.MaxTuples > 0 && s.TuplesRead > s.MaxTuples {
s.HitLimit = true
return true
}
return false
}
Default limits:
const (
DefaultMaxDepth = 50 // Maximum recursion depth
DefaultMaxNodes = 1000 // Maximum nodes visited
DefaultMaxTuples = 5000 // Maximum tuples read
)
Evaluation with budget:
func Check(resource, subject, context) bool {
state := &EvaluationState{
MaxDepth: DefaultMaxDepth,
MaxNodes: DefaultMaxNodes,
MaxTuples: DefaultMaxTuples,
}
result := CheckWithState(resource, subject, context, state)
if state.HitLimit {
// Budget exceeded → deny access (fail-safe)
return false
}
return result
}
Rationale:
✅ DoS protection: Prevents resource exhaustion
✅ Bounded cost: Evaluation has finite cost
✅ Fail-safe: Budget exceeded → deny access
✅ Configurable: Limits can be tuned per deployment
Part 2: Cycle Detection → FALSE (Safety)
Rule: If a cycle is detected during graph traversal, treat it as FALSE.
type VisitKey struct {
Namespace string
ObjectID string
Relation string
SubjectSig string
}
type EvaluationState struct {
Visited map[VisitKey]bool
// ... other fields
}
func CheckWithState(resource, subject, context, state) bool {
// Create visit key
key := VisitKey{
Namespace: resource.Namespace,
ObjectID: resource.ObjectID,
Relation: resource.Relation,
SubjectSig: subject.Signature(),
}
// Check for cycle
if state.Visited[key] {
// Cycle detected → deny access (fail-safe)
return false
}
// Mark as visited
state.Visited[key] = true
defer delete(state.Visited, key) // Unmark after evaluation
// ... continue evaluation
}
Rationale:
✅ Safety: Prevents infinite loops
✅ Fail-safe: Cycle → deny access
✅ Detectable: Can log cycles for debugging
✅ Schema validation: Cycles should be caught at compile time
Example:
// Misconfigured schema creates cycle:
// document:1#viewer → folder:1#viewer → document:1#viewer (cycle!)
Check(document:1#viewer, user:alice, ctx)
// → Expand document:1#viewer
// → Expand folder:1#viewer
// → Expand document:1#viewer (already visited!)
// → Cycle detected → return FALSE
// → Access denied ✅
Part 3: Configurable Limits per Namespace
Rule: Different namespaces can have different budget limits.
type NamespaceDefinition struct {
Name string
Relations map[string]*NamespaceRelationDefinition
BudgetLimits *BudgetLimits // ← Override default limits
}
type BudgetLimits struct {
MaxDepth int32 // Override default MaxDepth
MaxNodes int32 // Override default MaxNodes
MaxTuples int32 // Override default MaxTuples
}
func Check(resource, subject, context) bool {
// Get namespace-specific limits
namespace := schema.GetNamespace(resource.Namespace)
limits := namespace.BudgetLimits
if limits == nil {
// Use defaults
limits = &BudgetLimits{
MaxDepth: DefaultMaxDepth,
MaxNodes: DefaultMaxNodes,
MaxTuples: DefaultMaxTuples,
}
}
state := &EvaluationState{
MaxDepth: limits.MaxDepth,
MaxNodes: limits.MaxNodes,
MaxTuples: limits.MaxTuples,
}
return CheckWithState(resource, subject, context, state)
}
Example configuration:
// Folder namespace with deep hierarchies
folderNS := &NamespaceDefinition{
Name: "folder",
BudgetLimits: &BudgetLimits{
MaxDepth: 100, // Allow deeper hierarchies
MaxNodes: 2000, // Allow more nodes
MaxTuples: 10000, // Allow more tuples
},
}
// Document namespace with default limits
documentNS := &NamespaceDefinition{
Name: "document",
BudgetLimits: nil, // Use defaults (50, 1000, 5000)
}
Rationale:
✅ Flexible: Different namespaces have different needs
✅ Safe: Still bounded (no unlimited evaluation)
✅ Configurable: Can tune per deployment
✅ Fail-safe: If limits not set, use safe defaults
Real-World Example: DoS Attack Scenario
Let's walk through a realistic DoS attack scenario and how budget limits protect the system.
DoS Attack Attempt
// Attacker creates deep hierarchy
for i := 0; i < 10000; i++ {
WriteTuple(&RelationTuple{
Resource: &ObjectAndRelation{Namespace: "folder", ObjectID: fmt.Sprintf("%d", i), Relation: "viewer"},
Subject: &SubjectSet{Object: &ObjectRef{Namespace: "folder", ObjectID: fmt.Sprintf("%d", i+1)}, Relation: "viewer"},
})
}
// Attacker queries top-level folder
Check(folder:0#viewer, user:attacker, ctx)
// → Start expanding folder:0#viewer
// → Depth: 1, Nodes: 1, Tuples: 1
// → Expand folder:1#viewer
// → Depth: 2, Nodes: 2, Tuples: 2
// → ... continue expanding
// → Depth: 50, Nodes: 50, Tuples: 50
// → Depth 51 exceeds MaxDepth (50) → limit hit!
// → Return FALSE (fail-safe)
// → Access denied ✅ (DoS attack prevented)
What happened:
Attacker created 10,000-level deep hierarchy
Evaluation started expanding the graph
At depth 50, budget limit was hit
Evaluation stopped immediately (fail-safe)
Access denied (no resource exhaustion)
Monitoring:
// Log budget exceeded event
log.Printf("budget exceeded: depth=%d, nodes=%d, tuples=%d",
state.Depth, state.NodesVisited, state.TuplesRead)
// Increment metrics
metrics.BudgetExceededCount.Inc()
metrics.BudgetExceededDepth.Observe(float64(state.Depth))
Edge Cases and Safety
Edge Case 1: Budget Exceeded Mid-Evaluation
Problem: What if budget is exceeded while evaluating multiple paths?
Solution: Return FALSE for the entire check (fail-safe).
func Check(resource, subject, context) bool {
state := &EvaluationState{MaxDepth: 50, MaxNodes: 1000, MaxTuples: 5000}
results := []EvaluationResult{}
for path := range FindPaths(resource, subject) {
result := EvaluatePath(path, context, state)
if state.HitLimit {
// Budget exceeded → deny access (fail-safe)
return false
}
results = append(results, result)
}
return SelectBestResult(results)
}
Rationale:
✅ Fail-safe: Budget exceeded → deny access
✅ Consistent: Don't return partial results
✅ Detectable: Can log budget exceeded events
Edge Case 2: Cycle in Schema (Not Data)
Problem: What if schema itself has cycles (e.g., viewer = computed(editor), editor = computed(viewer))?
Solution: This is a schema validation error — catch at compile time.
func ValidateSchema(schema *NamespaceDefinition) error {
visited := make(map[string]bool)
stack := make(map[string]bool)
for relationName := range schema.Relations {
if err := detectCycle(relationName, visited, stack, schema); err != nil {
return err
}
}
return nil
}
func detectCycle(relation string, visited, stack map[string]bool, schema *NamespaceDefinition) error {
if stack[relation] {
return fmt.Errorf("cycle detected: %s", relation)
}
if visited[relation] {
return nil
}
visited[relation] = true
stack[relation] = true
// Check all referenced relations
for _, ref := range getReferencedRelations(relation, schema) {
if err := detectCycle(ref, visited, stack, schema); err != nil {
return err
}
}
stack[relation] = false
return nil
}
Rationale:
✅ Fail-fast: Catch cycles at schema compilation time
✅ Clear error: Tell user exactly where cycle is
✅ Prevent deployment: Don't deploy schemas with cycles
Edge Case 3: Budget Limits Too Low
Problem: What if legitimate hierarchy exceeds default budget limits?
Example:
// Large organization with 60-level hierarchy
// folder:1 → folder:2 → ... → folder:60
// Default MaxDepth = 50
Check(folder:1#viewer, user:alice, ctx)
// → Depth limit hit at level 50
// → Return FALSE (fail-safe)
// → Legitimate access denied! ❌
Solution: Use namespace-specific limits (see Part 3 above).
// Configure folder namespace with higher limits
folderNS := &NamespaceDefinition{
Name: "folder",
BudgetLimits: &BudgetLimits{
MaxDepth: 100, // Allow deeper hierarchies
MaxNodes: 2000,
MaxTuples: 10000,
},
}
Check(folder:1#viewer, user:alice, ctx)
// → Use namespace-specific limits (MaxDepth = 100)
// → Evaluation succeeds at depth 60
// → Access granted ✅
Rationale:
✅ Flexible: Different namespaces have different needs
✅ Safe: Still bounded (no unlimited evaluation)
✅ Tunable: Can adjust based on monitoring data
Edge Case 4: Duplicate Tuples with Same Caveat
Problem: What if truly identical tuples exist (same resource, subject, relation, caveat)?
Solution: Deduplicate before evaluation — identical tuples are evaluated once.
func FindValidTuples(resource, subject, schema) []*RelationTuple {
allTuples := storage.FindTuples(resource, subject)
// Deduplicate by signature
seen := make(map[string]bool)
uniqueTuples := []*RelationTuple{}
duplicateCount := 0
for _, tuple := range allTuples {
sig := tuple.Signature() // Includes resource, subject, caveat
if seen[sig] {
// Duplicate detected
duplicateCount++
log.Printf("duplicate tuple detected: %s", sig)
continue
}
seen[sig] = true
if IsValidTuple(tuple, schema) {
uniqueTuples = append(uniqueTuples, tuple)
}
}
if duplicateCount > 0 {
metrics.DuplicateTupleCount.Add(duplicateCount)
}
return uniqueTuples
}
Rationale:
✅ Efficient: Don't evaluate same tuple multiple times
✅ Correct: Deduplication doesn't change semantics
✅ Detectable: Log duplicates for data quality monitoring
Design Decisions
Decision 1: Budget Limits vs Unlimited Evaluation
Question: Should evaluation have budget limits or be unlimited?
Answer: Budget limits — DoS protection is essential.
Rationale:
Budget limits (chosen):
MaxDepth = 50, MaxNodes = 1000, MaxTuples = 5000
Benefits:
✅ DoS protection: Prevents resource exhaustion
✅ Bounded cost: Evaluation has finite cost
✅ Predictable: Can estimate worst-case cost
✅ Configurable: Limits can be tuned per deployment
Unlimited (rejected):
No limits on depth, nodes, or tuples
Problems:
❌ DoS vulnerable: Malicious graphs exhaust resources
❌ Unpredictable cost: Can't estimate evaluation cost
❌ Stack overflow: Deep graphs crash the service
❌ Production risk: Can't safely deploy
Default limits rationale:
MaxDepth = 50: Typical org hierarchies are < 10 levels; 50 is generous
MaxNodes = 1000: Typical checks visit < 100 nodes; 1000 is generous
MaxTuples = 5000: Typical checks read < 500 tuples; 5000 is generous
Decision 2: Cycle Detection vs Schema Validation Only
Question: Should we detect cycles at runtime or only at schema validation time?
Answer: Both — schema validation catches most, runtime detection is fail-safe.
Rationale:
Both (chosen):
Schema validation: Catch cycles in schema definitions
Runtime detection: Catch cycles in data (fail-safe)
Benefits:
✅ Fail-fast: Schema validation prevents deployment
✅ Fail-safe: Runtime detection handles data cycles
✅ Defense in depth: Multiple layers of protection
✅ Detectable: Can log runtime cycles for debugging
Schema validation only (rejected):
Only validate schema, assume data is correct
Problems:
❌ Data cycles: Can't catch cycles created by tuples
❌ No fail-safe: Runtime cycles cause infinite loops
❌ Fragile: Assumes perfect data quality
Runtime detection only (rejected):
Only detect cycles at runtime, allow cyclic schemas
Problems:
❌ Late detection: Cycles discovered during evaluation
❌ Performance: Runtime detection has overhead
❌ Poor UX: Should catch schema errors early
Decision 3: Global vs Namespace-Specific Limits
Question: Should budget limits be global or configurable per namespace?
Answer: Namespace-specific with global defaults — flexibility with safety.
Rationale:
Namespace-specific (chosen):
Global defaults: MaxDepth=50, MaxNodes=1000, MaxTuples=5000
Namespace overrides: folder.MaxDepth=100
Benefits:
✅ Flexible: Different namespaces have different needs
✅ Safe: Still bounded (no unlimited evaluation)
✅ Tunable: Can adjust based on monitoring
✅ Fail-safe: Defaults ensure safety
Global only (rejected):
All namespaces use same limits
Problems:
❌ Inflexible: Can't handle deep hierarchies
❌ One-size-fits-all: Doesn't match real-world needs
❌ Breaks legitimate use cases: Deep org structures fail
Model Extension
To support budget limits and cycle detection, we extend our model with:
1. EvaluationState with Budget Tracking
type EvaluationState struct {
// Current state
Depth int32
NodesVisited int32
TuplesRead int32
// Budget limits
MaxDepth int32
MaxNodes int32
MaxTuples int32
// Cycle detection
Visited map[VisitKey]bool
// Status
HitLimit bool
}
func (s *EvaluationState) EnterNode(key VisitKey) bool {
// Check for cycle
if s.Visited[key] {
return false // Cycle detected
}
// Update counters
s.Depth++
s.NodesVisited++
// Check limits
if s.CheckLimits() {
return false // Budget exceeded
}
// Mark as visited
s.Visited[key] = true
return true
}
func (s *EvaluationState) ExitNode(key VisitKey) {
s.Depth--
delete(s.Visited, key) // Unmark for backtracking
}
2. Namespace-Specific Budget Configuration
type NamespaceDefinition struct {
Name string
Relations map[string]*NamespaceRelationDefinition
BudgetLimits *BudgetLimits // Optional overrides
}
type BudgetLimits struct {
MaxDepth int32
MaxNodes int32
MaxTuples int32
}
func GetBudgetLimits(namespace string, schema *CompiledSchema) *BudgetLimits {
ns := schema.GetNamespace(namespace)
if ns != nil && ns.BudgetLimits != nil {
return ns.BudgetLimits
}
// Return defaults
return &BudgetLimits{
MaxDepth: DefaultMaxDepth,
MaxNodes: DefaultMaxNodes,
MaxTuples: DefaultMaxTuples,
}
}
3. Schema Cycle Validation
type SchemaValidator interface {
ValidateNoCycles(schema *NamespaceDefinition) error
}
func ValidateSchema(schema *NamespaceDefinition) error {
// Validate no cycles in schema
if err := ValidateNoCycles(schema); err != nil {
return fmt.Errorf("schema validation failed: %w", err)
}
// ... other validations
return nil
}
Takeaways
Budget limits prevent DoS — Depth, node, and tuple limits bound evaluation cost, protecting against malicious graphs and resource exhaustion.
Cycle detection ensures termination — Both schema validation (fail-fast) and runtime detection (fail-safe) prevent infinite loops.
Configurable limits enable flexibility — Namespace-specific limits allow tuning for different use cases while maintaining safe defaults.
Why it matters: In production systems, DoS attacks and misconfigured schemas are real threats. Budget limits ensure that evaluation has bounded cost, preventing resource exhaustion even when facing malicious actors or deep hierarchies. Combined with cycle detection and monitoring, these mechanisms transform the authorization system into a production-ready service that can safely handle adversarial inputs while maintaining availability guarantees.
Next → Post 11: Consistency and Revisions
We now have fail-safe defaults and bounded evaluation, but there's one more critical requirement: consistency across distributed storage.
In distributed systems, data is replicated across multiple nodes. When we evaluate a Check request, we need to ensure:
All tuple reads come from the same snapshot
EXCLUSION operations see consistent state
Historical evaluations can be reproduced
Post 11 tackles the consistency problem: revision-based snapshots, consistency levels, and snapshot isolation that ensure correct evaluation across distributed storage.
Preview:
// Consistency levels:
// MINIMIZE_LATENCY: Read from any replica (fastest)
// AT_LEAST_AS_FRESH: Read from replica with revision >= R
// AT_EXACT_SNAPSHOT: Read from exact revision R (time travel)
// FULLY_CONSISTENT: Read from leader (strongest)