The Problem Statement

Our authorization model from Posts 1-9 is powerful, flexible, and deterministic. But production systems are messy: schemas evolve, data gets corrupted, and distributed systems experience partial failures. Without fail-safe defaults, we risk:

  • Security breaches: Unknown caveats accidentally grant access

  • Data corruption: Duplicate tuples create ambiguous state

  • Deployment failures: Schema changes break existing tuples

Real-world examples from production systems:

Looking at how production authorization systems handle failures reveals critical requirements:

  • AWS IAM: Unknown policy elements default to deny (fail-safe)

  • Google Zanzibar: Graceful handling of schema evolution

  • Auth0 FGA: Duplicate tuple semantics with UNION

Common patterns that require fail-safe defaults:

  1. Schema evolution: New schema deployed, old caveats still referenced

  2. Deployment skew: Different services have different schema versions

  3. Data corruption: Duplicate tuples from concurrent writes

What we can't handle today:

"When evaluation encounters unknown caveats or duplicate tuples, it must fail safely without compromising security."

Our current model has gaps:

  • Unknown caveat → undefined behavior

  • Duplicate tuples → ambiguous semantics

The core problem: We need fail-safe defaults that ensure security even when things go wrong.

Human Rule

"When things go wrong, deny access."

Unknown caveats and invalid tuples must never grant access: the system stays secure even when it encounters unexpected failures.

Why Current Models Don't Work

Let's try to handle failures using our existing tools from Posts 1-9.

What We Have So Far (Posts 1-9 Recap)

From previous posts, we have:

From Post 5: Caveat evaluation:

func EvaluateCaveat(caveat *CaveatDefinition, context *CaveatContext) CaveatEvalResult {
    // What if caveat is nil? What if caveat doesn't exist?
    // Current: undefined behavior ❌
}

From Post 1: Tuple storage:

// What if same tuple is written twice?
// Current: duplicate tuples in storage ❌

What's missing: No fail-safe defaults for unknown caveats or duplicates

Approach 1: Ignore Errors (Dangerous)

// Attempt: Ignore unknown caveats and continue

func EvaluateCaveat(caveatName string, context *CaveatContext) CaveatEvalResult {
    caveat := caveatRegistry.Get(caveatName)
    if caveat == nil {
        // ❌ Unknown caveat → ignore it, grant access
        return CAV_TRUE
    }
    return caveat.Evaluate(context)
}

Problems:

  • Security breach: Unknown caveat grants access!

  • Deployment risk: Can't safely deploy schema changes

  • Attacker exploit: Can reference non-existent caveats to bypass checks

Example failure:

// Attacker creates tuple with non-existent caveat
tuple := &RelationTuple{
    Subject: &DirectSubject{
        Caveat: &ContextualizedCaveat{CaveatName: "nonexistent"},
    },
}

// With ignore-errors approach:
Check(document:1#viewer, user:attacker, ctx)
// → Unknown caveat "nonexistent" → CAV_TRUE
// → Access granted! ❌ SECURITY BREACH!

Approach 2: Throw Errors (Brittle)

// Attempt: Throw error on unknown caveat

func EvaluateCaveat(caveatName string, context *CaveatContext) (CaveatEvalResult, error) {
    caveat := caveatRegistry.Get(caveatName)
    if caveat == nil {
        return CAV_FALSE, fmt.Errorf("unknown caveat: %s", caveatName)
    }
    return caveat.Evaluate(context), nil
}

Problems:

  • Deployment failures: Schema changes break existing checks

  • Cascading failures: One bad tuple breaks all checks

  • Poor user experience: Errors instead of graceful degradation

  • Availability impact: System becomes unavailable

Example failure:

// During schema deployment, old caveat is removed
// Existing tuples still reference it

Check(document:1#viewer, user:alice, ctx)
// → Error: "unknown caveat: old_caveat"
// → Check fails, user can't access anything
// → System unavailable! ❌

Conclusion: We need fail-safe defaults (unknown → deny) to ensure security and graceful degradation.

The Solution: Fail-Safe Defaults

We define comprehensive rules for handling failures.

Part 1: Unknown Caveat → FALSE (Fail-Safe)

Rule: If a caveat is referenced but not found in the registry, treat it as FALSE.

func EvaluateCaveat(caveatName string, context *CaveatContext) CaveatEvalResult {
    caveat := caveatRegistry.Get(caveatName)
    if caveat == nil {
        // Unknown caveat → deny access (fail-safe)
        return CAV_FALSE
    }
    return caveat.Evaluate(context)
}

Rationale:

  • Fail-safe: Unknown policy → deny access

  • Security: Can't bypass checks with non-existent caveats

  • Graceful degradation: System continues operating

  • Detectable: Monitoring can alert on unknown caveats

Example:

// Tuple references unknown caveat
tuple := &RelationTuple{
    Subject: &DirectSubject{
        Caveat: &ContextualizedCaveat{CaveatName: "unknown_caveat"},
    },
}

Check(document:1#viewer, user:alice, ctx)
// → Evaluate caveat "unknown_caveat"
// → Caveat not found in registry
// → Return CAV_FALSE (fail-safe)
// → Access denied ✅

Schema evolution scenario:

// Day 1: Deploy schema with caveat "old_caveat"
schema_v1 := &CaveatDefinition{Name: "old_caveat", ...}

// Day 2: Deploy new schema, remove "old_caveat"
schema_v2 := &CaveatDefinition{Name: "new_caveat", ...}

// Old tuples still reference "old_caveat"
Check(document:1#viewer, user:alice, ctx)
// → Caveat "old_caveat" not found
// → Return CAV_FALSE (fail-safe)
// → Access denied (safe during migration) ✅

// After data migration, tuples reference "new_caveat"
// → Access granted with new caveat ✅
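The schema evolution scenario above can be reproduced end to end with a small in-memory sketch. It is an illustration only, not the series' implementation: mapRegistry, caveatFunc, and the map-based context are hypothetical stand-ins for the real registry, compiled caveat, and CaveatContext types.

```go
package main

import "fmt"

// CaveatEvalResult mirrors the post's result type; only two values are needed here.
// CAV_FALSE is the zero value, so an uninitialized result already means "deny".
type CaveatEvalResult int

const (
	CAV_FALSE CaveatEvalResult = iota
	CAV_TRUE
)

// caveatFunc is a hypothetical stand-in for a compiled caveat expression.
type caveatFunc func(ctx map[string]int) bool

// mapRegistry is an illustrative in-memory registry keyed by caveat name.
type mapRegistry map[string]caveatFunc

// EvaluateCaveat denies when the name is not registered (fail-safe default).
func EvaluateCaveat(reg mapRegistry, name string, ctx map[string]int) CaveatEvalResult {
	fn, ok := reg[name]
	if !ok {
		return CAV_FALSE // unknown caveat → deny
	}
	if fn(ctx) {
		return CAV_TRUE
	}
	return CAV_FALSE
}

func main() {
	reg := mapRegistry{
		"old_caveat": func(ctx map[string]int) bool {
			return ctx["current_hour"] >= 9 && ctx["current_hour"] < 17
		},
	}
	ctx := map[string]int{"current_hour": 10}

	// Day 1: the caveat is registered and the context satisfies it.
	fmt.Println(EvaluateCaveat(reg, "old_caveat", ctx) == CAV_TRUE) // prints: true

	// Day 2: the new schema removes the caveat; old tuples still reference it.
	delete(reg, "old_caveat")
	fmt.Println(EvaluateCaveat(reg, "old_caveat", ctx) == CAV_TRUE) // prints: false
}
```

Making CAV_FALSE the zero value is a small design choice in this sketch: any code path that forgets to set a result denies by default.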

Part 2: Duplicate Tuples → UNION (Permissive)

Rule: If multiple tuples exist for the same resource, relation, and subject (for example, with different caveats), combine them with UNION semantics: access is granted if any one of them grants it.

// Duplicate tuples
tuple1 := &RelationTuple{
    Resource: document:1#viewer,
    Subject:  user:alice,
    Caveat:   &ContextualizedCaveat{CaveatName: "business_hours"},
}

tuple2 := &RelationTuple{
    Resource: document:1#viewer,
    Subject:  user:alice,
    Caveat:   &ContextualizedCaveat{CaveatName: "ip_restriction"},
}

// Evaluation: UNION semantics
// If EITHER caveat passes → grant access

Rationale:

  • Permissive: Don't break access due to duplicates

  • Eventual consistency: Handles concurrent writes

  • Idempotent: Multiple writes of same tuple don't change semantics

  • Practical: Distributed systems often have duplicates

Example:

// Concurrent writes create duplicates
// Write 1: user:alice → document:1#viewer [business_hours]
// Write 2: user:alice → document:1#viewer [ip_restriction]

Check(document:1#viewer, user:alice, ctx)
// → Find 2 tuples for user:alice
// → Evaluate both caveats
// → business_hours: FALSE (outside hours)
// → ip_restriction: TRUE (allowed IP)
// → UNION: FALSE OR TRUE = TRUE
// → Access granted ✅

Note: This differs from duplicate tuples that also share the same caveat; those are truly identical and can be deduplicated.
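One way to separate the two cases is a signature that includes the caveat name: exact repeats collapse, while tuples that differ only by caveat survive and are later combined with UNION. The sketch below assumes a simplified Tuple with string fields as a hypothetical stand-in for RelationTuple; the signature format is likewise an assumption.

```go
package main

import "fmt"

// Tuple is a simplified, hypothetical stand-in for RelationTuple.
type Tuple struct {
	Resource string // e.g. "document:1#viewer"
	Subject  string // e.g. "user:alice"
	Caveat   string // caveat name; empty means uncaveated
}

// Signature includes the caveat name, so tuples that differ only by caveat
// stay distinct. %q quotes each field, so a "|" inside a field cannot make
// two different tuples produce the same signature.
func (t Tuple) Signature() string {
	return fmt.Sprintf("%q|%q|%q", t.Resource, t.Subject, t.Caveat)
}

// Dedupe drops exact repeats and preserves first-seen order.
func Dedupe(tuples []Tuple) []Tuple {
	seen := make(map[string]bool)
	out := []Tuple{}
	for _, t := range tuples {
		sig := t.Signature()
		if seen[sig] {
			continue // truly identical tuple, already kept once
		}
		seen[sig] = true
		out = append(out, t)
	}
	return out
}

func main() {
	tuples := []Tuple{
		{"document:1#viewer", "user:alice", "business_hours"},
		{"document:1#viewer", "user:alice", "business_hours"}, // exact repeat
		{"document:1#viewer", "user:alice", "ip_restriction"}, // same grant, different caveat
	}
	for _, t := range Dedupe(tuples) {
		fmt.Println(t.Caveat)
	}
}
```

Running it keeps two tuples, business_hours and ip_restriction, which is exactly the pair that UNION evaluation then has to consider.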

Part 3: Invalid Tuples → IGNORE (Graceful Degradation)

Rule: If a tuple violates schema constraints (e.g., disallowed subject type), ignore it during evaluation.

func FindValidTuples(resource, subject, schema) []*RelationTuple {
    allTuples := storage.FindTuples(resource, subject)
    validTuples := []*RelationTuple{}

    for _, tuple := range allTuples {
        if IsValidTuple(tuple, schema) {
            validTuples = append(validTuples, tuple)
        } else {
            // Invalid tuple → ignore (graceful degradation)
            log.Warnf("Ignoring invalid tuple: %v", tuple)
        }
    }

    return validTuples
}

func IsValidTuple(tuple *RelationTuple, schema *CompiledSchema) bool {
    relation := schema.GetRelation(tuple.Resource.Namespace, tuple.Resource.Relation)
    if relation == nil {
        return false  // Unknown relation
    }

    // Check subject constraints
    if relation.SubjectConstraints != nil {
        if !relation.SubjectConstraints.AllowsSubject(tuple.Subject) {
            return false  // Disallowed subject type
        }
    }

    return true
}

Rationale:

  • Graceful degradation: Don't break all checks due to one bad tuple

  • Schema evolution: Old tuples don't break new schema

  • Data quality: System tolerates corrupt data

  • Detectable: Can log invalid tuples for cleanup

Example:

// Schema only allows user subjects
schema := &NamespaceRelationDefinition{
    SubjectConstraints: &SubjectConstraints{
        AllowedSubjectTypes: []AllowedSubjectType{
            &AllowedDirectSubject{Namespace: "user"},
        },
    },
}

// Invalid tuple: group subject (not allowed)
invalidTuple := &RelationTuple{
    Resource: document:1#viewer,
    Subject:  &DirectSubject{Object: &ObjectRef{Namespace: "group", ObjectID: "admins"}},
}

// Valid tuple: user subject
validTuple := &RelationTuple{
    Resource: document:1#viewer,
    Subject:  &DirectSubject{Object: &ObjectRef{Namespace: "user", ObjectID: "alice"}},
}

Check(document:1#viewer, user:alice, ctx)
// → Find tuples: [invalidTuple, validTuple]
// → Filter: invalidTuple violates schema → ignore
// → Evaluate: validTuple only
// → Result based on validTuple ✅

Real-World Example: Production Deployment Scenario

Let's walk through a realistic production scenario with schema evolution and duplicates.

Initial State (Day 1)

// Schema v1: Simple business hours caveat
businessHours_v1 := &CaveatDefinition{
    Name: "business_hours",
    Parameters: []CaveatParameter{
        &ScalarParameter{Name: "current_hour", Type: CAV_T_INT},
    },
    Expression: &BooleanExpr{
        Op: BOOL_AND,
        Children: []*BooleanExpr{
            {Op: BOOL_PREDICATE, Pred: &Predicate{
                Left:  &IdentifierExpr{Name: "current_hour"},
                Op:    CMP_GE,
                Right: &LiteralExpr{Value: 9, Type: CAV_T_INT},
            }},
            {Op: BOOL_PREDICATE, Pred: &Predicate{
                Left:  &IdentifierExpr{Name: "current_hour"},
                Op:    CMP_LT,
                Right: &LiteralExpr{Value: 17, Type: CAV_T_INT},
            }},
        },
    },
}

// Tuples using v1 caveat
tuple1 := &RelationTuple{
    Resource: document:1#viewer,
    Subject:  &DirectSubject{
        Object: user:alice,
        Caveat: &ContextualizedCaveat{CaveatName: "business_hours"},
    },
}

Schema Evolution (Day 2)

// Schema v2: Enhanced business hours with timezone support
businessHours_v2 := &CaveatDefinition{
    Name: "business_hours_tz",  // ← New name
    Parameters: []CaveatParameter{
        &ScalarParameter{Name: "env.now_utc", Type: CAV_T_TIMESTAMP},
        &ScalarParameter{Name: "user.timezone", Type: CAV_T_STRING},
    },
    Expression: &BooleanExpr{
        Op: BOOL_AND,
        Children: []*BooleanExpr{
            {Op: BOOL_PREDICATE, Pred: &Predicate{
                Left: &CallExpr{
                    FnName: "local_hour",
                    Args: []CaveatExpr{
                        &IdentifierExpr{Name: "env.now_utc"},
                        &IdentifierExpr{Name: "user.timezone"},
                    },
                },
                Op:    CMP_GE,
                Right: &LiteralExpr{Value: 9, Type: CAV_T_INT},
            }},
            // ... similar for < 17
        },
    },
}

// Deploy v2 schema (removes "business_hours", adds "business_hours_tz")
schemaService.Deploy(schema_v2)

What Happens to Old Tuples?

// Old tuple still references "business_hours" (removed in v2)
Check(document:1#viewer, user:alice, ctx)
// → Evaluate caveat "business_hours"
// → Caveat not found in registry (removed in v2)
// → Return CAV_FALSE (fail-safe)
// → Access denied ✅ (safe during migration)

Concurrent Writes Create Duplicates (Day 3)

// Service A writes new tuple with v2 caveat
serviceA.WriteTuple(&RelationTuple{
    Resource: document:1#viewer,
    Subject:  &DirectSubject{
        Object: user:alice,
        Caveat: &ContextualizedCaveat{CaveatName: "business_hours_tz"},
    },
})

// Service B (old version) writes tuple with v1 caveat (concurrent!)
serviceB.WriteTuple(&RelationTuple{
    Resource: document:1#viewer,
    Subject:  &DirectSubject{
        Object: user:alice,
        Caveat: &ContextualizedCaveat{CaveatName: "business_hours"},
    },
})

// Now we have 2 tuples for user:alice!

Evaluation with Duplicates

Check(document:1#viewer, user:alice, ctx)
// → Find 2 tuples:
//   - Tuple 1: business_hours (unknown caveat)
//   - Tuple 2: business_hours_tz (valid caveat)
// → Evaluate Tuple 1:
//   - Caveat "business_hours" not found
//   - Return CAV_FALSE (fail-safe)
// → Evaluate Tuple 2:
//   - Caveat "business_hours_tz" found
//   - Evaluate with context
//   - Return CAV_TRUE (during business hours)
// → UNION: FALSE OR TRUE = TRUE
// → Access granted ✅ (graceful handling of duplicates)

Data Corruption (Day 4)

// Corrupt tuple violates schema constraints
corruptTuple := &RelationTuple{
    Resource: document:1#viewer,
    Subject:  &DirectSubject{
        Object: &ObjectRef{Namespace: "invalid_namespace", ObjectID: "xyz"},
    },
}

// Schema only allows "user" namespace
schema := &SubjectConstraints{
    AllowedSubjectTypes: []AllowedSubjectType{
        &AllowedDirectSubject{Namespace: "user"},
    },
}

Check(document:1#viewer, user:alice, ctx)
// → Find tuples: [corruptTuple, validTuple]
// → Filter invalid tuples:
//   - corruptTuple: namespace "invalid_namespace" not allowed → ignore
//   - validTuple: namespace "user" allowed → keep
// → Evaluate validTuple only
// → Result based on validTuple ✅ (graceful degradation)

Design Decisions

Decision 1: Fail-Safe vs Fail-Open

Question: Should unknown caveats deny (fail-safe) or grant (fail-open) access?

Answer: Fail-safe (deny). Security takes priority over availability.

Rationale:

Fail-safe (chosen):

Unknown caveat → deny access

Benefits:

  • Security: Can't bypass checks with unknown caveats

  • Compliance: Meets security audit requirements

  • Predictable: Unknown → deny is intuitive

  • Detectable: Monitoring alerts on unknown caveats

Fail-open (rejected):

Unknown caveat → grant access

Problems:

  • Security risk: Attackers can exploit unknown caveats

  • Deployment risk: Schema changes accidentally grant access

  • Compliance failure: Violates security best practices

  • Unpredictable: Unknown → grant is surprising

Trade-off: Fail-safe may cause temporary access denials during schema migration, but this is acceptable because:

  1. Migrations can be planned and tested

  2. Temporary denial is better than security breach

  3. Monitoring alerts on unknown caveats enable quick fixes
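Planning a migration can include a pre-deployment check that lists caveat names still referenced by stored tuples but absent from the candidate schema. The sketch below shows one possible shape for such a check; DanglingCaveats and its inputs (a flat list of referenced names and a set of defined names) are hypothetical and not part of the model from earlier posts.

```go
package main

import (
	"fmt"
	"sort"
)

// DanglingCaveats returns the caveat names referenced by stored tuples
// that the candidate schema does not define, sorted for stable output.
// Empty names (uncaveated tuples) are skipped.
func DanglingCaveats(referenced []string, defined map[string]bool) []string {
	missing := map[string]bool{}
	for _, name := range referenced {
		if name != "" && !defined[name] {
			missing[name] = true
		}
	}
	out := make([]string, 0, len(missing))
	for name := range missing {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Caveat names collected from stored tuples (hypothetical data).
	referenced := []string{"business_hours", "business_hours", "business_hours_tz", ""}
	// Caveats defined by the schema about to be deployed.
	schemaV2 := map[string]bool{"business_hours_tz": true}

	fmt.Println(DanglingCaveats(referenced, schemaV2)) // prints: [business_hours]
}
```

A non-empty result means some tuples would start evaluating to FALSE after the deploy, so the data migration should run first or the old caveat should stay registered until it does.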

Decision 2: UNION vs INTERSECTION for Duplicates

Question: Should duplicate tuples be combined with UNION (any grants) or INTERSECTION (all must grant)?

Answer: UNION (any grants). It is permissive and practical.

Rationale:

UNION (chosen):

If ANY duplicate grants access → grant access

Benefits:

  • Permissive: Don't break access due to duplicates

  • Eventual consistency: Handles concurrent writes gracefully

  • Idempotent: Multiple writes don't change semantics

  • Practical: Distributed systems often have duplicates

INTERSECTION (rejected):

If ALL duplicates grant access → grant access

Problems:

  • Too restrictive: One bad duplicate breaks access

  • Fragile: Concurrent writes can accidentally deny access

  • Confusing: Why would duplicates need to all agree?

  • Impractical: Doesn't match real-world distributed systems
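The difference between the two policies is easiest to see on the Day 3 data, where a stale tuple fails and a current one passes. This is a deliberately reduced sketch using plain booleans; anyGrants and allGrant are hypothetical helper names.

```go
package main

import "fmt"

// anyGrants implements UNION: one passing tuple is enough.
func anyGrants(results []bool) bool {
	for _, r := range results {
		if r {
			return true
		}
	}
	return false // also covers the empty case: no tuples, no access
}

// allGrant implements INTERSECTION: every tuple must pass.
func allGrant(results []bool) bool {
	if len(results) == 0 {
		return false // deny rather than grant vacuously
	}
	for _, r := range results {
		if !r {
			return false
		}
	}
	return true
}

func main() {
	// Day 3: the stale "business_hours" tuple fails, "business_hours_tz" passes.
	results := []bool{false, true}
	fmt.Println("UNION:", anyGrants(results))       // prints: UNION: true
	fmt.Println("INTERSECTION:", allGrant(results)) // prints: INTERSECTION: false
}
```

Under INTERSECTION, the single stale tuple written by the old service version would lock alice out until someone cleaned it up.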

Model Extension

To support fail-safe defaults, we extend our model with:

1. Caveat Evaluation with Unknown Handling

type CaveatRegistry interface {
    GetCaveat(name string) (*CaveatDefinition, error)
}

func EvaluateCaveat(caveatName string, context *CaveatContext, registry CaveatRegistry) CaveatEvalResult {
    caveat, err := registry.GetCaveat(caveatName)
    if err != nil || caveat == nil {
        // INV-16: Unknown caveat → FALSE (fail-safe)
        log.Warnf("Unknown caveat: %s", caveatName)
        metrics.UnknownCaveatCount.Inc()
        return CAV_FALSE
    }

    return caveat.Evaluate(context)
}

2. Tuple Validation and Filtering

type TupleValidator interface {
    IsValid(tuple *RelationTuple, schema *CompiledSchema) bool
}

func FindValidTuples(resource, subject, schema) []*RelationTuple {
    allTuples := storage.FindTuples(resource, subject)

    // Deduplicate by signature
    seen := make(map[string]bool)
    validTuples := []*RelationTuple{}

    for _, tuple := range allTuples {
        sig := tuple.Signature()

        if seen[sig] {
            // Duplicate detected → skip (already processed)
            metrics.DuplicateTupleCount.Inc()
            continue
        }
        seen[sig] = true

        if validator.IsValid(tuple, schema) {
            validTuples = append(validTuples, tuple)
        } else {
            // Invalid tuple → ignore (INV-8)
            log.Warnf("Invalid tuple: %v", tuple)
            metrics.InvalidTupleCount.Inc()
        }
    }

    return validTuples
}

3. Duplicate Tuple Semantics

func EvaluateDuplicates(tuples []*RelationTuple, context *CaveatContext) CaveatEvalResult {
    // UNION semantics: ANY duplicate grants access
    results := []CaveatEvalResult{}

    for _, tuple := range tuples {
        result := EvaluateTupleCaveat(tuple, context)
        results = append(results, result)

        // Short-circuit on TRUE (UNION optimization)
        if result == CAV_TRUE {
            return CAV_TRUE
        }
    }

    // Combine with UNION tri-state logic
    return CombineWithUnion(results)
}
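CombineWithUnion is referenced above but not shown. A minimal sketch of tri-state UNION follows, assuming the third state means "cannot decide yet because context is missing". The constant name CAV_MISSING_CONTEXT is an assumption made for this example, and the real definition from earlier posts may differ.

```go
package main

import "fmt"

// CaveatEvalResult is redeclared here so the sketch is self-contained.
type CaveatEvalResult int

const (
	CAV_FALSE           CaveatEvalResult = iota
	CAV_TRUE
	CAV_MISSING_CONTEXT // hypothetical name for the third state
)

// CombineWithUnion applies tri-state OR:
//   - any TRUE                 → TRUE
//   - no TRUE, some undecided  → undecided
//   - otherwise (incl. empty)  → FALSE
func CombineWithUnion(results []CaveatEvalResult) CaveatEvalResult {
	combined := CAV_FALSE
	for _, r := range results {
		switch r {
		case CAV_TRUE:
			return CAV_TRUE
		case CAV_MISSING_CONTEXT:
			combined = CAV_MISSING_CONTEXT
		}
	}
	return combined
}

func main() {
	grant := CombineWithUnion([]CaveatEvalResult{CAV_FALSE, CAV_TRUE})
	undecided := CombineWithUnion([]CaveatEvalResult{CAV_FALSE, CAV_MISSING_CONTEXT})
	empty := CombineWithUnion(nil)

	fmt.Println(grant == CAV_TRUE)                    // prints: true
	fmt.Println(undecided == CAV_MISSING_CONTEXT)     // prints: true
	fmt.Println(empty == CAV_FALSE)                   // prints: true
}
```

An empty input yields CAV_FALSE, which keeps the "no tuples, no access" case consistent with the fail-safe rule. How a caller treats the undecided state is a separate decision; treating it as deny would be in line with the same rule.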

Takeaways

  1. Fail-safe defaults ensure security: Unknown caveats and invalid tuples never grant access, preventing security breaches even when things go wrong.

  2. UNION semantics for duplicates: Duplicate tuples are combined with UNION (any grants access), handling eventual consistency and concurrent writes gracefully.

  3. Graceful degradation: Invalid tuples are ignored rather than causing cascading failures, allowing the system to continue operating with partial data.

Why it matters: In production distributed systems, failures are inevitable—schema changes, concurrent writes, and data corruption all create edge cases. Fail-safe defaults ensure that when the unexpected happens, the system denies access rather than granting it incorrectly. Combined with monitoring and schema migration strategies, these mechanisms transform a theoretically correct authorization system into a production-ready service that handles real-world chaos while maintaining security guarantees.

We now have fail-safe defaults for unknown caveats and duplicates, but there's another critical requirement: bounded evaluation to prevent denial-of-service attacks.

In distributed systems, malicious actors can create deep hierarchies or cyclic graphs designed to exhaust resources. When we evaluate a Check request, we need to ensure:

  • Evaluation has bounded depth (no stack overflow)

  • Evaluation visits bounded nodes (no memory exhaustion)

  • Evaluation reads bounded tuples (no database overload)

  • Cycles are detected and handled safely

Post 10B tackles the DoS protection problem: budget limits, cycle detection, and configurable limits that ensure bounded evaluation cost.

Preview:

// Budget limits:
type EvaluationState struct {
    MaxDepth  int32  // Maximum recursion depth (default: 50)
    MaxNodes  int32  // Maximum nodes visited (default: 1000)
    MaxTuples int32  // Maximum tuples read (default: 5000)
}

// Budget exceeded → deny access (fail-safe)
