How do you govern AI in fraud detection?

Fraud detection AI governance requires managing the false positive rate as a first-class metric alongside recall, designing circuit breakers that automatically downgrade from hard blocks to challenge-only when the error rate rises, maintaining replayable decision traces for every blocking decision, and defining clear escalation paths for customer disputes.

What is the right control tier for a fraud detection system?

Most fraud detection systems operate at Tier 2: high-leverage autonomy with circuit breakers. Hard blocking of verified high-confidence fraud is justified at Tier 2. Blocking of medium-confidence or first-occurrence patterns should be Tier 1 or Tier 0 (challenge or suggest) to limit false positive exposure.

How do you handle false positives in fraud AI?

False positives in fraud AI require a pre-designed response path: automatic downgrade from hard block to challenge when the rate breaches a threshold, a dispute resolution process with a defined SLA, and a decision trace that allows the support team to reconstruct exactly why a transaction was blocked without re-running the model.

Fraud Detection AI Governance: A Case Study in High-Stakes Autonomy

This post is part of the Regulated AI Implementation pillar.

Fraud detection is one of the clearest governance challenges in production AI. The volume is high, the stakes are real on both sides of the decision, and the consequences of error are asymmetric in a way that makes standard accuracy metrics insufficient.

Miss a fraudulent transaction: financial loss. Block a legitimate transaction: a damaged customer relationship, a potential dispute, and in some contexts, a regulatory event.

This case study walks through how to apply the Architecture of Proof framework to a production fraud detection system — from control tier assignment through circuit breaker design, decision trace architecture, and dispute resolution.

The governance challenge this system must solve

A production fraud detection system must answer five questions correctly at every decision:

1. Is this transaction authorized or fraudulent?
2. If uncertain, what is the right response: block, challenge, or approve?
3. If we block, can we explain why to the customer and to a regulator?
4. If our blocking rate on legitimate transactions rises, do we detect it before it escalates?
5. If a customer disputes a block, can we reconstruct the decision without re-running the model?

Standard AI deployment answers question one. The Architecture of Proof framework answers all five.

Composite intelligence design

A fraud detection system is a composite system — rules, models, and humans each play a defined role.

Rules layer:
- Velocity rules: block transactions exceeding defined rate limits (e.g., 5 transactions in 60 seconds on a new card)
- List-based rules: block transactions from IP addresses on active sanctions or known fraud lists
- Structural rules: reject any transaction with missing or structurally invalid fields before scoring
- Policy rules: always require step-up authentication for transactions above a defined dollar threshold

Model layer:
- Real-time fraud scoring model: produces a fraud probability score (0–1) for each transaction
- Device fingerprinting model: produces a device risk score
- Behavioral model: detects anomalies in session behavior vs. the account's historical pattern

Human layer:
- Fraud analyst team: reviews high-value escalated blocks, investigates organized fraud patterns, manages dispute resolution
- Model risk function: performs monthly model review against performance contracts

Control tier assignment

Not all fraud responses operate at the same autonomy level.

Decision Type	Control Tier	Rationale
Block — high-confidence fraud (score > 0.92 + velocity rule)	Tier 2	Clear signal, reversible via dispute, circuit-breaker protected
Challenge — medium-confidence (score 0.65–0.92)	Tier 1	Step-up auth; human completes the authorization
Approve with monitoring (score < 0.65, no rule violations)	Tier 1	Logged for behavioral monitoring; no human required
Escalate — high-value + medium-confidence (> $10K + score 0.5–0.8)	Tier 0/3	Analyst review required before any action

The tier boundaries are set at deployment, documented, and changed only with evidence and sign-off — not adjusted ad hoc in response to individual cases.
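The table above can be read as a small routing function. The sketch below is one possible reading, not a reference implementation: the names `route` and `Routing` are hypothetical, the escalation row is checked first because its score band overlaps the approve and challenge bands, and any combination the table does not cover falls back to a challenge. Both of those choices are assumptions, not rules stated in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Routing:
    action: str  # "block", "challenge", "approve", or "escalate"
    tier: str    # control tier label as written in the table


def route(score: float, amount_usd: float,
          velocity_rule_fired: bool,
          other_rule_violation: bool = False) -> Routing:
    # Assumption: escalation takes precedence, since its band (0.5-0.8)
    # overlaps both the approve and the challenge bands.
    if amount_usd > 10_000 and 0.5 <= score <= 0.8:
        return Routing("escalate", "0/3")
    # Hard block needs both signals: high score and a velocity rule hit.
    if score > 0.92 and velocity_rule_fired:
        return Routing("block", "2")
    if 0.65 <= score <= 0.92:
        return Routing("challenge", "1")
    if score < 0.65 and not (velocity_rule_fired or other_rule_violation):
        return Routing("approve", "1")
    # Assumption: cases the table leaves open (for example a score above
    # 0.92 with no velocity hit) are challenged rather than blocked.
    return Routing("challenge", "1")
```

Keeping the thresholds in one function like this makes the tier boundaries easy to version-control and review alongside their sign-off.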

Component contracts

Rules contracts:

"The velocity rule must fire for 100% of transactions that meet the defined rate limit criteria. Zero exceptions. Bypass via any channel (web, API, partner integration) constitutes a rule violation."

"Sanctions list matching must run before model scoring on every transaction. Any failure to access the list within 200ms must trigger a fallback to Tier 3 (human review) for the affected transactions."

Model contracts:

"The fraud model must maintain a false positive rate below 1.2% on non-fraudulent transactions in any 7-day rolling window. If FPR exceeds 1.8%, hard-block behavior must automatically downgrade to challenge-only pending review."

"The model must not be used for transaction types outside the defined training population (currently: card-present and card-not-present retail; excluded: ACH, wire, crypto). Routing violations must be logged."

Human contracts:

"Escalated blocks must be reviewed within 2 hours. The analyst must select a disposition code (confirm-block, reverse-block, escalate-further). Cases with no disposition within 4 hours are automatically reversed and logged as an SLA violation."

Circuit breaker design

Circuit breakers are the mechanism that prevents a model performance issue from becoming a customer impact event before it is detected.

Primary circuit breaker:
- Trigger: 7-day rolling false positive rate exceeds 1.8%
- Action: auto-downgrade all Tier 2 hard blocks to Tier 1 challenges
- Notification: immediate alert to fraud operations lead and model risk
- Resolution: model risk review completes within 48 hours; Tier 2 is restored only after that review
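A minimal sketch of the primary breaker, assuming daily counts of confirmed-legitimate transactions and how many of them were hard-blocked are available. The class and field names (`FprBreaker`, `DayCounts`) are hypothetical, and the false positive rate is computed as blocked legitimate transactions divided by all legitimate transactions in the window.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DayCounts:
    legit_total: int    # transactions confirmed non-fraudulent that day
    legit_blocked: int  # of those, how many were hard-blocked


class FprBreaker:
    def __init__(self, trip_at: float = 0.018, window_days: int = 7):
        self.trip_at = trip_at
        self.window = deque(maxlen=window_days)  # oldest day drops off
        self.tripped = False

    def rolling_fpr(self) -> float:
        total = sum(d.legit_total for d in self.window)
        blocked = sum(d.legit_blocked for d in self.window)
        return blocked / total if total else 0.0

    def record_day(self, counts: DayCounts) -> None:
        self.window.append(counts)
        if self.rolling_fpr() > self.trip_at:
            # Latches: stays tripped until the review explicitly resets it.
            self.tripped = True

    def effective_action(self, proposed: str) -> str:
        # While tripped, hard blocks are downgraded to challenges.
        if self.tripped and proposed == "block":
            return "challenge"
        return proposed

    def reset_after_review(self) -> None:
        self.tripped = False
```

The breaker latches on purpose: a later dip in the rolling rate does not restore hard blocking, which matches the requirement that restoration follows the model risk review.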

Secondary circuit breaker:
- Trigger: transaction block volume increases more than 40% week-over-week with flat fraud confirmation rate
- Action: block threshold raised by 0.05 (more conservative blocking), alert to model risk
- Resolution: investigation within 24 hours
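The secondary trigger can be sketched as a week-over-week comparison. This sketch reads "flat fraud confirmation rate" as the weekly count of confirmed fraud cases staying within a tolerance of the prior week; that reading, the 10% tolerance, and the function name `secondary_breaker` are all assumptions.

```python
from typing import Tuple


def secondary_breaker(prev_blocks: int, curr_blocks: int,
                      prev_confirmed: int, curr_confirmed: int,
                      block_threshold: float,
                      volume_jump: float = 0.40,
                      flat_tolerance: float = 0.10,  # assumed, not from the text
                      step: float = 0.05) -> Tuple[bool, float]:
    """Return (tripped, block_threshold_to_use)."""
    if prev_blocks == 0:
        # No baseline week to compare against, so do not trip.
        return False, block_threshold

    growth = curr_blocks / prev_blocks - 1.0
    if prev_confirmed == 0:
        confirmations_flat = curr_confirmed == 0
    else:
        change = abs(curr_confirmed - prev_confirmed) / prev_confirmed
        confirmations_flat = change <= flat_tolerance

    if growth > volume_jump and confirmations_flat:
        # Raise the score needed for a hard block, capped at 1.0.
        return True, min(1.0, round(block_threshold + step, 4))
    return False, block_threshold
```

For example, 1,000 blocks rising to 1,500 while confirmed fraud moves from 200 to 205 trips the breaker and lifts a 0.92 threshold to 0.97.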

Tertiary circuit breaker:
- Trigger: any sanctions list access failure exceeding 500ms
- Action: transactions that cannot be checked auto-route to Tier 3 (human review)
- Resolution: engineering incident response
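One way to sketch the tertiary breaker is to run the sanctions lookup under a timeout and route to human review when it does not answer in time. The callables `sanctions_lookup` and `score_model` and the returned dictionary shape are hypothetical. The timeout is a parameter, defaulting to the 500 ms figure in this breaker, so it can be set to whatever budget the sanctions contract specifies.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

_pool = ThreadPoolExecutor(max_workers=4)


def screen_then_score(txn: Dict[str, Any],
                      sanctions_lookup: Callable[[Dict[str, Any]], bool],
                      score_model: Callable[[Dict[str, Any]], float],
                      timeout_s: float = 0.5) -> Dict[str, Any]:
    # The sanctions check runs before any model scoring.
    future = _pool.submit(sanctions_lookup, txn)
    try:
        listed = future.result(timeout=timeout_s)
    except Exception:
        # Covers both a timeout and an error raised by the lookup itself.
        # The transaction could not be checked, so it is never scored.
        future.cancel()
        return {"route": "tier_3_human_review",
                "reason": "sanctions_list_unavailable"}
    if listed:
        return {"route": "block", "reason": "sanctions_match"}
    return {"route": "score", "fraud_score": score_model(txn)}
```

The important property is that an unavailable list never degrades into "approve by default": the model is only called once the sanctions check has returned a clean result.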

Decision trace design

Every blocking decision must produce a trace that answers a customer dispute or regulatory inquiry within hours — not days.

Minimum required trace fields:

{
  "decision_id": "txn-20240315-4892A",
  "timestamp": "2024-03-15T09:42:17Z",
  "transaction_amount": 847.50,
  "merchant_category": "electronics",
  "control_tier_at_decision": 2,
  "rules_fired": [
    {"rule_id": "VEL-003", "result": "pass"},
    {"rule_id": "SANC-001", "result": "pass"}
  ],
  "model_scores": {
    "fraud_model_id": "fraud-v2.4.1",
    "fraud_score": 0.94,
    "device_risk_score": 0.81
  },
  "decision": "BLOCKED",
  "block_reason_category": "high_fraud_score",
  "payload_sha256": "a3f8b2...",
  "payload_ref_id": "s3://fraud-traces/2024/03/4892A.json.gz"
}

This trace is sufficient to answer: which rules ran, which model version scored, what the score was, and what control tier authorized the block — for any transaction within the retention window.
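The `payload_sha256` field suggests the trace is tied to a stored payload by hash. A minimal sketch of how that link could be produced and checked follows, assuming the hash is a SHA-256 over a canonical JSON serialization of the payload. The serialization choice and the helper names `seal` and `verify` are assumptions, not details given in the trace format.

```python
import hashlib
import json
from typing import Any, Dict


def canonical_bytes(payload: Dict[str, Any]) -> bytes:
    # Sorted keys and fixed separators so equal payloads hash identically.
    return json.dumps(payload, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")


def seal(trace: Dict[str, Any], payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the trace carrying the payload's SHA-256 digest."""
    sealed = dict(trace)
    sealed["payload_sha256"] = hashlib.sha256(
        canonical_bytes(payload)).hexdigest()
    return sealed


def verify(trace: Dict[str, Any], payload: Dict[str, Any]) -> bool:
    """Check that a retrieved payload is the one the trace was sealed with."""
    expected = hashlib.sha256(canonical_bytes(payload)).hexdigest()
    return trace.get("payload_sha256") == expected
```

During a dispute, a reviewer would fetch the object at `payload_ref_id`, call `verify`, and only then rely on the payload contents as evidence.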

Dispute resolution process

A customer dispute on a fraud block has a defined SLA and a defined evidence process:

1. Customer initiates dispute via support channel
2. Support agent queries decision trace using transaction ID (response within 30 seconds)
3. Agent reviews: was the block rule-triggered or model-triggered? What was the score?
4. If rule-triggered: agent confirms rule basis with customer and documents outcome
5. If model-triggered: agent escalates to fraud analyst with pre-populated trace context
6. Analyst reviews full trace, makes disposition decision within 4 hours
7. Outcome logged, used in monthly false positive analysis
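The agent's branching decision in the steps above can be sketched directly against the trace fields shown earlier. The sample trace only shows the rule result value "pass", so treating any other result as "the rule triggered" is an assumption, as are the function name `triage_dispute` and the returned path labels.

```python
from typing import Any, Dict


def triage_dispute(trace: Dict[str, Any]) -> Dict[str, Any]:
    # Assumption: a rule contributed to the block if its result is not "pass".
    triggered = [r["rule_id"] for r in trace.get("rules_fired", [])
                 if r.get("result") != "pass"]
    if triggered:
        return {"path": "agent_confirms_rule_basis", "rules": triggered}

    # Model-triggered: hand the analyst the context straight from the trace.
    scores = trace["model_scores"]
    return {
        "path": "escalate_to_fraud_analyst",
        "context": {
            "decision_id": trace["decision_id"],
            "fraud_model_id": scores["fraud_model_id"],
            "fraud_score": scores["fraud_score"],
            "control_tier": trace["control_tier_at_decision"],
        },
    }
```

Applied to the sample trace, both rules passed, so the dispute would go to an analyst with the model version and the 0.94 score already attached.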

No step in this process requires re-running the model on current data. The trace contains all the state needed for the review.
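The timing side of the process, combined with the 2-hour review target and 4-hour auto-reversal in the human contract, can be sketched as a check run periodically over open escalations. The state names and the function `check_open_escalation` are hypothetical, and the sketch only logs an SLA violation at the 4-hour mark because that is the only point where the contract says to.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

REVIEW_TARGET = timedelta(hours=2)       # escalated blocks reviewed by then
AUTO_REVERSE_AFTER = timedelta(hours=4)  # no disposition by then: reverse


def check_open_escalation(escalated_at: datetime,
                          now: datetime) -> Dict[str, Any]:
    """Evaluate an escalation that has no disposition code yet."""
    age = now - escalated_at
    if age > AUTO_REVERSE_AFTER:
        # The block is reversed automatically and the miss is recorded.
        return {"state": "auto_reversed", "log_sla_violation": True}
    if age > REVIEW_TARGET:
        # Assumed intermediate state so overdue reviews can be surfaced.
        return {"state": "review_overdue", "log_sla_violation": False}
    return {"state": "pending_review", "log_sla_violation": False}


if __name__ == "__main__":
    start = datetime(2024, 3, 15, 9, 42, tzinfo=timezone.utc)
    print(check_open_escalation(start, start + timedelta(hours=3)))
```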

What good looks like in production

A well-governed fraud detection system has:
- FPR < 1.5% with circuit breakers that activate before it reaches 2%
- Every blocking decision traceable in under 60 seconds from the decision ID
- Tier boundaries documented, signed off, and version-controlled
- Monthly model review with published findings
- Dispute resolution SLA met at > 95% without requiring engineering involvement

Regulated AI Implementation: The full pillar covering governance requirements across all regulated domains.
Control Tiers for AI-enabled Processes: Designing the tier boundaries that this case study applies.
AI Audit Trails: Building the decision trace architecture referenced in this case study.
AI Escalation Protocols: The escalation and circuit breaker mechanics in detail.
