Lending AI governance has three requirements that most governance frameworks ignore: adverse action reason codes generated at decision time, segment-level fairness monitoring, and decision records replayable for the full regulatory retention period.

Lending AI Governance: Adverse Action, Fairness, and Replayable Credit Decisions

This post is part of the Regulated AI Implementation pillar.

Credit decisions are among the most governed AI outputs in existence. Consumer protection law, fair lending regulation, and model risk management guidance all impose specific requirements that go beyond standard AI performance monitoring.

The challenge is not that the requirements are unclear. It is that they are frequently misunderstood as monitoring requirements when they are actually design requirements. Adverse action documentation, fairness analysis, and replayability must be built into the system — they cannot be retrofitted after deployment.

This case study applies the Architecture of Proof framework to a production lending AI system.

What makes lending AI governance different

Three requirements distinguish lending AI from most other high-stakes AI applications.

Individual adverse action documentation. A credit denial must be explained to the specific individual who received it, using the specific factors that drove that decision. An aggregate model explanation ("the model weights income at 0.3") is not an adverse action notice. The notice must name the applicant's specific factors — and they must be factors that actually contributed to this specific decision, not features that generally matter to the model.

This requirement forces adverse action reason code generation to be a first-class system function, not a post-hoc report. The decision trace must capture which factors contributed most to the outcome at the time of decision.

Protected characteristic compliance. Lending models are legally prohibited from using protected characteristics (race, color, religion, sex, national origin, marital status, age) as inputs. But the prohibition extends to proxies — inputs that are strongly correlated with protected characteristics and serve as substitutes for them. Neighborhood, zip code, and surname patterns can introduce protected characteristic discrimination even when the protected attribute itself is excluded.

Fairness monitoring must run at the segment level — separately for protected groups — not just on overall accuracy. And it must run in production, not just on validation data.

Replayability for the full retention period. A credit decision made today may be challenged five years from now. The system must be able to reconstruct that specific decision — what inputs were used, which rules fired, which model version produced the score, and what reason codes were generated — using records created at decision time, not by re-running a current model on historical data.

Composite system design

Rules layer:
- Eligibility rules: hard cutoffs that determine whether the model is called at all (e.g., minimum time in business, maximum debt-to-income ratio, geographic eligibility)
- Regulatory rules: conditions that must always be enforced regardless of model score (e.g., sanctions screening, prohibited purpose checking)
- Decision rules: the thresholds that translate model scores into actions (approve, decline, counter-offer at a different term)
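The three rule types can be sketched as plain predicate functions evaluated around the model call. Everything here is an illustrative placeholder, not the system's actual rule set: the field names, the state and purpose sets, and the cutoffs are assumptions (the 0.55 approval threshold is taken from the worked trace later in this post).

```python
from dataclasses import dataclass

ELIGIBLE_STATES = {"CA", "NY", "TX"}      # illustrative placeholder
PROHIBITED_PURPOSES = {"gambling"}        # illustrative placeholder

@dataclass
class Application:
    months_in_business: int
    debt_to_income: float
    state: str
    sanctions_hit: bool
    loan_purpose: str

def eligibility_rules(app: Application) -> list[str]:
    """Hard cutoffs: if any fail, the model is never called."""
    failures = []
    if app.months_in_business < 24:
        failures.append("MIN_TIME_IN_BUSINESS")
    if app.debt_to_income > 0.45:
        failures.append("MAX_DTI_EXCEEDED")
    if app.state not in ELIGIBLE_STATES:
        failures.append("GEOGRAPHY_INELIGIBLE")
    return failures

def regulatory_flags(app: Application) -> list[str]:
    """Always enforced, regardless of model score."""
    flags = []
    if app.sanctions_hit:
        flags.append("SANCTIONS_SCREEN")
    if app.loan_purpose in PROHIBITED_PURPOSES:
        flags.append("PROHIBITED_PURPOSE")
    return flags

def decision_rule(score: float) -> str:
    """Thresholds translating a model score into an action."""
    if score >= 0.55:
        return "APPROVE"
    if score >= 0.40:
        return "COUNTER_OFFER"
    return "DECLINE"
```

Keeping each rule type as a separate pure function makes the trace straightforward: the decision record can log exactly which rule codes fired, independently of the model score.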

Model layer:
- Credit risk scoring model: predicts probability of default or serious delinquency over a defined horizon
- Income verification model: estimates income stability from behavioral and transaction data (where permitted)
- Feature attribution: generates the ranked list of contributing factors for each decision (integrated into the scoring pipeline, not computed separately)

Human layer:
- Internal credit review: handles appeals, exceptions, and cases in defined gray-zone score bands
- Model risk committee: conducts quarterly model reviews against performance contracts
- Compliance function: monitors fair lending metrics and initiates investigation when thresholds are breached

Control tier assignment

| Decision type | Tier | Conditions |
| --- | --- | --- |
| Auto-approve | Tier 2 | Score above approval threshold + all eligibility rules pass + no regulatory flags |
| Auto-decline | Tier 2 | Score below decline threshold + no eligible counter-offer |
| Counter-offer routing | Tier 1 | Score in counter-offer band; system generates offer terms, human reviews before send |
| Manual review | Tier 0/3 | Score in gray zone, exception request, regulatory flag, or applicant appeal |
| Fraud/sanctions flag | Tier 3 | Always human: no automated credit action on flagged applications |
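A minimal routing function consistent with the table above. The band edges and return labels are illustrative assumptions; a production router would also carry the counter-offer eligibility logic and exception-request handling.

```python
def assign_tier(score: float, eligibility_ok: bool, reg_flags: list[str],
                is_appeal: bool = False) -> tuple[int, str]:
    """Map a scored application to a control tier.

    Checks are ordered by precedence: regulatory flags always win,
    then appeals, then eligibility, then score bands (illustrative edges).
    """
    if reg_flags:
        return (3, "MANUAL_FRAUD_SANCTIONS")   # Tier 3: always human
    if is_appeal:
        return (0, "MANUAL_REVIEW")            # applicant appeal routes to a person
    if not eligibility_ok:
        return (2, "AUTO_DECLINE")             # hard cutoff failed; model not consulted
    if score >= 0.55:
        return (2, "AUTO_APPROVE")
    if score >= 0.40:
        return (1, "COUNTER_OFFER_REVIEW")     # human reviews terms before send
    return (2, "AUTO_DECLINE")
```

Ordering the checks by precedence, with regulatory flags first, is what enforces the "always human" row of the table: no score can ever override a flagged application.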

Adverse action reason code architecture

This is the most frequently misdesigned element of lending AI systems.

Adverse action reason codes must be generated at the time of decision, using the actual inputs and model state that produced the outcome. They cannot be generated retroactively from a current model, because the model may have been retrained and will produce different feature attributions for the same input.

Implementation requirements:
1. Feature attribution is computed as part of the scoring pipeline — not as a separate post-hoc process
2. The top N adverse factors (typically 4) are captured in the decision trace along with their direction of impact
3. Factor codes are mapped to human-readable reason code language at decision time
4. The reason code set is reviewed by legal to ensure ECOA/Regulation B compliance
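Steps 1–3 can be sketched as follows. `REASON_CODES` is a hypothetical mapping table standing in for the legally reviewed code set (step 4), and the attribution values are assumed to be signed per-feature contributions produced inside the scoring pipeline.

```python
# Hypothetical feature -> (code, label) table; the real set is reviewed by legal.
REASON_CODES = {
    "employment_months":  ("AF-018", "Insufficient time in current employment"),
    "debt_to_income":     ("AF-042", "High ratio of debt to income"),
    "history_months":     ("AF-007", "Limited credit history length"),
    "recent_delinquency": ("AF-029", "Recent delinquency on existing account"),
}

def adverse_factors(attributions: dict[str, float], top_n: int = 4) -> list[dict]:
    """Rank the features that pushed the score toward decline (negative
    attribution) and map them to reason codes at decision time."""
    negative = [(f, v) for f, v in attributions.items() if v < 0]
    negative.sort(key=lambda fv: fv[1])            # most negative first
    ranked = []
    for feature, _ in negative[:top_n]:
        code, label = REASON_CODES[feature]
        ranked.append({"factor_code": code, "label": label, "impact": "negative"})
    return ranked
```

Because this runs inside the scoring pipeline, the ranked list written to the trace reflects the exact model state that produced the outcome; regenerating it later from a retrained model would produce different attributions.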

Worked example — adverse action trace:

{
  "decision_id": "app-20240315-7782B",
  "applicant_id_hash": "sha256:a3f8b2...",
  "decision": "DECLINED",
  "model_score": 0.31,
  "score_threshold_for_approval": 0.55,
  "adverse_factors_ranked": [
    {"factor_code": "AF-018", "label": "Insufficient time in current employment", "impact": "negative"},
    {"factor_code": "AF-042", "label": "High ratio of debt to income", "impact": "negative"},
    {"factor_code": "AF-007", "label": "Limited credit history length", "impact": "negative"},
    {"factor_code": "AF-029", "label": "Recent delinquency on existing account", "impact": "negative"}
  ],
  "model_version": "risk-credit-v3.1.0",
  "reason_code_schema_version": "ecoa-v4",
  "payload_ref_id": "s3://lending-traces/2024/03/7782B.json.gz"
}

This record is sufficient to generate a compliant adverse action notice for this applicant for as long as the regulatory retention window requires.

Fairness monitoring design

Fair lending monitoring must run at the segment level, not just on aggregate metrics.

Minimum monitoring requirements:

- Monitoring frequency: monthly at minimum, with quarterly deep-dive analysis.
- Trigger for investigation: any segment showing a 20%+ disparity in approval rate or pricing vs. the reference group, with no credit-risk explanation.
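The investigation trigger can be sketched as below, assuming each decision record carries a `segment` label and an `approved` flag (an assumed schema), and measuring disparity as the relative difference in approval rate against the reference group. This is one possible operationalization of the 20% threshold; pricing disparity would need an analogous computation on offered terms.

```python
from collections import defaultdict

def disparity_flags(decisions: list[dict], reference_group: str,
                    threshold: float = 0.20) -> dict[str, float]:
    """Return {segment: approval_rate} for segments whose approval rate
    differs from the reference group's by more than `threshold` (relative)."""
    counts = defaultdict(lambda: [0, 0])          # segment -> [approved, total]
    for d in decisions:
        counts[d["segment"]][0] += int(d["approved"])
        counts[d["segment"]][1] += 1
    ref_approved, ref_total = counts[reference_group]
    ref_rate = ref_approved / ref_total
    flagged = {}
    for segment, (approved, total) in counts.items():
        rate = approved / total
        if segment != reference_group and abs(rate - ref_rate) / ref_rate > threshold:
            flagged[segment] = rate               # candidate for investigation
    return flagged
```

A flag is a trigger for investigation, not a conclusion: the compliance function still has to determine whether a credit-risk explanation accounts for the gap.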

Replayability for regulatory retention

The full decision record — including model version, inputs, rules that fired, and reason codes — must be retained for the regulatory minimum (typically 25 months for ECOA; longer for enterprise risk purposes).

Key design requirement: the model version must be tagged in every decision record and the model artifact must be preserved — not just the weights, but the full scoring pipeline including feature engineering. If the pipeline cannot be restored to the state it was in at decision time, you cannot replay the decision.

This is why a version-controlled model registry is not optional in regulated lending — it is a regulatory compliance requirement.
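A minimal sketch of replay against a pinned artifact. `ModelRegistry` here is a hypothetical in-memory stand-in for a real registry, and the scoring pipeline is reduced to a callable; the point is the invariant, not the storage layer.

```python
class ModelRegistry:
    """Stand-in for a version-controlled registry: a version maps to the
    full scoring pipeline (feature engineering + weights) as deployed."""
    def __init__(self):
        self._artifacts = {}

    def register(self, version: str, pipeline) -> None:
        self._artifacts[version] = pipeline

    def load(self, version: str):
        return self._artifacts[version]

def replay_decision(trace: dict, archived_inputs: dict,
                    registry: ModelRegistry) -> dict:
    """Reproduce a historical decision from decision-time records only:
    the pinned model version plus the archived inputs.

    A score mismatch means the artifact or the inputs were not preserved
    faithfully, and the decision can no longer be defended by replay.
    """
    pipeline = registry.load(trace["model_version"])
    score = pipeline(archived_inputs)
    if abs(score - trace["model_score"]) > 1e-9:
        raise RuntimeError("replay mismatch: artifact drift or lost inputs")
    return {"decision_id": trace["decision_id"], "replayed_score": score}
```

The invariant being tested is the one the section describes: given the recorded version tag and the archived payload, the restored pipeline must reproduce the recorded score exactly.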


Download the Architecture of Proof Checklist

Ready to implement? Get the definitive checklist for building verifiable AI systems.
