13 questions. 5 minutes. Find out where your organization stands on AI behavioral compliance, and which gaps regulators will notice first.
Answer 13 questions about your organization's AI monitoring practices and get an instant readiness scorecard with insights for each of the five areas below.
Do you know what AI you have, and who's responsible for it?
Are you watching how your AI systems behave in production?
Can you prove your AI systems are compliant right now?
What happens when your AI produces something it shouldn't?
Are AI, compliance, and risk teams working together?
Most organizations deploying AI in production focus their monitoring on data drift: statistical changes in model inputs compared to training data. Tools like Evidently AI, Arize, WhyLabs, and Fiddler do this well. They alert data science teams when input distributions shift, which can signal the need for model retraining.
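To make the contrast concrete, here is a minimal sketch of what input drift detection typically looks like, using a two-sample Kolmogorov-Smirnov test. The function name, the sample data, and the 0.05 significance level are illustrative assumptions, not the implementation of any specific vendor's tool:

import numpy as np
from scipy.stats import ks_2samp

def input_has_drifted(training_values: np.ndarray,
                      production_values: np.ndarray,
                      alpha: float = 0.05) -> bool:
    """Compare one input feature's production distribution against its
    training distribution; flag drift when they differ significantly."""
    _statistic, p_value = ks_2samp(training_values, production_values)
    return p_value < alpha

# Illustrative check: inputs drawn from the same distribution raise no alert.
rng = np.random.default_rng(seed=7)
training = rng.normal(loc=0.0, scale=1.0, size=10_000)
production = rng.normal(loc=0.0, scale=1.0, size=1_000)
print(input_has_drifted(training, production))  # False: inputs look stable

Note what this check never sees: the model's outputs.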
But regulators in insurance, banking, and healthcare aren't asking whether your input data has changed. They're asking whether your AI is still producing fair, accurate, and compliant outputs.
That's a different question. And it requires a different type of monitoring.
Behavioral monitoring evaluates what AI systems actually do: the tone, accuracy, fairness, and compliance of their outputs over time. It catches problems that data drift tools miss entirely, like an AI claims triage system that starts routing certain demographic groups to slower queues, or a credit decisioning model that gradually shifts its approval thresholds without any change in input data.
The gap between data monitoring and behavioral monitoring is where regulatory risk lives. An AI system can show stable inputs and strong performance metrics while its outputs slowly drift out of compliance. Without behavioral monitoring, that drift goes undetected until a customer complains or a regulator finds it during an exam.
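A behavioral check, by contrast, watches the decisions themselves. Below is a minimal sketch of one such check: comparing approval rates across groups in a recent window of decisions against the common four-fifths rule of thumb. The Decision type, the field names, and the 0.8 threshold are illustrative assumptions, not a prescribed methodology:

from dataclasses import dataclass

@dataclass
class Decision:
    group: str      # illustrative demographic or portfolio segment
    approved: bool

def approval_disparity_alert(window: list[Decision],
                             threshold: float = 0.8) -> bool:
    """Alert when any group's approval rate in the window falls below
    `threshold` times the highest group's rate (four-fifths rule)."""
    rates = {}
    for group in {d.group for d in window}:
        group_decisions = [d for d in window if d.group == group]
        rates[group] = sum(d.approved for d in group_decisions) / len(group_decisions)
    highest = max(rates.values())
    return any(rate < threshold * highest for rate in rates.values())

# Illustrative window: group B has drifted to a 60% approval rate vs. group A's 90%.
window = ([Decision("A", True)] * 90 + [Decision("A", False)] * 10
          + [Decision("B", True)] * 60 + [Decision("B", False)] * 40)
print(approval_disparity_alert(window))  # True: 0.60 < 0.8 * 0.90

Run over a rolling window of production decisions, a check like this catches the slow output drift that input monitoring cannot see.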
The NAIC Model Bulletin, the SR 11-7 model risk management guidance, the EU AI Act, and the NIST AI Risk Management Framework all require or strongly recommend continuous monitoring of AI systems in production. The common expectation: organizations must monitor AI behavior after deployment, not just validate before it.
The NAIC Model Bulletin on the Use of AI Systems by Insurers has been adopted in 25 states as of March 2026. It requires insurers to maintain governance and risk management frameworks over their AI systems, including ongoing monitoring to ensure AI decisions remain fair, accurate, and compliant with state insurance regulations.
The NAIC AI Systems Evaluation Tool (ASET) pilot is now live in 12 states: California, Colorado, Connecticut, Florida, Iowa, Louisiana, Maryland, Pennsylvania, Rhode Island, Vermont, Virginia, and Wisconsin. The pilot runs from March through September 2026. Participating states are using the tool during market conduct exams and financial examinations.
The ASET has 4 exhibits. Exhibit A quantifies AI usage across lines of business. Exhibit B is a governance risk assessment framework. Exhibit C asks detailed questions about high-risk AI systems. Exhibit D covers AI data details. Regulators will focus on domestic insurers and apply proportionality, spending more time on high-risk consumer-facing AI.
The ASET asks specifically about ongoing monitoring practices, not just pre-deployment validation. Regulators want to know if you can demonstrate continuous compliance.
The ASET pilot runs through September 2026. Based on pilot results, the tool will be updated, re-exposed for public comment, and considered for broader adoption at the NAIC fall meeting in November 2026. For insurers not yet in pilot states, the question isn't whether this tool will be used in your state's examinations. It's when. 25 states have already adopted the Model Bulletin. The ASET gives regulators a structured way to enforce it.
The Supervisory Guidance on Model Risk Management, issued by the Federal Reserve as SR 11-7 and adopted by the OCC as Bulletin 2011-12, requires banks to validate models on an ongoing basis. For AI systems used in credit decisioning, fraud detection, and customer interaction, this means continuous monitoring of model behavior and performance. The guidance specifically addresses the risk of model degradation over time.
The OCC has increased scrutiny of AI-specific risks in recent examination cycles. Banks using AI for consumer-facing decisions face the same behavioral monitoring gap that insurers do: strong validation at deployment followed by limited ongoing behavioral oversight.
CMS and state regulators are increasingly examining AI used in prior authorization and claims processing. Connecticut has proposed legislation requiring human review of AI-driven claim denials. Several states are exploring similar requirements.
The scrutiny focuses on whether AI decisions are fair, accurate, and consistent: behavioral qualities that can't be measured through input data monitoring alone.
The EU AI Act establishes explicit post-market monitoring requirements for high-risk AI systems. Providers and deployers must implement monitoring systems that detect changes in AI behavior affecting compliance. The Act entered into force in August 2024, and its obligations for high-risk systems phase in through 2026 and 2027, making this a live compliance obligation for organizations operating in EU markets.
Governance documentation tells regulators your AI was validated. Compliance evidence tells regulators your AI is still performing correctly. Most organizations have the first but not the second.
Model cards, fairness assessments, validation reports, and policy documents are necessary. But they answer a different question from the one regulators are increasingly asking.
Documentation shows that your AI system was validated and approved at a point in time. Compliance evidence shows that it is still operating within those approved parameters right now.
This is the compliance evidence gap. An EY survey of 500 technology executives in early 2026 found that 78% say AI adoption is outpacing their organization's ability to manage the associated business risks. A separate EY study of 975 C-suite leaders found that while 72% of organizations have integrated and scaled AI, only a third have responsible controls in place across all governance dimensions.
The gap is widest in ongoing behavioral monitoring. Organizations invest heavily in pre-deployment validation (model cards, bias testing, fairness assessments) but have limited capabilities for continuous post-deployment monitoring of AI output quality.
For regulated industries, closing this gap means moving from periodic manual reviews to continuous automated monitoring that generates regulator-ready evidence packages. Point-in-time evaluation tells regulators the model was good. Continuous monitoring with documented evidence tells regulators the model is still good.
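In the simplest case, regulator-ready evidence means each monitoring check appends a timestamped result to a durable log that can be exported for an exam. The schema, the file path, and the system name below are illustrative assumptions, not a standard format:

import json
from datetime import datetime, timezone

def record_evidence(system_id: str, check_name: str,
                    observed: float, threshold: float,
                    log_path: str = "compliance_evidence.jsonl") -> dict:
    """Append one monitoring result to a JSON Lines log so every check
    leaves a timestamped, reviewable trail."""
    entry = {
        "system_id": system_id,
        "check": check_name,
        "observed_value": observed,
        "threshold": threshold,
        "passed": observed <= threshold,   # assumes a lower-is-better metric
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# Illustrative entry for a hypothetical claims triage model.
record_evidence("claims-triage-v3", "approval_rate_disparity",
                observed=0.12, threshold=0.20)

The point is not the format but the habit: every automated check produces a dated artifact, tied to a specific system, that a regulator can inspect.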
For a comparison of data drift detection tools, see our guide to AI drift detection tools.