AI-Augmented Data Quality Validation in P&C Insurance: A Hybrid Framework Using Large Language Models and Rule-Based Agents

TLDR

This research contributes to the ongoing debate on automated data governance in insurance by combining explainable AI with operational reliability. It offers businesses a practical approach that balances regulatory requirements with digital transformation initiatives.

Abstract

The growing complexity and volume of data in Property & Casualty (P&C) insurance have intensified the need for robust, scalable, and interpretable data quality validation methods. Conventional rule-based validation systems are transparent, but they struggle to adapt to evolving data and regulatory requirements. This paper addresses these challenges with a hybrid, agentic AI methodology that combines the precision of deterministic rule logic with the inferential capabilities of large language models (LLMs). The architecture consists of four modular agents (ProfilerAgent, LLMRuleAgent, RuleAgent, and SummaryAgent), each assigned a distinct role in the data quality pipeline, which improves transparency, reusability, and scalability. Using a LLaMA model hosted locally through Ollama, the system produces schema-aware YAML rules, validates structured datasets, and generates natural-language summaries of data quality issues. An experimental evaluation on a real-world auto insurance claims dataset from Kaggle showed that the framework identified schema mismatches, format problems, and semantic discrepancies without requiring manually authored rules. The results indicate that the agentic architecture increases flexibility in resource-limited, compliance-focused settings. The research contributes to the ongoing debate on automated data governance in insurance by combining explainable AI with operational reliability, offering businesses a practical approach that balances regulatory requirements with digital transformation initiatives. Although evaluated in the P&C context, the modular design enables straightforward adaptation to other domains, such as retail and healthcare, where similar data quality challenges exist.
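
To make the division of labor among the four agents concrete, the following is a minimal Python sketch of how such a pipeline might be wired together. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names mirror the agent names, the sample records and column names are hypothetical, the rules are held as Python dictionaries rather than YAML, and the LLM call is replaced by a deterministic stub so the example runs without a model server.

```python
import re
from collections import Counter

# Hypothetical sample records; column names are illustrative only.
rows = [
    {"claim_id": "C001", "incident_date": "2015-01-25", "claim_amount": 5200.0},
    {"claim_id": "C002", "incident_date": "25/01/2015", "claim_amount": 1300.0},
    {"claim_id": "", "incident_date": "2015-02-10", "claim_amount": -40.0},
]


def profiler_agent(rows):
    """Profile each column: dominant Python type and number of empty values."""
    profile = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None and v != ""]
        types = Counter(type(v).__name__ for v in non_null)
        profile[col] = {
            "dominant_type": types.most_common(1)[0][0] if types else "unknown",
            "nulls": len(values) - len(non_null),
        }
    return profile


def llm_rule_agent(profile):
    """Stub for the LLM step (assumption): derive rules from the profile.

    A real system would prompt a model with the profile and parse its output.
    """
    rules = []
    for col, stats in profile.items():
        rules.append({"column": col, "check": "not_null"})
        if col.endswith("_date"):
            rules.append({"column": col, "check": "regex",
                          "pattern": r"\d{4}-\d{2}-\d{2}"})
        if stats["dominant_type"] in ("int", "float"):
            rules.append({"column": col, "check": "min", "value": 0})
    return rules


def rule_agent(rows, rules):
    """Apply each rule deterministically and record every violation."""
    issues = []
    for index, row in enumerate(rows):
        for rule in rules:
            value = row.get(rule["column"])
            missing = value is None or value == ""
            if rule["check"] == "not_null":
                failed = missing
            elif rule["check"] == "regex":
                failed = (not missing
                          and re.fullmatch(rule["pattern"], str(value)) is None)
            elif rule["check"] == "min":
                failed = not missing and (
                    not isinstance(value, (int, float)) or value < rule["value"])
            else:
                failed = False  # unknown checks are ignored in this sketch
            if failed:
                issues.append({"row": index, "column": rule["column"],
                               "check": rule["check"]})
    return issues


def summary_agent(issues):
    """Template-based stand-in for the natural-language summary step."""
    if not issues:
        return "No data quality issues found."
    counts = Counter((i["column"], i["check"]) for i in issues)
    return "\n".join(
        f"{n} row(s) failed '{check}' on column '{col}'"
        for (col, check), n in counts.items())


profile = profiler_agent(rows)
rules = llm_rule_agent(profile)
issues = rule_agent(rows, rules)
print(summary_agent(issues))
```

Keeping rule generation separate from rule execution means the validation step stays deterministic and auditable even when the rules themselves come from a model, which is the property the hybrid design relies on.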