Textual analysis of insurance claims with large language models

Dongchen Li; Zhuo Jin; Linyi Qian; Hailiang Yang

doi:10.1111/jori.70004

Dongchen Li, Zhuo Jin, Linyi Qian^*, Hailiang Yang

^*Corresponding author for this work

Department of Financial and Actuarial Mathematics

Research output: Contribution to journal › Article › peer-review

Abstract

This study proposes a comprehensive and general framework for examining discrepancies in textual content using large language models (LLMs), broadening application scenarios in the insurtech and risk management fields, and conducting empirical research based on actual needs and real-world data. Our framework integrates OpenAI's interface to embed texts and project them into external categories while utilizing distance metrics to evaluate discrepancies. To identify significant disparities, we design prompts to analyze three types of relationships: identical information, logical relationships and potential relationships. Our empirical analysis shows that 22.1% of samples exhibit substantial semantic discrepancies, and 38.1% of the samples with significant differences contain at least one of the identified relationships. The average processing time for each sample does not exceed 4 s, and all processes can be adjusted based on actual needs. Backtesting results and comparisons with traditional NLP methods further demonstrate that our proposed method is both effective and robust.
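
As a rough illustration of the "distance metrics" step mentioned in the abstract, the sketch below computes a cosine distance between two embedding vectors and flags a pair as discrepant when the distance exceeds a threshold. It is not the authors' implementation: the abstract does not say which metric, threshold, or embedding model was used, so the choice of cosine distance, the `threshold` value, the function names, and the toy vectors (standing in for embeddings returned by an embedding API) are all assumptions made for this example.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def flag_discrepancy(u, v, threshold=0.5):
    """Flag a pair of texts as discrepant when their embedding distance
    exceeds `threshold` (an arbitrary value assumed for illustration)."""
    return cosine_distance(u, v) > threshold


# Hypothetical toy vectors standing in for real text embeddings.
text_a = [1.0, 0.0, 0.0]
text_b_similar = [2.0, 0.0, 0.0]    # same direction as text_a
text_b_different = [0.0, 1.0, 0.0]  # orthogonal to text_a

print(flag_discrepancy(text_a, text_b_similar))
print(flag_discrepancy(text_a, text_b_different))
```

In practice the vectors would have hundreds or thousands of dimensions, and the threshold would need to be calibrated on labeled data rather than fixed in advance.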

Original language	English
Pages (from-to)	505-535
Number of pages	31
Journal	Journal of Risk and Insurance
Volume	92
Issue number	2
DOIs	https://doi.org/10.1111/jori.70004
Publication status	Published - Jun 2025

Keywords

discrepancy analysis
distance metrics
insurance claim settlement
large language model
risk management

Cite this

@article{1d3955838f10438b86b4ad0b60dd28e3,
  title = "Textual analysis of insurance claims with large language models",
  abstract = "This study proposes a comprehensive and general framework for examining discrepancies in textual content using large language models (LLMs), broadening application scenarios in the insurtech and risk management fields, and conducting empirical research based on actual needs and real-world data. Our framework integrates OpenAI's interface to embed texts and project them into external categories while utilizing distance metrics to evaluate discrepancies. To identify significant disparities, we design prompts to analyze three types of relationships: identical information, logical relationships and potential relationships. Our empirical analysis shows that 22.1% of samples exhibit substantial semantic discrepancies, and 38.1% of the samples with significant differences contain at least one of the identified relationships. The average processing time for each sample does not exceed 4 s, and all processes can be adjusted based on actual needs. Backtesting results and comparisons with traditional NLP methods further demonstrate that our proposed method is both effective and robust.",
  keywords = "discrepancy analysis, distance metrics, insurance claim settlement, large language model, risk management",
  author = "Dongchen Li and Zhuo Jin and Linyi Qian and Hailiang Yang",
  note = "Publisher Copyright: {\textcopyright} 2025 American Risk and Insurance Association.",
  year = "2025",
  month = jun,
  doi = "10.1111/jori.70004",
  language = "English",
  volume = "92",
  pages = "505--535",
  journal = "Journal of Risk and Insurance",
  issn = "0022-4367",
  number = "2",
}

TY  - JOUR
T1  - Textual analysis of insurance claims with large language models
AU  - Li, Dongchen
AU  - Jin, Zhuo
AU  - Qian, Linyi
AU  - Yang, Hailiang
PY  - 2025/6
Y1  - 2025/6
N2  - This study proposes a comprehensive and general framework for examining discrepancies in textual content using large language models (LLMs), broadening application scenarios in the insurtech and risk management fields, and conducting empirical research based on actual needs and real-world data. Our framework integrates OpenAI's interface to embed texts and project them into external categories while utilizing distance metrics to evaluate discrepancies. To identify significant disparities, we design prompts to analyze three types of relationships: identical information, logical relationships and potential relationships. Our empirical analysis shows that 22.1% of samples exhibit substantial semantic discrepancies, and 38.1% of the samples with significant differences contain at least one of the identified relationships. The average processing time for each sample does not exceed 4 s, and all processes can be adjusted based on actual needs. Backtesting results and comparisons with traditional NLP methods further demonstrate that our proposed method is both effective and robust.
AB  - This study proposes a comprehensive and general framework for examining discrepancies in textual content using large language models (LLMs), broadening application scenarios in the insurtech and risk management fields, and conducting empirical research based on actual needs and real-world data. Our framework integrates OpenAI's interface to embed texts and project them into external categories while utilizing distance metrics to evaluate discrepancies. To identify significant disparities, we design prompts to analyze three types of relationships: identical information, logical relationships and potential relationships. Our empirical analysis shows that 22.1% of samples exhibit substantial semantic discrepancies, and 38.1% of the samples with significant differences contain at least one of the identified relationships. The average processing time for each sample does not exceed 4 s, and all processes can be adjusted based on actual needs. Backtesting results and comparisons with traditional NLP methods further demonstrate that our proposed method is both effective and robust.
KW  - discrepancy analysis
KW  - distance metrics
KW  - insurance claim settlement
KW  - large language model
KW  - risk management
UR  - http://www.scopus.com/inward/record.url?scp=105000944564&partnerID=8YFLogxK
U2  - 10.1111/jori.70004
DO  - 10.1111/jori.70004
M3  - Article
AN  - SCOPUS:105000944564
SN  - 0022-4367
VL  - 92
SP  - 505
EP  - 535
JO  - Journal of Risk and Insurance
JF  - Journal of Risk and Insurance
IS  - 2
ER  - 
