Large language models-in-the-loop: Leveraging expert small artificial intelligence models for multilingual anonymization and de-identification of protected health information

Gunay, Murat; Keles, Bunyamin; Hizlan, Raife

doi:10.36922/aih025120021

Gunay M., Keles B., Hizlan R.

Artificial Intelligence in Health, vol.3, no.1, pp.138-151, 2026 (Scopus)

Publication Type: Article / Full Article
Volume: 3 Issue: 1
Publication Date: 2026
DOI: 10.36922/aih025120021
Journal Name: Artificial Intelligence in Health
Journal Indexes: Scopus
Pages: pp.138-151
Keywords: Anonymization, De-identification, Health Insurance Portability and Accountability Act, Large language models-in-the-loop, Patient safety, Protected health information
Hacettepe University Affiliated: Yes

Abstract

The rise of chronic diseases and pandemics such as COVID-19 has emphasized the need for effective patient data processing while ensuring privacy through anonymization and de-identification of protected health information. Anonymized data facilitates research without compromising patient confidentiality. This paper introduces expert small artificial intelligence (AI) models, developed using the large language model (LLM)-in-the-loop methodology, to meet the demand for domain-specific named entity recognition (NER) models for de-identification. These models avoid the privacy risks associated with LLMs accessed through application programming interfaces by eliminating the need to transmit or store sensitive data. More importantly, they consistently outperform LLMs in de-identification tasks, offering superior performance and reliability. Our de-identification NER models, developed in eight languages (English, German, Italian, French, Romanian, Turkish, Spanish, and Arabic), achieved average F1-macro scores of 0.931, 0.960, 0.955, 0.937, 0.930, 0.963, 0.957, and 0.922, respectively. These results establish our de-identification NER models as the most accurate healthcare anonymization solutions, surpassing existing small models and even general-purpose LLMs such as GPT-4o. While Part I of this series introduced the LLM-in-the-loop methodology for biomedical document translation, this second paper showcases its success in developing cost-effective expert small NER models for de-identification tasks. Our findings lay the groundwork for future healthcare AI innovations, including biomedical entity and relation extraction, demonstrating the value of specialized models for domain-specific challenges.
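
The abstract reports its results as F1-macro scores. For readers unfamiliar with the metric, the following is a minimal sketch of how a macro-averaged F1 can be computed at the token level: F1 is calculated separately for each label and the per-label values are then averaged with equal weight. The function name `macro_f1`, the label set, and the toy data are all hypothetical, and the abstract does not say whether the authors score at the token level or the entity level, so this should not be read as their evaluation protocol.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Return (macro F1, per-label F1) for two equal-length label sequences.

    Macro averaging gives every label the same weight, regardless of how
    often it occurs. Illustrative only; not the paper's evaluation code.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one label is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was missed

    per_label = {}
    for label in sorted(set(gold) | set(pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        per_label[label] = 2 * tp[label] / denom if denom else 0.0

    return sum(per_label.values()) / len(per_label), per_label


# Hypothetical token labels: NAME and DATE stand in for PHI categories,
# and "O" marks tokens that are not PHI.
gold = ["NAME", "NAME", "DATE", "O", "O", "O"]
pred = ["NAME", "O", "DATE", "O", "O", "DATE"]

score, per_label = macro_f1(gold, pred)
print(f"macro F1 = {score:.3f}")
for label, value in per_label.items():
    print(f"  {label}: {value:.3f}")
```

Because each label contributes equally, a rare PHI category that is detected poorly lowers the macro score as much as a frequent one would, which is one reason the metric is commonly used for de-identification tasks.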