LLM Hallucinations in Clinical Documentation: Scoping Review with ☸️SAIMSARA.

Name: SAIMSARA Evidence Object digital::LLM_HALLUCINATIONS_SS
Creator: SAIMSARA
License: https://saimsara.com/license/

doi:10.62487/saimsarab885dc08

SAIMSARA Journal

Machine-Readable Science • ISSN 3054-3991

Digital Health & Biotech

Issue 3, Volume 1, 2026

Editorial note

• Last update: 2026-05-18 08:59:51

What is this paper about?

LLM-generated clinical documentation is not simply “safe” or “unsafe”. This evidence map shows where hallucination risk becomes measurable, where it falls below human benchmarks, and which safeguards actually matter: retrieval-augmented generation (RAG), verification loops, fine-tuning, and human review. Built from 46 references and 32 original studies, it turns scattered evidence on AI-assisted clinical documentation into a practical safety map for deciding whether, where, and how LLMs should enter real clinical workflows.

Human-verified editorial review: verified by World ID proof-of-human. This editorial layer was submitted from a SAIMSARA account verified as a unique human.

Evidence preview · Did you know?

Good-looking notes can still hallucinate

Did you know? In 97 real outpatient encounters, ambient LLM notes had more hallucinations than physician “gold” notes: 31% vs 20%.

The notes may appear organized and useful while still adding unsupported clinical content.
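To make the 31% vs 20% comparison concrete, the sketch below computes the absolute and relative gap. It assumes, because the card does not say, that both figures are per-note proportions of notes containing a hallucination; the function name `compare_rates` is hypothetical.

```python
def compare_rates(llm_rate: float, human_rate: float) -> dict:
    """Compare two hallucination proportions given as values in [0, 1].

    Assumption: both inputs are per-note proportions; the preview card
    does not state the exact denominator behind 31% and 20%.
    """
    if not (0 <= llm_rate <= 1 and 0 < human_rate <= 1):
        raise ValueError("rates must be proportions and human_rate must be > 0")
    return {
        # Absolute gap, expressed in percentage points
        "risk_difference_pp": round((llm_rate - human_rate) * 100, 1),
        # Ratio of the LLM rate to the physician baseline
        "risk_ratio": round(llm_rate / human_rate, 2),
    }


print(compare_rates(0.31, 0.20))  # → {'risk_difference_pp': 11.0, 'risk_ratio': 1.55}
```

Under that assumption, the ambient LLM notes show a rate 11 percentage points higher than the physician notes, roughly 1.55 times the physician baseline.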

The risk range is surprisingly wide

Did you know? Reported hallucination rates ranged from 0.73–2.0% in constrained workflows to 31–35% in higher-risk tasks.

Clinical LLM safety depends heavily on task design, source grounding, and workflow constraints.

Safeguards may change the outcome

Did you know? RAG, Generator-Verifier-Judge loops, and fine-tuning reduced major hallucinations to human-comparable or lower levels in several workflows.

The key question is not whether LLMs write notes, but which control layers make them clinically acceptable.
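As a rough illustration of one such control layer, here is a minimal, hypothetical sketch of a Generator-Verifier-Judge loop. It is not the implementation used by any included study. The generator is a toy stand-in for an LLM call, and the verifier uses a crude word-overlap threshold (an assumed `threshold=0.6`) in place of real clinical fact-checking. The verifier's flags go back to the generator, and drafts that still fail after `max_rounds` are routed to a human reviewer.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Verdict:
    accepted: bool
    unsupported: List[str]


def words(text: str) -> Set[str]:
    # Lowercased word set; a crude stand-in for real clinical fact-checking
    return {w.strip(".,;:").lower() for w in text.split() if w.strip(".,;:")}


def verify(draft: List[str], source: str, threshold: float = 0.6) -> List[str]:
    """Verifier: flag sentences whose words are poorly covered by the source."""
    src = words(source)
    flagged = []
    for sentence in draft:
        sw = words(sentence)
        coverage = len(sw & src) / len(sw) if sw else 1.0
        if coverage < threshold:
            flagged.append(sentence)
    return flagged


def judge(flagged: List[str]) -> Verdict:
    # Judge: accept only if the verifier flagged nothing
    return Verdict(accepted=not flagged, unsupported=flagged)


def generate_verify_judge(
    generate: Callable[[str, List[str]], List[str]],
    source: str,
    max_rounds: int = 3,
) -> Tuple[List[str], str]:
    draft: List[str] = []
    feedback: List[str] = []
    for _ in range(max_rounds):
        draft = generate(source, feedback)
        verdict = judge(verify(draft, source))
        if verdict.accepted:
            return draft, "accepted"
        # Return flagged sentences to the generator for revision
        feedback = verdict.unsupported
    # Still failing after max_rounds: route to a human reviewer
    return draft, "escalate_to_human"


def toy_generator(source: str, feedback: List[str]) -> List[str]:
    # Hypothetical LLM stand-in: the second sentence is not in the transcript
    draft = ["Patient reports chest pain for two days.",
             "Patient has a history of diabetes."]
    return [s for s in draft if s not in feedback]


transcript = "Patient reports chest pain for two days. No other complaints."
note, status = generate_verify_judge(toy_generator, transcript)
print(status, note)  # → accepted ['Patient reports chest pain for two days.']
```

The escalation path is deliberate. A loop like this narrows what a human must review but does not replace that review, which matches the human-in-the-loop framing of the review.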

Abstract: This review aims to synthesize contemporary evidence on the prevalence, characteristics, and mitigation strategies of LLM hallucinations in clinical documentation and medical information extraction tasks. It draws on 46 references and builds its evidence map from 32 original studies with 32465 total participants/sample observations (topic-deduplicated ΣN). This scoping review suggests that LLM hallucination in clinical documentation is a measurable but highly context-dependent risk, with rates ranging from roughly 0.73% in monitored agentic deployments to 31–35% in ambient transcription and zero-shot extraction tasks. The dominant signal across the included evidence is that architectural and workflow safeguards, particularly retrieval-augmented generation, multi-agent verification, and domain-specific fine-tuning, were associated with hallucination rates approaching or falling below human documentation benchmarks. These findings support a role for LLMs as assistive documentation tools embedded in human-in-the-loop workflows rather than as autonomous systems, especially given persistent risks of unique errors, hallucinations, inaccuracies, and omissions. Future research should prioritize standardized hallucination metrics and prospective, multi-site safety evaluations to clarify which mitigation architectures most reliably preserve documentation integrity across clinical specialties.

Keywords: Large Language Models; Clinical Documentation; Hallucination Mitigation; Retrieval-Augmented Generation; Factual Consistency; Clinical Note Generation; Medical Transcription; Discharge Summaries; Electronic Health Records; Patient Safety
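Retrieval-augmented generation, one of the safeguards named in the abstract, grounds the model in the patient's own record before it writes. The sketch below covers only the retrieval and prompt-augmentation steps and rests on stated assumptions. A simple term-overlap score stands in for the embedding search a production system would typically use. The record excerpts are invented for illustration. The LLM call that would consume the prompt is omitted. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, chunk: str) -> int:
    # Count shared terms (with multiplicity); a stand-in for embedding similarity
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    return sum(min(q[t], c[t]) for t in q)


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return up to k record chunks that share at least one term with the query."""
    ranked = sorted(chunks, key=lambda ch: overlap_score(query, ch), reverse=True)
    return [ch for ch in ranked[:k] if overlap_score(query, ch) > 0]


def build_grounded_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    # Augmentation step: restrict the model to the retrieved excerpts
    context = retrieve(query, chunks, k)
    numbered = "\n".join(f"[{i + 1}] {ch}" for i, ch in enumerate(context))
    return (
        "Answer using ONLY the numbered source excerpts below. "
        "If the answer is not in them, reply 'not documented'.\n"
        f"{numbered}\n\nQuestion: {query}"
    )


record = [
    "Discharge medication: metoprolol 25 mg twice daily.",
    "Allergies: penicillin (rash).",
    "Follow-up with cardiology in two weeks.",
]
print(build_grounded_prompt("Which discharge medication was prescribed?", record))
```

The explicit "not documented" fallback gives the model a sanctioned alternative to inventing content when retrieval finds nothing relevant.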

Review Stats

Final search date and database lock: 2026-05-14 22:11:23 CEST
Plan: Pro (expanded craft tokens)
Source: Semantic Scholar
Total Abstracts/Papers: 93
Downloaded Abstracts/Papers: 93
Included original and non-original Abstracts/Papers (all): 53
Included original Abstracts/Papers (Vote counting by direction of effect): 32
Reference Index (links used in paper): 46
Total participants/sample observations (topic deduplicated ΣN): 32465
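The stats list reports that the 32 original studies were synthesized by vote counting by direction of effect. The Cochrane Handbook describes this approach as follows: code each study by whether its effect points in the favourable or unfavourable direction, report the proportion of favourable studies, and optionally apply a sign test. The sketch below assumes each study has already been coded '+', '-', or '0' (unclear). This page does not show the review's actual coding rules or whether it ran a sign test. The example input is illustrative only, not the review's tallies.

```python
from math import comb
from typing import Dict, Iterable, Optional


def sign_test_two_sided(favourable: int, unfavourable: int) -> float:
    """Exact two-sided binomial sign test against p = 0.5."""
    n = favourable + unfavourable
    if n == 0:
        raise ValueError("need at least one study with a clear direction")
    k = min(favourable, unfavourable)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def vote_count(directions: Iterable[str]) -> Dict[str, Optional[float]]:
    """Tally studies coded '+' (favourable), '-' (unfavourable) or anything else (unclear)."""
    codes = list(directions)
    fav = codes.count("+")
    unfav = codes.count("-")
    n = fav + unfav
    return {
        "favourable": fav,
        "unfavourable": unfav,
        "unclear": len(codes) - n,
        # Unclear studies are excluded from the proportion and the sign test
        "proportion_favourable": fav / n if n else None,
        "p_value": sign_test_two_sided(fav, unfav) if n else None,
    }


# Illustrative codes only, not the review's actual tallies
print(vote_count(["+"] * 8 + ["-"] * 2 + ["0"]))
# → {'favourable': 8, 'unfavourable': 2, 'unclear': 1, 'proportion_favourable': 0.8, 'p_value': 0.109375}
```

Vote counting shows the direction of the evidence, not its magnitude. That is one reason the abstract frames safeguards as "associated with" lower hallucination rates rather than quantifying a pooled effect.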

Reference Index (46)

[1] A Survey of LLM-based Agents in Medicine: How far are we from Baymax? — https://doi.org/10.48550/arxiv.2502.11211
[2] A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation — https://doi.org/10.1038/s41746-025-01670-7
[3] Challenges of Implementing LLMs in Clinical Practice: Perspectives — https://doi.org/10.3390/jcm14176169
[4] Effective prompt design for large language models in clinical practice — https://doi.org/10.1080/17843286.2026.2613903
[5] Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini — https://doi.org/10.48550/arxiv.2410.15528
[6] Extracting International Classification of Diseases Codes from Clinical Documentation Using Large Language Models — https://doi.org/10.1055/a-2491-3872
[7] Reducing administrative burden in cardio-oncology: an LLM-powered approach for automated summarization of oncology and cardiotoxicity histories in dutch medical records — https://doi.org/10.1093/eurheartjsupp/suaf083.227
[8] LLM-Based Dictation Detection from Doctor-Patient Conversations — https://doi.org/10.1109/asru65441.2025.11434612

Only the first 8 of the 46 entries are shown here.