SAIMSARA Journal

Machine Generated Science • ISSN 3054-3991

Limitations of Medical Machine Translation: Scoping Review with ☸️SAIMSARA.

Digital Health

Issue 3, Volume 1, 2026

DOI: 10.62487/saimsarae1a843d5

Editorial note
• Last update: 2026-03-29 21:55:05
What is this paper about
Medical machine translation (MT) can read fluently, yet this review maps where it still breaks down: semantic precision, cultural nuance, audience adaptation, and reliability in high-stakes clinical use. The full review separates the real strengths of MT in constrained tasks from the specific failure modes that still make expert human oversight essential.

Evidence preview · Did you know?

Fluent translation can still be unsafe

Did you know? Medical machine translation can look fluent while still losing semantic precision, context, terminology, and audience fit in high-stakes medical text.

That is why the evidence supports MT as an assistive tool requiring expert post-editing, not a safe replacement for human medical translation.

The best signal came from structure

Did you know? In a constrained radiology setting, hybrid translation plus terminology-aware similarity reached F1 90.15%, precision 91.78%, recall 88.59% for cross-language RadLex coding.

The strongest signal is not “generic MT is enough,” but that structured medical tasks improve when translation is domain-constrained.
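
As a quick consistency check on the figures quoted above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall; the function name `f1_score` is chosen here for illustration and does not come from the reviewed study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as quoted in the preview card, in percent.
reported_precision = 91.78
reported_recall = 88.59

f1 = f1_score(reported_precision, reported_recall)
print(f"Recomputed F1: {f1:.2f}%")
```

The recomputed value agrees with the reported 90.15% to within the rounding of the published precision and recall.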

Even the score can miss the danger

Did you know? Common MT evaluation metrics may miss clinically relevant translation errors at the segment level.

This is the governance gap: healthcare translation needs source-faithful checks, expert oversight, and special caution for vulnerable populations.
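
One way to see how a surface metric can understate clinical risk is a simplified word-level edit rate: edits divided by reference length. The sketch below is an illustration only. It omits the block-shift operation of full TER, and the two sentences are invented for this example rather than taken from the reviewed studies.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        curr = [i]
        for j, r in enumerate(ref, 1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a missing reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simple_edit_rate(hypothesis: str, reference: str) -> float:
    """Edits per reference word; a simplification of TER without shifts."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


# Hypothetical segment pair: the output drops a single negation.
reference = "do not take this medication with alcohol"
hypothesis = "do take this medication with alcohol"

rate = simple_edit_rate(hypothesis, reference)
print(f"Edit rate: {rate:.3f}")
```

Here one missing word out of seven gives a low edit rate, yet the instruction is reversed. That is the kind of segment-level error a source-faithful expert check is meant to catch.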

Abstract: The aim of this review is to synthesize the documented limitations of medical machine translation (MT) across architectures, including large language models (LLMs) and neural machine translation (NMT) systems, with a focus on terminological accuracy, contextual interpretation, and audience adaptation. The review draws on 15 references. This evidence map suggests that the dominant limitation of medical MT is not lexical inaccuracy alone, but a broader failure to preserve semantic precision, contextual meaning, and audience-appropriate expression in high-stakes settings. Even when machine performance appears strong on conventional metrics, important residual weaknesses remain, as illustrated by a translation edit rate (TER) of 0.99 in a classical medical translation task and by lower syntactic complexity in LLM output than in human translation (coordinate phrases per T-unit, CP/T, 1.16 vs 1.30). Across the mapped topics, recurrent signals pointed to terminology errors, polysemy-related mistranslation, discourse-level simplification, and loss of cultural nuance. These carry especially important implications for sensitive communication such as mental healthcare and for low-resource or unevenly supported language pairs. Taken together, the literature supports using medical MT as an assistive tool that still requires expert human oversight, rather than as a stand-alone substitute for clinical or specialist translation judgment. Future research should focus on clinically sensitive evaluation frameworks, register-adaptive multilingual systems, and stronger real-world validation in complex medical communication tasks.

Keywords: Medical machine translation; Neural machine translation; Large language models; Medical terminology accuracy; Contextual ambiguity; Syntactic simplification; Translation quality metrics; Post-editing; Domain-specific translation; Cultural connotations

