AI-generated voice is now useful enough for education, healthcare, accessibility, media, and commerce, yet realistic enough to open a dangerous gap between human perception and synthetic-voice deception. This review compresses 226 original studies into a structured, human- and machine-readable evidence map showing where voice cloning, synthetic speech, detection, authentication, and provenance already work, and where they remain unsafe, fragile, or poorly validated.
Human-verified editorial review
Verified by World ID proof-of-human. This editorial layer was submitted from a SAIMSARA account verified as a unique human.
Evidence preview
Clinical / practical impact
Useful voice interfaces
AI-generated voice is already being tested in education, healthcare, accessibility, media, and commercial interaction.
Healthcare workflow signal
Voice-enabled AI can support patient education, clinical documentation, virtual patients, and medical simulations, but needs oversight.
Accessibility and self-voice
Personalized voices may help users with visual, physical, hearing, or speech impairments preserve identity and communicate more naturally.
Evidence / detection frontier
Humans are unreliable detectors
Listener studies showed weak or inconsistent detection, with accuracy as low as 37.5% on vishing-style synthetic voice clips.
Automated detectors can excel
Dataset-specific systems reported accuracy above 99% in constrained settings, using spectrograms, acoustic features, CNNs, transformers, and ensemble models.
Voice realism has acoustic fingerprints
Prosody, pitch, timbre, spectral artifacts, vowel-level cues, and time-frequency anomalies remain important signals for detection.
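To make the detection cues above more concrete, here is a minimal sketch in Python with NumPy. It computes a log-magnitude spectrogram (the image-like input that CNN and transformer detectors commonly consume), two frame-level spectral descriptors (centroid as a rough timbre cue, flatness as a tone-versus-noise cue), and a crude autocorrelation pitch estimate. All function names, the frame size, hop, and pitch range are illustrative assumptions, not settings taken from any reviewed study. Real detectors learn from far richer representations, and this code does not classify anything.

```python
import numpy as np


def stft_magnitude(signal: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Hann-windowed short-time Fourier magnitudes, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def log_spectrogram(mag: np.ndarray) -> np.ndarray:
    """Log-magnitude spectrogram; the small constant avoids log(0)."""
    return np.log(mag + 1e-10)


def spectral_centroid(mag: np.ndarray, sr: int, n_fft: int = 512) -> np.ndarray:
    """Per-frame 'centre of mass' of the magnitude spectrum, in Hz."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-10)


def spectral_flatness(mag: np.ndarray) -> np.ndarray:
    """Geometric / arithmetic mean of power per frame: near 0 for tones, higher for noise."""
    power = mag ** 2 + 1e-10
    return np.exp(np.mean(np.log(power), axis=1)) / np.mean(power, axis=1)


def autocorr_pitch(frame: np.ndarray, sr: int, fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Crude fundamental-frequency (pitch) estimate from the autocorrelation peak."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]  # lags 0..N-1
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max]))
    return sr / lag


# Toy input: a 1 s, 200 Hz sine standing in for a voiced speech segment.
sr = 16_000
tone = np.sin(2 * np.pi * 200 * np.arange(sr) / sr)
mag = stft_magnitude(tone)
print(log_spectrogram(mag).shape, autocorr_pitch(tone[:1024], sr))
```

In a real pipeline these per-frame features (or the spectrogram itself) would be fed to a trained classifier; the review notes that such classifiers tend to be dataset-specific, so features that separate real and synthetic speech on one corpus may not transfer to another.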
Translation gaps / governance
Consent and identity risk
Voice cloning raises practical questions about ownership, impersonation, misinformation, child-facing use, and posthumous or clinical identity replication.
Layered safeguards needed
Safe deployment depends on provenance, watermarking, authentication, explainable detection, and human review rather than one metric alone.
Benchmarks remain fragile
Generalizability is limited by small human studies, heterogeneous datasets, multilingual gaps, adversarial attacks, and real-time deployment constraints.
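The layered-safeguards point can be illustrated with a minimal decision sketch, assuming three hypothetical inputs: a provenance tag on the audio, a detector score giving the probability that the audio is synthetic, and a speaker-verification similarity. A shared-key HMAC stands in for a real provenance or watermarking scheme; it is a simplification, not how any particular standard works. The thresholds and names (`VoiceEvidence`, `decide`) are illustrative. The design choice it shows is that no single signal decides alone: a sample is accepted only when all layers agree, and mixed signals are escalated to human review.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceEvidence:
    audio: bytes
    provenance_tag: Optional[bytes]  # hypothetical tag attached by a trusted recorder
    detector_score: float            # hypothetical detector output: P(synthetic), 0..1
    speaker_match: float             # hypothetical speaker-verification similarity, 0..1


def provenance_valid(audio: bytes, tag: Optional[bytes], key: bytes) -> bool:
    """Stand-in provenance check: HMAC-SHA256 over the raw audio bytes."""
    if tag is None:
        return False
    expected = hmac.new(key, audio, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)  # constant-time comparison


def decide(ev: VoiceEvidence, key: bytes,
           synth_threshold: float = 0.5, match_threshold: float = 0.8) -> str:
    """Combine three layers; disagreement between them is escalated, not auto-resolved."""
    has_provenance = provenance_valid(ev.audio, ev.provenance_tag, key)
    looks_synthetic = ev.detector_score >= synth_threshold
    speaker_ok = ev.speaker_match >= match_threshold

    if has_provenance and not looks_synthetic and speaker_ok:
        return "accept"        # every layer agrees the sample is authentic
    if looks_synthetic and not has_provenance:
        return "reject"        # detector flags it and nothing vouches for its origin
    return "human_review"      # mixed signals go to a person


key = b"demo-shared-key"
audio = b"\x00\x01\x02\x03"
tag = hmac.new(key, audio, hashlib.sha256).digest()
print(decide(VoiceEvidence(audio, tag, 0.10, 0.92), key))
print(decide(VoiceEvidence(audio, None, 0.97, 0.95), key))
```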
Abstract: To map the original research literature on AI-generated voice, identify the most query-relevant recurring finding, and synthesize major research topics, practical implications, limitations, and future directions across technical, human-centered, clinical, educational, security, and societal domains. The review utilises 226 original studies with 3,297,311 total participants (topic-deduplicated ΣN). This scoping review suggests that AI-generated voice has reached a level of realism and social utility sufficient to support meaningful applications across education, healthcare, and accessibility, while simultaneously outpacing unaided human ability to distinguish synthetic from authentic speech, with listener accuracy reported as low as 37.5% in vishing-style clips. The dominant signal is a widening gap between human perceptual limits and the demonstrated, though dataset-specific, capability of automated detectors, which reach above 99% accuracy in constrained settings. This convergence highlights that safe deployment depends less on any single performance metric than on layered safeguards combining provenance, explainable detection, and authentication. Generalizability remains constrained by heterogeneous benchmarks and small human studies. Future research should prioritize standardized multilingual, adversarial, real-time evaluation alongside enforceable consent and provenance frameworks for voice cloning.
Final search date and database lock: 2026-05-09 01:55:45 CEST
Plan: Pro (expanded craft tokens; source: Semantic Scholar)
Source: Semantic Scholar
Total Abstracts/Papers: 112,003
Downloaded Abstracts/Papers: 1,000
Included original and non-original Abstracts/Papers (all): 270
Included original Abstracts/Papers (Vote counting by direction of effect): 226
Reference Index (links used in paper): 170
Total participants (topic-deduplicated ΣN): 3,297,311
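The included original studies are synthesized by vote counting on the direction of effect. As a minimal sketch of that general technique (not this review's actual pipeline, which is not described here), each study casts one vote per outcome: favourable, unfavourable, or no clear direction. A two-sided exact sign test on the directional votes is one common way to ask whether the split departs from 50/50. The study directions below are invented toy inputs for illustration only, and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def tally_directions(directions: list[str]) -> Counter:
    """Count study-level votes: '+' favourable, '-' unfavourable, '0' no clear direction."""
    return Counter(directions)


def sign_test_p(n_pos: int, n_neg: int) -> float:
    """Exact two-sided sign test on directional votes (studies voting '0' are excluded)."""
    n = n_pos + n_neg
    if n == 0:
        return 1.0
    k = min(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k) under p = 0.5
    return min(1.0, 2 * tail)


# Hypothetical toy input for ONE outcome; not data from the review.
votes = tally_directions(["+"] * 7 + ["-"] * 1 + ["0"] * 2)
p = sign_test_p(votes["+"], votes["-"])
print(dict(votes), round(p, 4))
```

Vote counting ignores effect sizes and study precision, so a lopsided tally shows only that results tend to point the same way, not how large or reliable the effect is.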
The full evidence review, including the Introduction, Methods, Results, Discussion, Conclusion, figures, and complete reference index, opens after purchase or sign-in.
The Evidence Object JSON is a separate machine-readable evidence product: a concentrated synthesis of results, topic-level evidence, and discussion across original and non-original studies. It can be fed directly into an LLM, agent, or RAG workflow.