SAIMSARA Journal

Machine Generated Science • ISSN 3054-3991

AI-Generated Voice, Synthetic Speech, and Voice Cloning: Scoping Review with ☸️SAIMSARA.


Digital Health

Issue 3, Volume 1, 2026

DOI: 10.62487/saimsara3635922a

Editorial note
• Last update: 2026-05-09 08:35:32
What is this paper about?
AI-generated voice is now useful enough for education, healthcare, accessibility, media, and commerce — but realistic enough to expose a dangerous gap between human perception and synthetic-voice deception. This review compresses 226 original studies into a structured human- and machine-readable evidence map, showing where voice cloning, synthetic speech, detection, authentication, and provenance are already working — and where they remain unsafe, fragile, or poorly validated.
Human-verified editorial review: verified by World ID proof-of-human. This editorial layer was submitted from a SAIMSARA account verified as a unique human.

Evidence preview
Realistic scene of a patient speaking with a medical voice robot.

Clinical / practical impact

Useful voice interfaces

AI-generated voice is already being tested in education, healthcare, accessibility, media, and commercial interaction.

Healthcare workflow signal

Voice-enabled AI can support patient education, clinical documentation, virtual patients, and medical simulations, but needs oversight.

Accessibility and self-voice

Personalized voices may help users with visual, physical, hearing, or speech impairments preserve identity and communicate more naturally.

Realistic scene of an AI engineer testing voice patterns on a large monitor.

Evidence / detection frontier

Humans are unreliable detectors

Listener studies showed weak or inconsistent detection, including very low accuracy in vishing-style synthetic voice clips.

Automated detectors can excel

Dataset-specific systems reported very high accuracy using spectrograms, acoustic features, CNNs, transformers, and ensemble models.

Voice realism has acoustic fingerprints

Prosody, pitch, timbre, spectral artifacts, vowel-level cues, and time-frequency anomalies remain important signals for detection.
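The cues above can be illustrated with a minimal sketch: a log-magnitude spectrogram computed with NumPy, plus per-frame spectral flatness, one simple time-frequency statistic of the kind detection systems build on. This is a toy illustration under invented parameters (frame size, hop, test signals), not a method from any of the reviewed studies.

```python
import numpy as np

def log_spectrogram(x, frame=512, hop=256):
    """Frame a signal, apply a Hann window, and return log-magnitude spectra."""
    n = 1 + (len(x) - frame) // hop
    win = np.hanning(frame)
    frames = np.stack([x[i * hop : i * hop + frame] * win for i in range(n)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mag + 1e-10)

def spectral_flatness(log_mag):
    """Per-frame flatness in (0, 1]: geometric mean / arithmetic mean of magnitude."""
    mag = np.exp(log_mag)
    geo = np.exp(np.mean(np.log(mag + 1e-10), axis=1))
    arith = np.mean(mag, axis=1)
    return geo / arith

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)                     # harmonic, voice-like carrier
noise = np.random.default_rng(0).standard_normal(sr)   # hiss-like artifact

flat_tone = spectral_flatness(log_spectrogram(tone)).mean()
flat_noise = spectral_flatness(log_spectrogram(noise)).mean()
print(flat_tone < flat_noise)  # tonal frames are far less "flat" than noise
```

Real detectors feed far richer features (mel spectrograms, prosodic contours, learned embeddings) into CNNs, transformers, or ensembles; the point here is only that synthetic-speech artifacts are, in principle, measurable in the time-frequency plane.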

Realistic scene of a mobile phone warning that an incoming call may use AI-generated voice.

Translation gaps / governance

Consent and identity risk

Voice cloning raises practical questions about ownership, impersonation, misinformation, child-facing use, and posthumous or clinical identity replication.

Layered safeguards needed

Safe deployment depends on provenance, watermarking, authentication, explainable detection, and human review rather than one metric alone.

Benchmarks remain fragile

Generalizability is limited by small human studies, heterogeneous datasets, multilingual gaps, adversarial attacks, and real-time deployment constraints.
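One safeguard named above, watermarking, can be sketched in miniature: a keyed, low-amplitude pseudorandom pattern is added to the audio and later detected by correlation. Everything here (function names, strength, threshold) is invented for illustration; production provenance schemes are far more robust to compression and editing.

```python
import numpy as np

def watermark(audio, key, strength=0.01):
    """Embed a keyed, low-amplitude pseudorandom pattern into the signal."""
    pattern = np.random.default_rng(key).choice([-1.0, 1.0], size=len(audio))
    return audio + strength * pattern

def detect(audio, key, threshold=0.005):
    """Correlate against the keyed pattern; watermarked audio scores high."""
    pattern = np.random.default_rng(key).choice([-1.0, 1.0], size=len(audio))
    score = np.dot(audio, pattern) / len(audio)
    return bool(score > threshold)

rng = np.random.default_rng(1)
speech = rng.standard_normal(16000) * 0.1   # stand-in for one second of audio
marked = watermark(speech, key=42)

print(detect(marked, key=42))   # correct key recovers the mark
print(detect(speech, key=42))   # clean audio scores near zero
```

The design choice worth noting is layering: the correlation score is one signal among several (provenance metadata, detector output, human review), consistent with the review's conclusion that no single metric suffices.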


Abstract: To map the original research literature on AI-generated voice, identify the most query-relevant recurring finding, and synthesize major research topics, practical implications, limitations, and future directions across technical, human-centered, clinical, educational, security, and societal domains. The review synthesizes 226 original studies with 3,297,311 total participants (topic-deduplicated ΣN). This scoping review suggests that AI-generated voice has reached a level of realism and social utility sufficient to support meaningful applications across education, healthcare, and accessibility, while simultaneously outpacing unaided human ability to distinguish synthetic from authentic speech, with listener accuracy reported as low as 37.5% in vishing-style clips. The dominant signal is a widening gap between human perceptual limits and the demonstrated, though dataset-specific, capability of automated detectors reaching above 99% accuracy in constrained settings. This convergence highlights that safe deployment depends less on any single performance metric than on layered safeguards combining provenance, explainable detection, and authentication. Generalizability remains constrained by heterogeneous benchmarks and small human studies. Future research should prioritize standardized multilingual, adversarial, real-time evaluation alongside enforceable consent and provenance frameworks for voice cloning.

Keywords: AI-generated voice; Synthetic speech; Voice cloning; Deepfake detection; Text-to-speech; Voice conversion; Speaker verification; Acoustic features; Mel spectrograms; Human perception

Review Stats


Unlock the full evidence map

The full evidence review, including the Introduction, Methods, Results, Discussion, Conclusion, figures, and complete reference index, opens after purchase or sign-in. The Evidence Object JSON is a separate machine-readable evidence product: a concentrated synthesis of results, topic-level evidence, and discussion across original and non-original studies. It can be fed directly into your LLM, agent, or RAG workflow.

Reference Index (170)