26th Interspeech 2025: Rotterdam, The Netherlands
- Odette Scharenborg, Catharine Oertel, Khiet Truong: 26th Annual Conference of the International Speech Communication Association, Interspeech 2025, Rotterdam, The Netherlands, 17-21 August 2025. ISCA 2025.
Keynote 1
- Roger K. Moore: From Talking and Listening Devices to Intelligent Communicative Machines.
Spoken Machine Translation 1
- Luca Ducceschi, Greta H. Franzini: Speech transcription from South Tyrolean Dialect to Standard German with Whisper.
- Aswin Shanmugam Subramanian, Harveen Singh Chadha, Vikas Joshi, Shubham Bansal, Jian Xue, Rupeshkumar Mehta, Jinyu Li: Length Aware Speech Translation for Video Dubbing.
- Vishal Kumar, Vinayak Abrol: ArticulateX: End-to-End Monolingual Speech Translation in Articulator Space.
- Jiale Ou, Hongying Zan: CMSP-ST: Cross-modal Mixup with Speech Purification for End-to-End Speech Translation.
- Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi: End-to-End Speech Translation Guided by Robust Translation Capability of Large Language Model.
- Yu Pu, Xiaoqian Liu, Guangyu Zhang, Zheng Yan, Wei-Qiang Zhang, Xie Chen: Empowering Large Language Models for End-to-End Speech Translation Leveraging Synthetic Data.
- Gerard I. Gállego, Oriol Pareras, Martí Cortada Garcia, Lucas Takanori, Javier Hernando: Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios.
- Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Yuki Ito, Hassan Shahmohammadi, Siddhant Arora, Shinji Watanabe: Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs.
- Aishwarya Pothula, Bhavana Akkiraju, Srihari Bandarupalli, Charan Devarkonda, Santosh Kesiraju, Anil Kumar Vuppala: End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data.
- Shaowen Wang, Xinyuan Chen, Yao Xu: Self-Improvement for Audio Large Language Model using Unlabeled Speech.
Real-time Speech Enhancement
- Yan Ru Pei, Ritik Shrivastava, Sidharth: Optimized Real-time Speech Enhancement with Deep SSMs on Raw Audio.
- Shenghui Lu, Hukai Huang, Jinanglong Yao, Kaidi Wang, Qingyang Hong, Lin Li: A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement.
- Teng Aleksandra Ma, Sile Yin, Li-Chia Yang, Shuo Zhang: Real-Time Audio-Visual Speech Enhancement Using Pre-trained Visual Representations.
- Yonghun Song, Yeeun Kim, Yoonyoung Chung: Lightweight Speech Enhancement Model Based on Harmonic Attention and Phase Estimation with Skin-Attachable Accelerometer.
- Yi Gao, Hangting Chen, Siyu Zhang, Qingshan Yang, Jingcong Chen: TSDT-Net: Ultra-Low-Complexity Two-Stage Model Combining Dual-Path-Transformer and Transform-Average-Concatenate Network for Speech Enhancement.
- Chidambar B, Hanumanth Rao Naidu: Structured Codebook Based Hierarchical Framework for DNN for Computationally Efficient Speech Enhancement.
Multilinguality, Cross-linguistic Studies, L2 Speech
- Qian Zhou, Mathilde Hutin: Evaluation of Three Automatic Alignment Tools for the Processing of Non-native French.
- Hongchen Wu, Yixin Gu: CrossPhon: An Auto Phone Mapping Tool to Streamline Cross-language Modeling for Phone Alignment of Low-resource Languages.
- Ryo Magoshi, Shinsuke Sakai, Jaeyoung Lee, Tatsuya Kawahara: Multi-lingual and Zero-Shot Speech Recognition by Incorporating Classification of Language-Independent Articulatory Features.
- Haley Hsu, Dani Byrd, Khalil Iskarous, Louis Goldstein: Instantaneous changes in acoustic signals reflect syllable progression and cross-linguistic syllable variation.
- Linda Bakkouche, Brechtje Post: Influence of Proficiency and L2 Experience on Dynamic Spectral Cue Utilization in L2 Vowel Perception and Production.
- Kakeru Yazawa, Takayuki Konishi: A Bayesian Approach to L2 Fluency Ratings by Native and Nonnative Listeners.
- Silke Hamann, Andrea Alicehajic: Are loan sequences different from foreign sequences? A perception study with Japanese listeners on coronal obstruent - high front vowel sequences.
- Le Xuan Chan, Annika Heuser: Relative cue weighting in multilingual stop voicing production.
- Hannah White, Joshua Penney, Felicity Cox: Variability in Intervocalic /t/ and Community Diversity in Australian English.
Speech Emotion Recognition 1
- Pravin Mote, Donita Robinson, Elizabeth Richerson, Carlos Busso: Vector Quantized Cross-lingual Unsupervised Domain Adaptation for Speech Emotion Recognition.
- Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma: HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition.
- Shi-Xin Fang, Liang-Yeh Shen, Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee: Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning.
- Yi Chang, Zhao Ren, Zhonghao Zhao, Thanh Tam Nguyen, Kun Qian, Tanja Schultz, Björn W. Schuller: Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation.
- Mehedi Hasan Bijoy, Dejan Porjazovski, Tamás Grósz, Mikko Kurimo: Multi-Teacher Language-Aware Knowledge Distillation for Multilingual Speech Emotion Recognition.
- Ziwei Gong, Pengyuan Shi, Kaan Donbekci, Lin Ai, Run Chen, David Sasu, Zehui Wu, Julia Hirschberg: Learning More with Less: Self-Supervised Approaches for Low-Resource Speech Emotion Recognition.
Multimodal Resources
- Yong Ren, Chenxing Li, Le Xu, Hao Gu, Duzhen Zhang, Yujie Chen, Manjie Xu, Ruibo Fu, Shan Yang, Dong Yu: Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model.
- Le Xu, Chenxing Li, Yong Ren, Yujie Chen, Yu Gu, Ruibo Fu, Shan Yang, Dong Yu: Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning.
- Thai-Binh Nguyen, Thi Van Nguyen, Quoc Truong Do, Chi Mai Luong: ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition.
- Yihan Wu, Yichen Lu, Yijing Chen, Jiaqi Song, William Chen, Ruihua Song, Shinji Watanabe: GALAXY: A Large-Scale Open-Domain Dataset for Multimodal Learning.
- Yizhou Peng, Yi-Wen Chao, Dianwen Ng, Yukun Ma, Chongjia Ni, Bin Ma, Eng Siong Chng: FD-Bench: A Full-Duplex Benchmarking Pipeline Designed for Full Duplex Spoken Dialogue Systems.
- Sho Inoue, Shuai Wang, Haizhou Li: PersonaTAB: Predicting Personality Traits using Textual, Acoustic, and Behavioral Cues in Fully-Duplex Speech Dialogs.
- Boya Dong, Wentao Lei, Li Liu: FFD: Fine-Finger Diffusion Model for Music to Fine-grained Finger Dance Generation.
- Manjie Xu, Chenxing Li, Yong Ren, Xinyi Tu, Ruibo Fu, Wei Liang, Dong Yu: Towards Diverse and Efficient Audio Captioning via Diffusion Models.
- Amit Sofer, Yoav Goldman, Shlomo E. Chazan: Pull It Together: Reducing the Modality Gap in Contrastive Learning.
Interpretability in Audio and Speech Technology
- Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Haohe Liu, Wenwu Wang, Mark D. Plumbley: EnvSDD: Benchmarking Environmental Sound Deepfake Detection.
- Dennis Fucci, Marco Gaido, Matteo Negri, Mauro Cettolo, Luisa Bentivogli: Echoes of Phonetics: Unveiling Relevant Acoustic Cues for ASR via Feature Attribution.
- Cecilia Bolaños, Leonardo Pepino, Martín Meza, Luciana Ferrer: Benchmarking Time-localized Explanations for Audio Classification Models.
- Andrew Chang, Yike Li, Iran R. Roman, David Poeppel: Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds.
- Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu: Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data.
- Takanori Ashihara, Marc Delcroix, Tsubasa Ochiai, Kohei Matsuura, Shota Horiguchi: Analysis of Semantic and Acoustic Token Variability Across Speech, Music, and Audio Domains.
- Yaroslav Getman, Tamás Grósz, Tommi Lehtonen, Mikko Kurimo: Is your model big enough? Training and interpreting large-scale monolingual speech foundation models.
- Andreas Patakis, Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou: Semantic-Aware Interpretable Multimodal Music Auto-Tagging.
- Asim Ersoy, Basel Ahmad Mousi, Shammur Absar Chowdhury, Firoj Alam, Fahim Dalvi, Nadir Durrani: From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models.
- Yen Meng, Sharon Goldwater, Hao Tang: Effective Context in Neural Speech Models.
- Martijn Bentum, Louis ten Bosch, Tomas O. Lentz: Word stress in self-supervised speech models: A cross-linguistic comparison.
- Marianne de Heer Kloots, Hosein Mohebbi, Charlotte Pouw, Gaofei Shen, Willem H. Zuidema, Martijn Bentum: What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training.
- Robin Huo, Ewan Dunbar: Iterative Refinement, Not Training Objective, Makes HuBERT Behave Differently from wav2vec 2.0.
- Gaofei Shen, Hosein Mohebbi, Arianna Bisazza, Afra Alishahi, Grzegorz Chrupala: On the reliability of feature attribution methods for speech classification.
- Emma Cathrine Liisborg Leschly, Oliver Roesler, Michael Neumann, Jackson Liscombe, Abhishek Hosamath, Lakshmi Arbatti, Line H. Clemmensen, Melanie Ganz, Vikram Ramanarayanan: An Exploration of Interpretable Deep Learning Models for the Assessment of Mild Cognitive Impairment.
Summarization
- Steffen Freisinger, Philipp Seeberger, Thomas Ranzenberger, Tobias Bocklet, Korbinian Riedhammer: Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation.
- Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Ryo Fukuda, William Chen, Shinji Watanabe: Pick and Summarize: Integrating Extractive and Abstractive Speech Summarization.
- Othman Istaiteh, Salima Mdhaffar, Yannick Estève: Beyond Similarity Scoring: Detecting Entailment and Contradiction in Multilingual and Multimodal Contexts.
- Ziwei Gong, Lin Ai, Harsh Deshpande, Alexander Johnson, Emmy Phung, Zehui Wu, Ahmad Emami, Julia Hirschberg: Comparison-Based Automatic Evaluation for Meeting Summarization.
Show and Tell 1: ASR / Tools
- Alessandro De Luca, Srikanth Madikeri, Volker Dellwo: Voxplorer: Voice data exploration and projection in an interactive dashboard.
- Anand Kumar Rai, Satyam Rahangdale, Utkarsh Anand, Animesh Mukherjee: ASR-FAIRBENCH: Measuring and Benchmarking Equity Across Speech Recognition Systems.
- Christoph Draxler, Julian Pömp, Henk van den Heuvel, Fabio Ardolino, Arjan van Hessen: Transcribing Oral History Recordings Using the Transcription Portal.
- Teodora Vukovic, Jérémy Zehr, Jonathan Schaber, Igor Mustac, Nikolina Rajovic, Daniel McDonald, Johannes Graën, Noah Bubenhofer: LiRI Corpus Platform: Demonstration of a Web-Based Infrastructure for Multimodal Corpus Analysis.
- Zirong Li, Hongchen Wu, Yixin Gu, Yao Du, Yang Yue: Speech Annotation for A: Accuracy, Access, and Application.
- Arturs Znotins, Didzis Gosko, Normunds Gruzitis: LATE: Open Source Toolkit for Latvian and Latgalian Speech Transcription.
- Kumarmanas Nethil, Vaibhav Mishra, Kriti Anandan, Kavya Manohar: Scalable Offline ASR for Command-Style Dictation in Courtrooms.
Models of Speech Production
- Yijing Lu, Khalil Iskarous, Louis Goldstein: Towards a dynamical model of transitions between fluent and stuttered speech.
- Juliette Dindart, Agnès Rouxel, Crystal Lin, Trung Kien Bui, Muriel Lefort, Claire Pillot-Loiseau, Christophe Trésallet, Frédérique Frouin: Study of vocal fold vibration using M-mode ultrasound: a proof of concept.
- Jihwan Lee, Kevin Huang, Kleanthis Avramidis, Simon Pistrosch, Monica González Machorro, Yoonjeong Lee, Björn W. Schuller, Louis Goldstein, Shrikanth Narayanan: Articulatory Feature Prediction from Surface EMG during Speech Production.
- Saba Tabatabaee, Suzanne Boyce, Liran Oren, Mark Tiede, Carol Y. Espy-Wilson: Enhancing Acoustic-to-Articulatory Speech Inversion by Incorporating Nasality.
Speech and Grammar/Articulatory Analyses
- Anna Stein, Kevin Tang: Modeling Probabilistic Reduction using Information Theory and Naive Discriminative Learning.
- Zofia Malisz, Jan Foremski, Malgorzata Kul: Contextual predictability effects on acoustic distinctiveness in read Polish speech.
- Ivan Yuen, Katherine Demuth, Stefanie Shattuck-Hufnagel: How do both phonological and syntactic complexity influence speech planning?
- Anqi Xu, Yu-Yin Hsu: When focus shapes the flow: prosodic restructuring in Mandarin complex nominals.
- Sofoklis Kakouros: Investigating the Impact of Word Informativeness on Speech Emotion Recognition.
- Bowei Shao, Philipp Buech, Anne Hermes, Maria Giavazzi: Lexical stress affects lenition: The case of Italian palato-alveolar affricates.
- Peter Birkholz, Tianyi Zhang: Evaluation of a model for sound radiation from the vocal tract wall.
- Satu Hopponen, Tomi Kinnunen, Alexandre Nikolaev, Rosa González Hautamäki, Lauri Tavi, Einar Meister: FROST-EMA: Finnish and Russian Oral Speech Dataset of Electromagnetic Articulography Measurements with L1, L2 and Imitated L2 Accents.
Speaking Styles, Register and Conversational Speech
- Yunzhuo Xiang, Jingyi Sun: Modeling Formant Dynamics in Mandarin /ai/: Effects of Speech Style and Speech Rate.
- Livia Qian, Carol Figueroa, Gabriel Skantze: Representation of Perceived Prosodic Similarity of Conversational Feedback.
- Oana Niculescu, Monica Vasileanu: Prolongation in Romanian.
- Kübra Bodur, Corinne Fredouille, Christine Meunier: Speech Reduction in French: The Relationship Between Vowel Space and Articulation Dynamics.
- Andre Batchelder-Schwab, Vasileios Michos, Jonathan Barnes: Stress in Spoken and Whistled Greek.
Emotional Distress in Speech
- Justyna Krzywdziak, Bartlomiej Eljasiak, Joanna Stepien, Michal Swiatek, Agnieszka Pruszek: Leveraging Text and Speech Processing for Suicide Risk Classification in Chinese Adolescents.
- Wen Wu, Ziyun Cui, Chang Lei, Yinan Duan, Diyang Qu, Ji Wu, Bowen Zhou, Runsen Chen, Chao Zhang: The 1st SpeechWellness Challenge: Detecting Suicide Risk Among Adolescents.
- Yifan Gao, Jiao Fu, Long Guo, Hong Liu: Leveraging Large Language Models for Spontaneous Speech-Based Suicide Risk Detection.
- Xi Chen, Renzhe Yu, Yanshen Tan, Yiyi Li, Quan Qian, Ying Lin: Predicting Adolescent Suicidal Risk from Multi-task-based Speech: An Ensemble Learning Approach.
- Filomene Roquefort, Alexandre Ducorroy, Rachid Riad: In-context learning capabilities of Large Language Models to detect suicide risk among adolescents from speech transcripts.
- June-Woo Kim, Wonkyo Oh, Haram Yoon, Sung-Hoon Yoon, Dae-Jin Kim, Dong-Ho Lee, Sang-Yeol Lee, Chan-Mo Yang: Language-Agnostic Suicidal Risk Detection Using Large Language Models.
- Vincent P. Martin, Charles Brazier, Maxime Amblard, Michel Musiol, Jean-Luc Rouas: Network of acoustic characteristics for the automatic detection of suicide risk from speech. Contribution to the 2025 SpeechWellness challenge by the Semawave team.
Prosody in Speech Synthesis
- Eray Eren, Qingju Liu, Hyeongwoo Kim, Pablo Garrido, Abeer Alwan: ProMode: A Speech Prosody Model Conditioned on Acoustic and Textual Inputs.
- Kyowoon Lee, Artyom Stitsyuk, Gunu Jho, Inchul Hwang, Jaesik Choi: Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models.
- Paul Mayer, Florian Lux, Alejandro Pérez González de Martos, Angelina Elizarova, Lindsey Vanderlyn, Dirk Väth, Ngoc Thang Vu: Investigating Stochastic Methods for Prosody Modeling in Speech Synthesis.
- Tadashi Ogura, Takuma Okamoto, Yamato Ohtani, Erica Cooper, Tomoki Toda, Hisashi Kawai: GST-BERT-TTS: Prosody Prediction Without Accentual Labels For Multi-Speaker TTS Using BERT With Global Style Tokens.
- Anindita Mondal, Monica Surtani, Anil Kumar Vuppala, Parameswari Krishnamurthy, Chiranjeevi Yarra: ExagTTS: An Approach Towards Controllable Word Stress Incorporated TTS for Exaggerated Synthesized Speech Aiding Second Language Learners.
- Hoyeon Lee, Sejung Son, Ye-Eun Kang, Jong-Hwan Kim: Synthetic Data Generation for Phrase Break Prediction with Large Language Model.
Depression Detection and Assessment 1
- Lauren L. White, Ewan Carr, Judith Dineley, Catarina Botelho, Pauline Conde, Faith Matcham, Carolin Oetzmann, Amos Folarin, George Fairs, Agnes Norbury, Stefano Goria, Srinivasan Vairavan, Til Wykes, Richard J. B. Dobson, Vaibhav A. Narayan, Matthew Hotopf, Alberto Abad, Isabel Trancoso, Nicholas Cummins: Speech Reference Intervals: An Assessment of Feasibility in Depression Symptom Severity Prediction.
- Wenrui Liang, Rong Zhang, Xuezhen Zhang, Ying Ma, Wei-Qiang Zhang: DepressGEN: Synthetic Data Generation Framework for Depression Detection.
- Yuqiu Zhou, Yongjie Zhou, Yudong Yang, Yang Liu, Jun Huang, Shuzhi Zhao, Rongfeng Su, Lan Wang, Nan Yan: Emotion-Guided Graph Attention Networks for Speech-Based Depression Detection under Emotion-Inducting Tasks.
- Patawee Prakrankamanant, Shinji Watanabe, Ekapol Chuangsuwanich: Explainable Depression Detection using Masked Hard Instance Mining.
- Sri Harsha Dumpala, Chandramouli Shama Sastry, Rudolf Uher, Sageev Oore: Test-Time Training for Speech-based Depression Detection.
- Lishi Zuo, Man-Wai Mak: Leveraging Ordinal Information for Speech-based Depression Classification.
- Erfan Loweimi, Sofia de la Fuente Garcia, Saturnino Luz: Zero-Shot Speech-Based Depression and Anxiety Assessment with LLMs.
- Robert Lewis, Szymon Fedor, Nelson Hidalgo Julia, Joshua Curtiss, Jiyeon Kim, Noah Jones, David Mischoulon, Thomas F. Quatieri, Nicholas Cummins, Paola Pedrelli, Rosalind W. Picard: Towards the Objective Characterisation of Major Depressive Disorder Using Speech Data from a 12-week Observational Study with Daily Measurements.
- Sophie Young, Fuxiang Tao, Bahman Mirheidari, Madhurananda Pahar, Markus Reuber, Heidi Christensen: Can Speech Accurately Detect Depression in Patients With Comorbid Dementia? An Approach for Mitigating Confounding Effects of Depression and Dementia.
Speech Analysis, Detection and Classification 1
- Shaojie Li, Qintuya Si, De Hu: Temporal Convolutional Network with Smoothed and Weighted Losses for Distant Voice Activity and Overlapped Speech Detection.
- Kumud Tripathi, Chowdam Venkata Kumar, Pankaj Wasnik: Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion.
- Miseul Kim, Seyun Um, Hyeonjin Cha, Hong-Goo Kang: SpeechMLC: Speech Multi-label Classification.
- Dohyun Kim, Jiwook Hwang: Fully End-to-end Streaming Open-vocabulary Keyword Spotting with W-CTC Forced Alignment.
- Anna Seo Gyeong Choi, Alexander Richardson, Ryan Partlan, Sunny X. Tang, Sunghye Cho: Comparative Evaluation of Acoustic Feature Extraction Tools for Clinical Speech Analysis.
- Tahiya Chowdhury, Verónica Romero: Can We Trust Machine Learning? The Reliability of Features from Open-Source Speech Analysis Tools for Speech Modeling.
Speech-based Cognitive Assessment 1
- Vi Jun Sean Yong, Serkan Kumyol, Pau Le Lisa Low, Winnie Suk Wai Leung, Tristan Braud: HK-GenSpeech: A Generative AI Scene Creation Framework for Speech Based Cognitive Assessment.
- Parismita Gogoi, Vishwanath Pratap Singh, Seema Khadirnaikar, Soma Siddhartha, Sishir Kalita, Jagabandhu Mishra, Md. Sahidullah, Priyankoo Sarmah, S. R. M. Prasanna: Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment.
- Yin-Long Liu, Yuanchao Li, Rui Feng, Liu He, Jia-Xin Chen, Yi-Ming Wang, Yu-Ang Chen, Yan-Han Peng, Jia-Hong Yuan, Zhen-Hua Ling: Leveraging Cascaded Binary Classification and Multimodal Fusion for Dementia Detection through Spontaneous Speech.
- Kaichen Jia, Jinpeng Li, Ke Li, Wei-Qiang Zhang: Whisper-Based Multilingual Alzheimer's Disease Detection and Improvements for Low-Resource Language.
- Qi Sun, Ziyue Qiu, Yu Pu, Jinpeng Li, Xuchu Chen, Wei-Qiang Zhang: PPGs-BERT: Leveraging Phoneme Sequence and BERT for Alzheimer's Disease Detection from Spontaneous Speech.
Large Language Models in Speech Recognition
- Te Ma, Min Bi, Saierdaer Yusuyin, Hao Huang, Zhijian Ou: LLM-based phoneme-to-grapheme for phoneme-based speech recognition.
- Jie Zhengjie, Gaofeng Cheng: Pinyin-Guided Chinese Speech Recognition with Large Language Model.
- Hang Su, Yuxiang Kong, Lichun Fan, Jian Luan: Text-Enhanced Audio Encoder for Large Language Model based Speech Recognition via Cross-Modality Pre-training with Unpaired Audio-Text Data.
- Jinda Zhang, Aanchan Mohan: Towards atypical speech transcription using LLM-based ASR.
- Jeena Prakash, Blessingh Kumar, Kadri Hacioglu, Bidisha Sharma, Sindhuja Gopalan, Malolan Chetlur, Shankar Venkatesan, Andreas Stolcke: Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM.
- Tianyi Xu, Hongjie Chen, Qing Wang, Hang Lv, Jian Kang, Jie Li, Zhennan Lin, Yongxiang Li, Lei Xie: Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis.
Speech Coding and Echo Cancellation
- Shanhui Gan, Zijian Liang, Kai Niu, Ping Zhang: Synonymity-Based Semantic Coding for Efficient Speech Compression.
- Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang: Towards an Ultra-Low-Delay Neural Audio Coding with Computational Efficiency.
- Zixiang Wan, Guochang Zhang, Yifeng He, Jianqiang Wei: SpecTokenizer: A Lightweight Streaming Codec in the Compressed Spectrum Domain.
- Haibin Wu, Naoyuki Kanda, Sefik Emre Eskimez, Jinyu Li: TS3-Codec: Transformer-Based Simple Streaming Single Codec.
- Yunkee Chae, Kyogu Lee: Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ.
- Bowen Zhang, Ian McLoughlin, Xiaoxiao Miao, A. S. Madhukumar: LSPnet: an ultra-low bitrate hybrid neural codec.
- Yao Guo, Yang Ai, Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Zhen-Hua Ling: Vision-Integrated High-Quality Neural Speech Coding.
- Woongjib Choi, Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang: Neural Spectral Band Generation for Audio Coding.
- Fei Zhao, Xueliang Zhang, Zhong-Qiu Wang: Multi-Channel Acoustic Echo Cancellation Based on Direction-of-Arrival Estimation.
Decoding Algorithms
- Koji Okabe, Hitoshi Yamamoto: Simultaneous Masked and Unmasked Decoding with Speculative Decoding Masking for Fast ASR without Accuracy Loss.
- Hainan Xu, Vladimir Bataev, Lilit Grigoryan, Boris Ginsburg: WIND: Accelerated RNN-T Decoding with Windowed Inference for Non-blank Detection.
- Vladimir Bataev, Andrei Andrusenko, Lilit Grigoryan, Aleksandr Laptev, Vitaly Lavrukhin, Boris Ginsburg: NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding.
- Lilit Grigoryan, Vladimir Bataev, Andrei Andrusenko, Hainan Xu, Vitaly Lavrukhin, Boris Ginsburg: Pushing the Limits of Beam Search Decoding for Transducer-based ASR models.
- Ashish R. Mittal, Darshan Prabhu, Sunita Sarawagi, Preethi Jyothi: Skip-Salsa: Skip Synchronous Fusion of ASR LLM Decoders.
- Kwok Chin Yuen, Jia Qi Yip: Efficient Trie-based Biasing using K-step Prediction for Rare Word Recognition.
Queer and Trans Speech Science and Technology
- Tara McAllister, Collin Eagen, Yi Shan, Peter Traver, Daphna Harel, Tae Hong Park, Vesna D. Novak: Web-Based Application for Real-Time Biofeedback of Vocal Resonance in Gender-Affirming Voice Training: Design and Usability Evaluation.
- Robin Netzorg, Naomi Carvalho, Andrea Guzman, Lydia Wang, Juliana Francis, Klo Vivienne Garoute, Keith Johnson, Gopala Anumanchipalli: On the Production and Perception of a Single Speaker's Gender.
- Alice Ross, Cliodhna Hughes, Eddie L. Ungless, Catherine Lai: Conveying Gender Through Speech: Insights from Trans Men.
- Ingo Siegert, Jan Marquenie, Sven Grawunder: Queer Waves: A German Speech Dataset Capturing Gender and Sexual Diversity from Podcasts and YouTube.
- Carlos Hartmann: Reddit FlairShare: A Human-Annotated Dataset of Gender-Progressive Online Discourse.
- Maxwell Hope, Éva Székely: Voices of 'cyborg awesomeness': Posthuman embodiment of nonbinary gender expression in AI speech technologies.
Tone
- Xiao Dong, Fengming Liu, Chien-Jer Charles Lin, Monica Nesbitt, Shuju Shi: Neutral Tone Variation in Beijing Mandarin: Is Neutral Tone Toneless?
- Siqi Lu, Hui Feng, Ziyu Xiong: The Role of Syntactic Structures in Shaping Directionality in Trisyllabic Tone Sandhi: Evidence from Tianjin Mandarin.
- Zhijie Li, Hui Feng: Acoustic Representation and Realization of Weak Elements Subcategories: In the Case of Tianjin Mandarin.
- Lishan Li, Yaolin Zhou, Xiaoying Xu: Lexical competition in the process of Cantonese tone merging: Diverse Impact Mechanisms Across Different Individuals and Tone Pairs.
- Zhenrui Zhang, Fang Hu: Tonal Perception in Changde Mandarin.
- Changhong Du, Fang Hu: Tonal Contrasts in the Malipo Variety of the Mienic Language.
Cross-Lingual and Multilingual Processing
- Yanir Marmor, Yair Lifshitz, Yoad Snapir, Kinneret Misgav: Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing.
- Ondrej Klejch, William Lamb, Peter Bell: A Practitioner's Guide to Building ASR Models for Low-Resource Languages: A Case Study on Scottish Gaelic.
- Razhan Hameed, Sina Ahmadi, Hanah Hadi, Rico Sennrich: Automatic Speech Recognition for Low-Resourced Middle Eastern Languages.
- Zhaolin Li, Jan Niehues: In-context Language Learning for Endangered Languages in Speech Recognition.
- Brian Yan, Injy Hamed, Shuichiro Shimizu, Vasista Sai Lodagala, William Chen, Olga Iakovenko, Bashar Talafha, Amir Hussein, Alexander Polok, Kalvin Chang, Dominik Klement, Sara Althubaiti, Puyuan Peng, Matthew Wiesner, Thamar Solorio, Ahmed Ali, Sanjeev Khudanpur, Shinji Watanabe: CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset.
- Mingchen Shao, Xinfa Zhu, Chengyou Wang, Bingshen Mu, Hai Li, Ying Yan, Junhui Liu, Danming Xie, Lei Xie: Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR.
- Tuan Nguyen, Huy Dat Tran: Can we train ASR systems on Code-switch without real code-switch data? Case study for Singapore's languages.
- Leonora Vesterbacka, Faton Rekathati, Robin Kurtz, Justyna Sikora, Agnes Toftgård: Swedish Whispers; Leveraging a Massive Speech Corpus for Swedish Speech Recognition.
Echo Cancellation, Feedback Control, and Near-end Enhancement
- Fei Zhao, Shulin He, Xueliang Zhang: Room Impulse Response as a Prompt for Acoustic Echo Cancellation.
- Yuyang Wang, Yonghui Liu, Jianbing Liu, Kai Niu, Zhiqiang He: CAGCRN: Real-Time Speech Enhancement with a Lightweight Model for Joint Acoustic Echo Cancellation and Noise Suppression.
- Jinfu Wang, Ziteng Wang, Xin Liu, Yang Liu, Qing Shi, Zhengqiang Luo, Feiran Yang: Exploiting Echo Path Priors for Enhanced Stereo Acoustic Echo Cancellation.
- Quang Minh Dinh, Hoda Rezaee Kaviani, Mehrdad Hosseinzadeh, Yuanhao Yu: Extended Loss: Incorporating Long Context into Training Models when using Short Audio Frames.
- Filippo Villani, Wai-Yip Chan, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen: Analysis and Extension of a Near-End Listening Enhancement Method Based on Long-Term Fractile Noise Statistics.
- Yuan-Kuei Wu, Juan Azcarreta Ortiz, Kashyap Patel, Buye Xu, Jung-Suk Lee, Sanha Lee, Ashutosh Pandey: A Novel Deep Learning Framework for Efficient Multichannel Acoustic Feedback Control.
- Bunlong Lay, Rostilav Makarov, Timo Gerkmann: Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency.
Pathological Speech Analysis 1
- Xiaokang Liu, Xingfeng Li, Yudong Yang, Lan Wang, Nan Yan: Addressing Task Conflicts in Stuttering Detection via MMoE-Based Multi-Task Learning.
- Y. S. Upendra Vishwanath, Tanuka Bhattacharjee, Deekshitha G, Sathvik Udupa, Chowdam Venkata Thirumala Kumar, Madassu Keerthipriya, Darshan Chikktimmegowda, Dipti Baskar, Yamini Belur, Seena Vengalil, Atchayaram Nalini, Prasanta Kumar Ghosh: Comparison of Acoustic and Textual Features for Dysarthria Severity Classification in Amyotrophic Lateral Sclerosis.
- Suhita Ghosh, Mélanie Jouaiti, Jan-Ole Perschewski, Sebastian Stober: StutterCut: Uncertainty-Guided Normalised Cut for Dysfluency Segmentation.
- Giulia Sanguedolce, Jón Guðnason, Dragos-Cristian Gruia, Emilie D'Olne, Fatemeh Geranmayeh, Patrick A. Naylor: Physiologically-Informed Feature Analysis of Acquired Speech Disorders for Stroke Assessment.
Hearing Disorders
- Gloria Araiza-Illan, Luke Meyer, Bert Maat, Deniz Baskent: Robot-assisted Recognition of Vocal Emotions in Pseudospeech for Cochlear Implanted Adolescents.
- Ahsan J. Cheema, Sunil Puria: Using Neurogram Similarity Index Measure (NSIM) to Model Hearing Loss and Cochlear Neural Degeneration.
- Longbin Jin, Donghun Min, Jung Eun Shin, Eun Yi Kim: Contrastive Learning-based Syllable-Level Mispronunciation Detection and Diagnosis for Speech Audiometry.
- Hsin-Tien Chiang, John H. L. Hansen: A Deformable Convolution GAN Approach for Speech Dereverberation in Cochlear Implant Users.
- Fengyuan Hao, Brian C. J. Moore, Huiyong Zhang, Xiaodong Li, Chengshi Zheng: L3C-DeepMFC: Low-Latency Low-Complexity Deep Marginal Feedback Cancellation with Closed-Loop Fine Tuning for Hearing Aids.
- Man Wang, Yixin Ding, Niels O. Schiller: Semantic Processing During Spoken Word Production by Children with Cochlear Implants.
- Yuting Ding, Xuefei Wang, Fei Chen: Linguistic Masking and Its Release in Simulated Electric-acoustic Hearing.
Interspeech 2025 URGENT Challenge
- Wangyou Zhang, Kohei Saijo, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Wei Wang, Yihui Fu, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian: Lessons Learned from the URGENT 2024 Speech Enhancement Challenge.
- Kohei Saijo, Wangyou Zhang, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Yihui Fu, Wei Wang, Tim Fingscheidt, Shinji Watanabe: Interspeech 2025 URGENT Speech Enhancement Challenge.
- Xiaobin Rong, Dahan Wang, Qinwen Hu, Yushi Wang, Yuxiang Hu, Jing Lu: TS-URGENet: A Three-stage Universal Robust and Generalizable Speech Enhancement Network.
- Xiaohuai Le, Zhuangqi Chen, Siyu Sun, Xianjun Xia, Chuanzeng Huang: Multistage Universal Speech Enhancement System for URGENT Challenge.
- Zhihang Sun, Andong Li, Tong Lei, Rilin Chen, Meng Yu, Chengshi Zheng, Yi Zhou, Dong Yu: Scaling beyond Denoising: Submitted System and Findings in URGENT Challenge 2025.
- Sanberk Serbest, Tijana Stojkovic, Milos Cernak, Andrew Harper: DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration.
- Nabarun Goswami, Tatsuya Harada: FUSE: Universal Speech Enhancement using Multi-Stage Fusion of Sparse Compression and Token Generation Models for the URGENT 2025 Challenge.
- Rong Chao, Rauf Nasretdinov, Yu-Chiang Frank Wang, Ante Jukic, Szu-Wei Fu, Yu Tsao: Universal Speech Enhancement with Regression and Generative Mamba.
Spoken Machine Translation 2
- Jean-Luc Rouas, Charles Brazier, Leila Ben Letaifa, Rafael Medina, Pedro Palacios, David Atienza, Giovanni Ansaloni: Structured pruning for efficient systolic array accelerated cascade Speech-to-Text Translation.
- Mohammad MohammadAmini, Aghilas Sini, Marie Tahon, Antoine Laurent: Scaling pseudo-labeling data for end-to-end low-resource speech translation (the case of Kurdish language).
- Kirandevraj R, Vinod K. Kurmi, Vinay P. Namboodiri, C. V. Jawahar: Multilingual Query-by-Example KWS for Indian Languages using Transliteration.
- Chenyang Le, Yinfeng Xia, Huiyan Li, Manhong Wang, Yutao Sun, Xingyang Ma, Yanmin Qian: Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation.
- Verena Blaschke, Miriam Winkler, Constantin Förster, Gabriele Wenger-Glemser, Barbara Plank: A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation.
- Tahir Javed, Kaushal Santosh Bhogale, Mitesh M. Khapra: NIRANTAR: Continual Learning with New Languages and Domains on Real-world Speech Data.
Spatial Audio and Acoustics 1
- Sheng Lyu, Yuemin Yu, Chenshu Wu:

Temporal Modeling of Room Impulse Response Generation via Multi-Scale Autoregressive Learning. - Yunqi C. Zhang, Dhruv Jagmohan, Hong Kit Li, C. T. Justine Hui, Yusuke Hioka:

Effect of Noise Floor in Room Impulse Response on Speech Perception Under Spherical Harmonics-based Spatial Sound Reproduction. - Christopher Ick, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Jonathan Le Roux:

Direction-Aware Neural Acoustic Fields for Few-Shot Interpolation of Ambisonic Impulse Responses. - Linya Fu, Yu Liu, Zhijie Liu, Zedong Yang, Zhong-Qiu Wang, Youfu Li, He Kong:

AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers. - Tuochao Chen, D. Shin, Hakan Erdogan, Sinan Hersek:

SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction. - Yang Xiao, Rohan Kumar Das:

TF-Mamba: A Time-Frequency Network for Sound Source Localization.
Articulatory and Vocal Tract Modelling
- Frédéric Berthommier:

Articulatory modeling of the S-shaped F2 trajectories observed in Öhman's spectrographic analysis of VCV syllables. - Allan Vurma, Einar Meister, Lya Meister, Jaan Ross, Marju Raju, Veeda Kala, Tuuri Dede:

The Role of Voiced Consonant Duration in Sung Vowel-Consonant and Consonant-Vowel Recognition. - Riccarda Funk, Melanie Weirich, Adrian P. Simpson:

How sibilant spectra shape gender perception in prepubertal children: A voice morphing study. - Tharinda Piyadasa, Joan Glaunès, Amelia Gully, Michael Proctor, Kirrie J. Ballard, Tünde Szalay, Naeim Sanaei, Sheryl Foster, David Waddington, Craig T. Jin:

Constrained LDDMM for Dynamic Vocal Tract Morphing: Integrating Volumetric and Real-Time MRI. - Rongshuai Wu, Debasish Ray Mohapatra, Sidney Fels:

2D Immersed Boundary Method in Vocal Tract Acoustics: An Eulerian-Lagrangian Model for Simulation of Diphthongs. - Sofiane Azzouz, Pierre-André Vuissoz, Yves Laprie:

Reconstruction of the Complete Vocal Tract Contour Through Acoustic to Articulatory Inversion Using Real-Time MRI Data. - Yubin Zhang, Prakash Kumar, Ye Tian, Ziwei Zhao, Xuan Shi, Kevin Huang, Kevin Lee, Haley Hsu, Shrikanth Narayanan, Krishna S. Nayak, Louis Goldstein:

Co-registration of real-time MRI and respiration for speech research.
Acoustic Assessment of Respiratory Health
- Loes van Bemmel, Lauren G. Reinders, Folkert Brijker, Bas Holverda, Frits M. E. Franssen, Hanneke van Helvoort, Visara Urovi, Marieke Spreeuwenberg, Sami O. Simons:

SPEAKtoCOPD: a flashmob study to collect COPD speech. - Yuyang Yan, Sami O. Simons, Visara Urovi:

Developing a LeFF Transformer Model for Exacerbated Speech Detection in COPD and Asthma. - Daisuke Niizumi, Daiki Takeuchi, Masahiro Yasuda, Binh Thien Nguyen, Yasunori Ohishi, Noboru Harada:

Towards Pre-training an Effective Respiratory Audio Foundation Model. - Lauren G. Reinders, Loes van Bemmel, Alexander Mackay, David Nobbs, Frits M. E. Franssen, Hester Gietema, Simona Schäfer, Sami O. Simons:

Effect of physical exercise on voice in people living with COPD. - Gaoyang Dong, Zhicheng Zhang, Ping Sun, Minghui Zhang:

Adaptive Differential Denoising for Respiratory Sounds Classification. - Peidong Wei, Shiyu Miao, Lin Li:

Disentangling Dual-Encoder Masked Autoencoder for Respiratory Sound Classification. - Seung Gyu Jeong, Seong Eun Kim:

Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses. - Miika Toikkanen, June-Woo Kim:

Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles.
Advances in Modelling and Imaging
- Mélen Guillaume, Anahita Basirat, Julien Diard:

Theoretical proposal for a unified Bayesian model of adaptation in non-interactive and interactive speech production. - Juraj Simko, Benjamin Elie, Alice Turk:

Self-supervised Optimality-Guided Learning of Speech Articulation. - Zhe-chen Guo, Bharath Chandrasekaran:

Extended High-frequency Cues to Phoneme Recognition: Insights from ASR. - Jia-Xin Chen, Yi-Ming Wang, Ziyu Zhang, Jiayang Han, Yin-Long Liu, Rui Feng, Xiuyuan Liang, Zhen-Hua Ling, Jia-Hong Yuan:

Decoding Speaker-Normalized Pitch from EEG for Mandarin Perception. - Tong Zhu, Xiaoke Yang, Jian Zhou, Lu Li, Zhao Lv, Cunhang Fan:

SSF-DST: A Spectro-Spatial Features Enhanced Deep Spatiotemporal Network for EEG-Based Auditory Attention Detection. - Yujie Yan, Xiran Xu, Haolin Zhu, Songyi Li, Bo Wang, Xihong Wu, Jing Chen:

Overestimated performance of auditory attention decoding caused by experimental design in EEG recordings. - Chetan Sharma, Vaishnavi Chandwanshi, Shreya Shrikant Karkun, Aditya Anand Gupta, Prasanta Kumar Ghosh:

A real-time MRI study on asymmetry in velum dynamics during VCV production with nasal sounds. - Carey Smith, Hu Cheng, Pertti Palo, Daniel Aalto, Steven M. Lulich:

Exploratory Analysis of Brainstem fMRI Data During Sustained Phonation.
Conversation, Communication and Interaction 1
- Seongsil Heo, Christi Miller, Calvin Murdock, Michael J. Proulx:

Gaze-Enhanced Multimodal Turn-Taking Prediction in Triadic Conversations. - Sam O'Connor Russell, Naomi Harte:

Visual Cues Support Robust Turn-taking Prediction in Noise. - Yoshinori Fukunaga, Ryota Nishimura, Kengo Ohta, Norihide Kitaoka:

Backchannel prediction for natural spoken dialog systems using general speaker and listener information. - Muhammad Yeza Baihaqi, Angel F. Garcia Contreras, Seiya Kawano, Koichiro Yoshino:

Rapport-Building Dialogue Strategies for Deeper Connection: Integrating Proactive Behavior, Personalization, and Aizuchi Backchannels. - Lena-Marie Huttner, Jeppe H. Christensen, Gitte Keidser, Tobias May, Torsten Dau, Sergi Rotger-Griful:

Does effortful speech production indicate communication difficulty caused by noise and hearing aid support? - Julio Cesar Cavalcanti, Gabriel Skantze:

"Dyadosyncrasy", Idiosyncrasy and Demographic Factors in Turn-Taking.
Robust Speaker Verification
- Théo Lepage, Réda Dehak:

SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification. - Minu Kim, Kangwook Jang, Hoirin Kim:

ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction. - Zhe Li, Man-Wai Mak, Jen-Tzung Chien, Mert Pilanci, Zezhong Jin, Helen Meng:

Disentangling Speaker and Content in Pre-trained Speech Models with Latent Diffusion for Robust Speaker Verification. - Alexandre Ferro Filho, Diogo Fernandes Costa Silva, Pedro Elias Engelberg Silva Borges, Arlindo Rodrigues Galvão Filho:

Evaluating Deep Speaker Embedding Robustness to Domain, Sampling Rate, and Codec Variations. - Zhiyong Chen, Shuhang Wu, Xinnuo Li, Zhiqi Ai, Shugong Xu:

Towards Robust Speaker Recognition against Intrinsic Variation with Foundation Model Few-shot Tuning and Effective Speech Synthesis. - Jin Li, Man-Wai Mak, Johan Rohdin, Kong Aik Lee, Hynek Hermansky:

Bayesian Learning for Domain-Invariant Speaker Verification and Anti-Spoofing.
Multilingual ASR
- Masato Mimura, Jaeyoung Lee, Tatsuya Kawahara:

Switch Conformer with Universal Phonetic Experts for Multilingual ASR. - Hongli Yang, Sheng Li, Hao Huang, Ayiduosi Tuohan, Yizhou Peng:

Language-Aware Prompt Tuning for Parameter-Efficient Seamless Language Expansion in Multilingual ASR. - Jiahong Li, Yiwen Shao, Jianheng Zhuo, Chenda Li, Liliang Tang, Dong Yu, Yanmin Qian:

Efficient Multilingual ASR Finetuning via LoRA Language Experts. - Raphaël Bagat, Irina Illina, Emmanuel Vincent:

Mixture of LoRA Experts for Low-Resourced Multi-Accent Automatic Speech Recognition. - Zheng Xin Yong, Vineel Pratap, Michael Auli, Jean Maillard:

Effects of Speaker Count, Duration, and Accent Diversity on Zero-Shot Accent Robustness in Low-Resource ASR. - Pouya Mehralian, Hugo Van hamme:

Leveraging Geographic Metadata for Dialect-Aware Speech Recognition. - Ömer Tarik Özyilmaz, Matt Coler, Matias Valdenegro-Toro:

Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning. - Jianheng Zhuo, Yifan Yang, Yiwen Shao, Yong Xu, Dong Yu, Kai Yu, Xie Chen:

VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining. - Yingzhi Wang, Anas Alhmoud, Muhammad Alqurishi:

Open Universal Arabic ASR Leaderboard.
Multi-channel Speech Enhancement
- Yujie Yang, Bing Yang, Xiaofei Li:

Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement. - Zheng Wang, Xiaobin Rong, Yu Sun, Tianchi Sun, Zhibin Lin, Jing Lu:

A Lightweight Hybrid Dual Channel Speech Enhancement System under Low-SNR Conditions. - Pengjie Shen, Xueliang Zhang, Zhong-Qiu Wang:

ARiSE: Auto-Regressive Multi-Channel Speech Enhancement. - Lu Han, Junqi Zhao, Renhua Peng:

WTFormer: A Wavelet Conformer Network for MIMO Speech Enhancement with Spatial Cues Preservation. - Nurali Alip, Tianrui Wang, Rui Cao, Meng Ge, Jingru Lin, Longbiao Wang, Jianwu Dang:

A Three-Stage Beamforming with Harmonic Guidance for Multi-Channel Speech Enhancement. - Chengyuan Qin, Wenmeng Xiong, Jing Zhou, Maoshen Jia, Changchun Bao:

Speech Enhancement with Dual-path Multi-Channel Linear Prediction Filter and Multi-norm Beamforming.
Self-supervised Learning
- Mingyu Cui, Yifan Yang, Jiajun Deng, Jiawen Kang, Shujie Hu, Tianzi Wang, Zhaoqing Li, Shiliang Zhang, Xie Chen, Xunying Liu:

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR. - Nik Vaessen, Roeland Ordelman, David A. van Leeuwen:

Self-supervised learning of speech representations with Dutch archival data. - Aleksandr Kutsakov, Alexandr Maximenko, Georgii Gospodinov, Pavel Bogomolov, Fyodor Minkin:

GigaAM: Efficient Self-Supervised Learner for Speech Recognition. - Hyung-Gun Chi, Zakaria Aldeneh, Tatiana Likhomanenko, Oggi Rudovic, Takuya Higuchi, Li-Wei Chen, Shinji Watanabe, Ahmed Hussen Abdelaziz:

DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective. - Kentaro Onda, Yosuke Kashiwagi, Emiru Tsunoo, Hayato Futami, Shinji Watanabe:

Differentiable K-means for Fully-optimized Discrete Token-based ASR. - Ryan Whetten, Lucas Maison, Titouan Parcollet, Marco Dinarelli, Yannick Estève:

Towards Early Prediction of Self-Supervised Speech Model Performance.
Singing Voice and Audio Synthesis
- Joon-Seung Choi, Dong-Min Byun, Hyung-Seok Oh, Seong-Whan Lee:

VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion. - Chenyu Yang, Hangting Chen, Shuai Wang, Haina Zhu, Haizhou Li:

TVC-MusicGen: Time-Varying Structure Control for Background Music Generation via Self-Supervised Training. - Mu Yang, Bowen Shi, Matthew Le, Wei-Ning Hsu, Andros Tjandra:

Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation. - Mingda Liu, Jiatong Shi:

Bridging Speech and Singing: Multi-stage Speech-Prompted Singing Voice Conversion with Speaker Embedding Adaptation. - Yicheng Gu, Chaoren Wang, Zhizheng Wu, Lauri Juvela:

Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN. - Zijing Zhao, Kai Wang, Hao Huang, Ying Hu, Liang He, Jichen Yang:

VS-Singer: Vision-Guided Stereo Singing Voice Synthesis with Consistency Schrödinger Bridge. - Wei Chen, Binzhu Sha, Dan Luo, Jing Yang, Zhuo Wang, Fan Fan, Zhiyong Wu:

DAFMSVC: One-Shot Singing Voice Conversion with Dual Attention Mechanism and Flow Matching. - Wangjin Zhou, Tianjiao Du, Chenglin Xu, Sheng Li, Yi Zhao, Tatsuya Kawahara:

Simple and Effective Content Encoder for Singing Voice Conversion via SSL-Embedding Dimension Reduction. - Yunkee Chae, Eunsik Shin, Suntae Hwang, Seungryeol Paik, Kyogu Lee:

Song Form-aware Full-Song Text-to-Lyrics Generation with Multi-Level Granularity Syllable Count Control.
Acoustic and Articulatory Cues in Speech Perception
- Wenwei Dong, Alif Silpachai, Catia Cucchiarini, Helmer Strik:

Multitalker Babble in English Vowel Perception Training: A Comparison between Humans and Neural Models. - Etienne Gaudrain, Sarah Verhulst, Deniz Baskent:

Speech stimulus design to study the neural coding of speech and the impact of cochlear synaptopathy. - Esther Janse, Chen Shen, Martin Cooke:

Prediction of listening effort ratings for habitual and clear-Lombard speech presented in noise. - Shengyue Xiong, Zhe-chen Guo, Bharath Chandrasekaran:

Language and Accent Familiarity Effects on the Use of Acoustic Cues in Talker Identification. - Laura Rachman, Deniz Baskent:

Characterization of voice cue sensitivity and vocal emotion recognition across the adult lifespan. - Zixia Fan, Ronny Ibrahim, Joshua Penney, Felicity Cox:

Creaky Voice Facilitates More Efficient Phonological Processing of Mandarin Tone 3.
Audio Event Detection and Classification
- Tomoya Yoshinaga, Yoshiaki Bando, Keitaro Tanaka, Keisuke Imoto, Masaki Onishi, Shigeo Morishima:

Training Onset-and-Offset-Aware Sound Event Detection on a Heterogeneous Dataset via Probabilistic Sequential Modeling. - Yulu Fang, Mingyue He, Qisheng Xu, Jianqiao Zhao, Cheng Yang, Kele Xu, Yong Dou:

Multi-view Fusion and Parameter Perturbation for Few-Shot Class-Incremental Audio Classification. - Yongjie Si, Yanxiong Li, Jiaxin Tan, Qianhua He, Il-Youp Kwak:

Fully Few-shot Class-incremental Audio Classification Using Multi-level Embedding Extractor and Ridge Regression Classifier. - Claudia Montero-Ramírez, Alba Martínez-Serrano, Jorge Garcelán-Gómez, Francisco J. Valverde-Albacete, Carmen Peláez-Moreno:

Beyond Conventional Metrics: using Entropic Triangles to Explain Balancing Methods in Acoustic Scene Classification. - Emiliano Acevedo, Martín Rocamora, Magdalena Fuentes:

Domain Adaptation Method and Modality Gap Impact in Audio-Text Models for Prototypical Sound Classification. - Myeonghoon Ryu, Hongseok Oh, Suji Lee, Han Park:

Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation.
Inclusivity
- Chris Emezue, NaijaVoices Community, Busayo Awobade, Abraham Toluwase Owodunni, Handel Emezue, Gloria Monica Tobechukwu Emezue, Nefertiti Nneoma Emezue, Sewade Ogun, Bunmi Akinremi, David Ifeoluwa Adelani, Chris Pal:

The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages. - Maliha Jahan, Yinglun Sun, Priyam Mazumdar, Zsuzsanna Fagyal, Thomas Thebaud, Jesús Villalba, Mark Hasegawa-Johnson, Najim Dehak, Laureano Moro-Velázquez:

FaiST: A Benchmark Dataset for Fairness in Speech Technology. - Kemal Altwlkany, Amar Kuric, Emanuel Lacic:

On the Language and Gender Biases in PSTN, VoIP and Neural Audio Codecs. - José Giraldo, Alex Peiró Lilja, Carme Armentano-Oller, Rodolfo Zevallos, Cristina España-Bonet:

Evaluating Speech Enhancement Performance Across Demographics and Language.
Voice Conversion 1
- Seymanur Akti, Tuan-Nam Nguyen, Alexander Waibel:

Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion. - Hitoshi Suda, Shinnosuke Takamichi, Satoru Fukayama:

Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora. - Ishan D. Biyani, Nirmesh J. Shah, Ashishkumar P. Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah:

REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion. - Alexander Lobashev, Assel Yermekova, Maria A. Larchenko:

Training-Free Voice Conversion with Factorized Optimal Transport. - Yihan Liu, Zhengyang Chen, Leying Zhang, Yanmin Qian:

E2E-BPVC: End-to-End Background-Preserving Voice Conversion via In-Context Learning. - Kaidi Wang, Wenhao Guan, Ziyue Jiang, Hukai Huang, Peijie Chen, Weijie Wu, Qingyang Hong, Lin Li:

Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion. - Pengyu Ren, Wenhao Guan, Kaidi Wang, Peijie Chen, Qingyang Hong, Lin Li:

ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization. - Jiawei Jin, Zhihan Yang, Yixuan Zhou, Zhiyong Wu:

In This Environment, As That Speaker: A Text-Driven Framework for Multi-Attribute Speech Conversion. - Herman Kamper, Benjamin van Niekerk, Julian Zaïdi, Marc-André Carbonneau:

LinearVC: Linear Transformations of Self-Supervised Features Through the Lens of Voice Conversion. - Desheng Hu, Yang Xiang, Jian Lu, Xinhui Hu, Xinkang Xu:

Speaker Normalization and Content Restoration for Zero-Shot Voice Conversion with Attention-Enhanced Discriminator.
Speech-based Cognitive Assessment 2
- Xiaoquan Ke, Man-Wai Mak, Helen Meng:

Optimizing Pause Context in Fine-Tuning Pre-trained Large Language Models for Dementia Detection. - Emmanuel Akinrintoyo, Nadine Abdelhalim, Nicole Salomons:

WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper. - Catarina Botelho, David Gimeno-Gómez, Francisco Teixeira, John Mendonça, Patrícia Pereira, Diogo A. P. Nunes, Thomas Rolland, Anna Pompili, Rubén Solera-Ureña, Maria Ponte, David Martins de Matos, Carlos D. Martínez-Hinarejos, Isabel Trancoso, Alberto Abad:

Acoustic and Linguistic Biomarkers for Cognitive Impairment Detection from Speech. - Yao Xiao, Heidi Christensen, Stefan Goetze:

Alzheimer's Dementia Detection Using Perplexity from Paired Large Language Models. - Mansi, Anastasios Lepipas, Dominika C. Woszczyk, Yiying Guan, Soteris Demetriou:

Understanding Dementia Speech Alignment with Diffusion-Based Image Generation. - Dominika C. Woszczyk, Ranya Aloufi, Soteris Demetriou:

ClaritySpeech: Dementia Obfuscation in Speech.
Source Separation 1
- Jihyun Kim, Doyeon Kim, Hyewon Han, Jinyoung Lee, Jonguk Yoo, Chang Woo Han, Jeongook Song, Hoon-Young Cho, Hong-Goo Kang:

Quadruple Path Modeling with Latent Feature Transfer for Permutation-free Continuous Speech Separation. - Kangqi Jing, Wenbin Zhang, Yu Gao:

End-to-End DOA-Guided Speech Extraction in Noisy Multi-Talker Scenarios. - Xue Yang, Guiru Shen, Yu Yang:

Speaker Separation for an Unknown Number of Speakers with Encoder-Decoder-Based Contextual Information Module. - Yuzhu Wang, Archontis Politis, Konstantinos Drossos, Tuomas Virtanen:

Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers. - Hadi Alizadeh, Rahil Mahdian Toroghi, Hassan Zareian:

ReSepNet: A Unified-Light Model for Recursive Speech Separation with Unknown Speaker Count. - Tzlil Avidan, Bracha Laufer-Goldshtein:

Deep-Simplex Multichannel Speech Separation. - Haoxu Wang, Yiheng Jiang, Gang Qiao, Pengteng Shi, Biao Tian:

FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer. - Liang Tao, Maoshen Jia, Yonggang Hu:

Power Spectral Density Estimation for Acoustic Source Separation Using A Spherical Microphone Array. - Yiheng Jiang, Haoxu Wang, Yafeng Chen, Gang Qiao, Biao Tian:

Exploring Efficient Directional and Distance Cues for Regional Speech Separation.
Language and Accent Identification and Speaker Privacy
- Spandan Dey, Hirak Mondal, Sanjay Kumar Kurmi:

Teacher-Free Knowledge Distillation for Improving Short-Utterance Spoken Language Identification. - Niyati Bafna, Matthew Wiesner:

LID Models are Actually Accent Classifiers: Implications and Solutions for LID on Accented Speech. - Gowtham Premananth, Vinith Kugathasan, Carol Y. Espy-Wilson:

Analyzing the Impact of Accent on English Speech: Acoustic and Articulatory Perspectives. - Eliathamby Ambikairajah, Jingyao Wu, Ting Dang, Vidhyasaharan Sethu:

A Study of Speech Embedding Similarities Between Australian Aboriginal and High-Resource Languages. - Abderrahim Fathan, Jahangir Alam, Xiaolin Zhu:

An Investigative Study on Recent Sharpness- and Flatness-Based Optimizers for Enhanced Self-Supervised Speaker Verification. - Chenguang Hu, Yaqian Hao, Fulin Zhang, Xiaoxue Luo, Yao Shen, Yingying Gao, Chao Deng, Shilei Zhang, Junlan Feng:

Privacy-Preserving Speaker Verification via End-to-End Secure Representation Learning. - Elvir Karimov, Alexander Varlamov, Danil Ivanov, Dmitrii Korzh, Oleg Rogov:

Novel Loss-Enhanced Universal Adversarial Patches for Sustainable Speaker Privacy. - Ying Meng, Zhihua Fang, Liang He:

Federated Learning with Feature Space Separation for Speaker Recognition.
Source Tracing: The Origins of Synthetic or Manipulated Speech
- Pierre Falez, Tony Marteau, Damien Lolive, Arnaud Delhay:

Audio Deepfake Source Tracing using Multi-Attribute Open-Set Identification and Verification. - Ajinkya Kulkarni, Sandipana Dowerah, Tanel Alumäe, Mathew Magimai-Doss:

Unveiling Audio Deepfake Origins: A Deep Metric learning And Conformer Network Approach With Ensemble Fusion. - Xuanjun Chen, I-Ming Lin, Lin Zhang, Jiawei Du, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang:

Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy. - Adriana Stan, David Combei, Dan Oneata, Horia Cucu:

TADA: Training-free Attribution and Out-of-Domain Detection of Audio Deepfakes. - Viola Negroni, Davide Salvi, Paolo Bestagini, Stefano Tubaro:

Source Verification for Speech Deepfakes. - Anton Firc, Manasi Chhibber, Jagabandhu Mishra, Vishwanath Pratap Singh, Tomi Kinnunen, Kamil Malinka:

STOPA: A Dataset of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution. - Dimitrios Koutsianos, Stavros Zacharopoulos, Yannis Panagakis, Themos Stafylakis:

Synthetic Speech Source Tracing using Metric Learning. - Yang Xiao, Rohan Kumar Das:

Listen, Analyze, and Adapt to Learn New Attacks: An Exemplar-Free Class Incremental Learning Method for Audio Deepfake Source Tracing. - Thien-Phuc Doan, Kihun Hong, Souhwan Jung:

VIB-based Real Pre-emphasis Audio Deepfake Source Tracing. - Jiankun Zhao, Lingwei Meng, Chengxi Deng, Helen Meng, Xixin Wu:

Defending Unauthorized Voice Cloning with Watermark-Aware Codecs. - Nicholas Klein, Hemlata Tak, Elie Khoury:

Open-Set Source Tracing of Audio Deepfake Systems.
Speaker Diarization 1
- Jiangyu Han, Federico Landini, Johan Rohdin, Anna Silnova, Mireia Díez, Jan Cernocký, Lukás Burget:

Fine-tune Before Structured Pruning: Towards Compact and Accurate Self-Supervised Models for Speaker Diarization. - Prabhav Singh, Jesús Villalba, Najim Dehak:

Count Your Speakers! Multitask Learning for Multimodal Speaker Diarization. - David Palzer, Matthew Maciejewski, Eric Fosler-Lussier:

End-to-End Diarization utilizing Attractor Deep Clustering. - Berkin Durmus, Blaise Munyampirwa, Eduardo Pacheco, Atila Orhon, Andrey Leonov:

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization. - Fengyun Tan, Tao Wei, Kun Zou, Ning Cheng, Shaojun Wang, Jing Xiao:

Enhancing Serialized Output Training for Multi-Talker ASR with Soft Monotonic Alignment and Utterance-level Timestamp. - Shota Horiguchi, Atsushi Ando, Naohiro Tawara, Marc Delcroix:

Pretraining Multi-Speaker Identification for Neural Speaker Diarization.
Multilingual Speech Synthesis and Special Applications 1
- Ki-Joong Kwon, Jun-Ho So, Sang-Hoon Lee:

Parameter-Efficient Fine-Tuning for Low-Resource Text-to-Speech via Cross-Lingual Continual Learning. - Qibing Bai, Sho Inoue, Shuai Wang, Zhongjie Jiang, Yannan Wang, Haizhou Li:

Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data. - Chang Liu, Zhen-Hua Ling, Yu Gu:

LIST: Language-Independent Speech Token for Multilingual Speech Synthesis with Language Models. - Fatima Naseem, Maham Sajid, Farah Adeeba, Sahar Rauf, Asad Mustafa, Sarmad Hussain:

Developing High-Quality TTS for Punjabi and Urdu: Benchmarking against MMS Models. - Frederik Rautenberg, Fritz Seebauer, Jana Wiechmann, Michael Kuhlmann, Petra Wagner, Reinhold Haeb-Umbach:

Synthesizing Speech with Selected Perceptual Voice Qualities - A Case Study with Creaky Voice. - Christina Tånnander, David House, Jonas Beskow, Jens Edlund:

Intrasentential English in Swedish TTS: perceived English-accentedness.
Characterization and Multimodal Approaches for Speaker Recognition
- Shengyu Peng, Wu Guo, Jie Zhang, Yu Guan, Lipeng Dai, Zuoliang Li:

Parameter-Efficient Fine-tuning with Instance-Aware Prompt and Parallel Adapters for Speaker Verification. - Nathan Griot, Driss Matrouf, Raphaël Blouet, Jean-François Bonastre, Ana Mantecon:

Unified Text and Speaker Verification using SSL model for Text-Dependent Speaker Verification. - Zhaokai Sun, Li Zhang, Qing Wang, Pan Zhou, Lei Xie:

Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM. - N. Shashaank, Xiao Quan, Andrew Kaluzny, Leonard Varghese, Marko Stamenovic, Chuan-Che Huang:

Towards Secure User Authentication for Headphones via In-Ear or In-Earcup Microphones. - Gwangyeol Yu

, Junhyeok Lee, Seoryeong Kim, Jimin Lee, Jehyuk Lee:
Mimic Blocker: Self-Supervised Adversarial Training for Voice Conversion Defense with Pretrained Feature Extractors. - Bhasi K. C., Rajeev Rajan:

A Siamese Network-Based Framework for Voice Mimicry Proficiency Assessment Using X-Vector Embeddings. - Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Priyabrata Mallick, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma:

Towards Source Attribution of Singing Voice Deepfake with Multimodal Foundation Models. - Rishabh Ranjan, Ayinala Likhith, Mayank Vatsa, Richa Singh:

Multimodal Zero-Shot Framework for Deepfake Hate Speech Detection in Low-Resource Languages. - Chikara Maeda, Muhammad Shakeel, Yui Sudo:

Joint Target-Speaker ASR and Activity Detection. - Wooil Kim, Bongsu Jung:

DLF-EEND: Dynamic Layer Fusion for End-to-End Speaker Diarization.
Acoustic Analysis and Bioacoustics
- Noumida A, Rajeev Rajan:

Analysis of Avian Biphonic Vocalization Using Computational Modelling. - Xingyuan Li, Kenny Q. Zhu, Mengyue Wu:

Dog2vec: Self-Supervised Pre-Training for Canine Vocal Representation. - Ezhini Rasendiran R, Chandresh Kumar Maurya:

Improving Bird Classification with Primary Color Additives. - Chenhao Wu, Xiangjun Cai, Haojie Zhang, Tianrui Jia, Yilu Deng, Kun Qian, Björn W. Schuller, Yoshiharu Yamamoto, Jiang Liu:

Exploring the Power of Empirical Mode Decomposition for Sensing the Sound of Silence: A Pilot Study on Mice Autism Detection via Ultrasonic Vocalisation. - Yuchen Song, Yucong Zhang, Ming Li:

Exploring Pre-trained models on Ultrasound Modeling for Mice Autism Detection with Uniform Filter Bank and Attentive Scoring. - Zijiang Yang, Meishu Song, Xin Jing, Haojie Zhang, Kun Qian, Bin Hu, Kota Tamada, Toru Takumi, Björn W. Schuller, Yoshiharu Yamamoto:

MADUV: The 1st INTERSPEECH Mice Autism Detection via Ultrasound Vocalization Challenge. - Szymon Szmajdzinski, Juliusz Wójtowicz-Kruk, Ivan Ryzhankow, Lukasz Lazarski, Jakub Zak, Wladyslaw Sredniawa:

Significance of Time-Frequency preprocessing for automatic Ultrasonic Vocalization classification in Autism Spectrum Disorder model detection. - Quentin Le Tellier, Marc Evrard, Albert Rilliard, Jean-Sylvain Liénard:

Robust Vocal Intensity Prediction: Overcoming Dataset Bias with Pretrained Deep Models. - Ryo Terashima, Yuma Shirahata, Masaya Kawamura:

SLASH: Self-Supervised Speech Pitch Estimation Leveraging DSP-derived Absolute Pitch.
Keynote 2 - Alexander Waibel: From Speech Science to Language Transparence
- Alexander Waibel:

From Speech Science to Language Transparence.
Spoken Dialogue Systems 1
- Truong Do, Phuong Minh Nguyen, Le-Minh Nguyen:

PruneSLU: Efficient On-device Spoken Language Understanding through Vocabulary and Structural Pruning. - Haris Gulzar, Monikka Roslianna Busto, Akiko Masaki, Takeharu Eda, Ryo Masumura:

Leveraging LLMs for Written to Spoken Style Data Transformation to Enhance Spoken Dialog State Tracking. - Simon Sedlácek, Bolaji Yusuf, Jan Svec, Pradyoth Hegde, Santosh Kesiraju, Oldrich Plchot, Jan Cernocký:

Approaching Dialogue State Tracking via Aligning Speech Encoders and LLMs. - Kiyotada Mori, Seiya Kawano, Chaoran Liu, Carlos Toshinori Ishi, Angel F. Garcia Contreras, Koichiro Yoshino:

What Do Humans Hear When Interacting? Experiments on Selective Listening for Evaluating ASR of Spoken Dialogue Systems. - Minghan Wang, Ye Bai, Yuxia Wang, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari:

SpeechDialogueFactory: A Framework for Natural Speech Dialogue Generation. - Xiaohan Shi, Xingfeng Li, Tomoki Toda:

Who, When, and What: Leveraging the "Three Ws" Concept for Emotion Recognition in Conversation. - Alkis Koudounas, Claudio Savelli, Flavio Giobergia, Elena Baralis:

"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding. - Ebru Arisoy, Merve Ünlü Menevse, Yusufcan Manav, Arzucan Özgür:

Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering. - Maria Teleki, Lingfeng Shi, Chengkai Liu, James Caverlee:

I want a horror - comedy - movie: Slips-of-the-Tongue Impact Conversational Recommender System Performance. - Atsumoto Ohashi, Shinya Iizuka, Jingjing Jiang, Ryuichiro Higashinaka:

Towards a Japanese Full-duplex Spoken Dialogue System.
Speech Assessment
- Chih-Kai Yang, Neo Ho, Yen-Ting Piao, Hung-yi Lee:

SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information. - Guitao Wang, Jinming Zhao, Hao Yang, Guilin Qi, Tongtong Wu, Gholamreza Haffari:

Continual Speech Learning with Fused Speech Features. - Jiatong Shi, Hye-jin Shim, Shinji Watanabe:

Uni-VERSA: Versatile Speech Assessment with a Unified Network. - John Alderete, Macarious Kin Fung Hui, Aanchan Mohan:

Evaluating ASR Robustness to Spontaneous Speech Errors: A Study of WhisperX Using a Speech Error Database. - Tomoya Mizumoto, Atsushi Kojima, Yusuke Fujita, Lianbo Liu, Yui Sudo:

Is Synthetic Data Truly Effective for Training Speech Language Models? - Francesco Verdini, Pierfrancesco Melucci, Stefano Perna, Francesco Cariaggi, Marco Gaido, Sara Papi, Szymon Mazurek, Marek Kasztelnik, Luisa Bentivogli, Sébastien Bratières, Paolo Merialdo, Simone Scardapane:

How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not.
Audio-Visual ASR and Multimodal System
- Julián Zapata, Lara Hanna:

Text Entry for All: Towards Speech-based Multimodal Interaction for Inclusion, Accessibility and the Preservation of the World's Linguistic Heritage. - Umberto Cappellazzo, Minsu Kim, Stavros Petridis, Daniele Falavigna, Alessio Brutti:

Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach. - Thai-Binh Nguyen, Ngoc-Quan Pham, Alexander Waibel:

Cocktail-Party Audio-Visual Speech Recognition. - Zhengyang Li, Pascal Reichert, Thomas Graave, Patrick Blumenberg, Tim Fingscheidt:

Efficient Noise-Robust Hybrid Audiovisual Encoder with Joint Distillation and Pruning for Audiovisual Speech Recognition. - Naoki Makishima, Naotaka Kawata, Taiga Yamane, Mana Ihori, Tomohiro Tanaka, Satoshi Suzuki, Shota Orihashi, Ryo Masumura:

Unified Audio-Visual Modeling for Recognizing Which Face Spoke When and What in Multi-Talker Overlapped Speech and Video.
Speech and Voice Disorders 1
- Shangkun Huang, Jing Deng, Jintao Kang, Rong Zheng:

Leveraging LLM for Stuttering Speech: A Unified Architecture Bridging Recognition and Event Detection. - Zongli Ye, Jiachen Lian, Xuanru Zhou, Jinming Zhang, Haodong Li, Shuhe Li, Chenxu Guo, Anaisha Das, Peter Park, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary A. Miller, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli:

Seamless Dysfluent Speech Text Alignment for Disordered Speech Analysis. - Jinming Zhang, Xuanru Zhou, Jiachen Lian, Shuhe Li, William Li, Zoe Ezzes, Rian Bogley, Lisa Wauters, Zachary A. Miller, Jet Vonk, Brittany Morin, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli:

Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection. - Yeseul Park, Bowon Lee:

Fine-tuning Strategies for Automatic Speech Recognition of Low-Resource Speech with Autism Spectrum Disorder. - Margot Masson, Isabelle Ferrané, Julie Mauclair:

Identification of Pathological Pronunciation Profiles in ASR Transcription Errors. - Hadrien Titeux, Quang Tuan Rémy Nguyen, Andres Gil-Salcedo, Anne-Catherine Bachoud-Lévi, Emmanuel Dupoux:

A simple method for predicting Clinical Scores in Huntington's Disease by leveraging ASR's uncertainty on spontaneous speech. - Itay Ben-Dom, Catherine I. Watson, Clare M. McCann:

Introducing EMOPARKNZ: the Emotional Speech Database from New Zealand English Speakers with Parkinson's Disease. - Naoki Hojo, Ryoichi Takashima, Chihiro Sugiyama, Nobukazu Tanaka, Kanji Nohara, Kazunori Nozaki, Tetsuya Takiguchi:

Revisiting WFST-based Hybrid Japanese Speech Recognition System for Individuals with Organic Speech Disorders.
Multimodal Information Based Speech Processing (MISP) 2025 Challenge
- Longjie Luo, Shenghui Lu, Lin Li, Qingyang Hong:

Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge. - Ming Gao, Shilong Wu, Hang Chen, Jun Du, Chin-Hui Lee, Shinji Watanabe, Jingdong Chen, Sabato Marco Siniscalchi, Odette Scharenborg:

The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition. - Zhaoyang Li, Haodong Zhou, Longjie Luo, XiaoXiao Li, Yongxin Chen, Lin Li, Qingyang Hong:

Cross-attention and Self-attention for Audio-visual Speaker Diarization in MISP-Meeting Challenge. - Ming Cheng, Fei Su, Cancan Li, Juan Liu, Ming Li:

Multi-Channel Sequence-to-Sequence Neural Diarization: Experimental Results for The MISP 2025 Challenge. - Zeyan Song, Tianchi Sun, Ronghui Hu, Kai Chen, Jing Lu:

Leveraging Self-Supervised Learning Based Speaker Diarization for MISP 2025 AVSD Challenge. - Shangkun Huang, Yuxuan Du, Jingwen Yang, Dejun Zhang, Xupeng Jia, Jing Deng, Jintao Kang, Rong Zheng:

Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge.
Speaker Extraction 1
- Md Asif Jalal, Luca Remaggi, Vasileios Moschopoulos, Thanasis Kotsiopoulos, Vandana Rajan, Karthikeyan Saravanan, Anastasios Drosou, Junho Heo, Hyuk Oh, Seokyeong Jeong:

Robust Target Speaker Diarization and Separation via Augmented Speaker Embedding Sampling. - Wang Dai, Archontis Politis, Tuomas Virtanen:

Inter-Speaker Relative Cues for Text-Guided Target Speech Extraction. - Shaole Li, Shuai Wang, Jiangyu Han, Ke Zhang, Wupeng Wang, Haizhou Li:

REAL-T: Real Conversational Mixtures for Target Speaker Extraction. - Zexu Pan, Wupeng Wang, Shengkui Zhao, Chong Zhang, Kun Zhou, Yukun Ma, Bin Ma:

Online Audio-Visual Autoregressive Speaker Extraction. - Zexu Pan, Shengkui Zhao, Tingting Wang, Kun Zhou, Yukun Ma, Chong Zhang, Bin Ma:

Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction.
Low Resource Speech Recognition
- Salvatore Carta, Alessandro Giuliani, Marco Manolo Manca, Mirko Marras, Leonardo Piano:

SardinianVoxes: A Speech Recognition Dataset for the Sardinian Languages. - Griffin Dietz Smith, Dianna Yee, Jennifer King Chen, Leah Findlater:

Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection. - Bowen Zhang, Nur Afiqah Abdul Latiff, Justin Kan, Rong Tong, Donny Soh, Xiaoxiao Miao, Ian McLoughlin:

Automated evaluation of children's speech fluency for low-resource languages. - King Yiu Suen, Rudolf Chow, Albert Y. S. Lam:

Cantonese Punctuation Restoration using LLM Annotated Data. - David Sasu, Benedict Quartey, Kweku Andoh Yamoah, Natalie Schluter:

Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody. - Abhijit Sinha, Hemant Kumar Kathania, Mikko Kurimo:

Beyond Traditional Speech Modifications : Utilizing Self Supervised Features for Enhanced Zero-Shot Children ASR. - Nicol Visser, Herman Kamper:

Spoken Language Modeling with Duration-Penalized Self-Supervised Units.
Computational Resource Constrained ASR
- Zhaoqing Li, Haoning Xu, Zengrui Jin, Lingwei Meng, Tianzi Wang, Huimeng Wang, Youjun Chen, Mingyu Cui, Shujie Hu, Xunying Liu:

Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision. - Zhaoqing Li, Haoning Xu, Xurong Xie, Zengrui Jin, Tianzi Wang, Xunying Liu:

Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models. - Haoning Xu, Zhaoqing Li, Youjun Chen, Huimeng Wang, Guinan Li, Mengzhe Geng, Chengxi Deng, Xunying Liu:

Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates. - Tianteng Gu, Bei Liu, Haoyu Wang, Yanmin Qian:

Ultra-Low Bit Post-Training Quantization of Large Speech Models via K-Means Clustering and Mixed Precision Allocation. - Masao Someki, Shikhar Bharadwaj, Atharva Anand Joshi, Chyi-Jiunn Lin, Jinchuan Tian, Jee-weon Jung, Markus Müller, Nathan Susanj, Jing Liu, Shinji Watanabe:

Context-Driven Dynamic Pruning for Large Speech Foundation Models. - Benedikt Hilmes, Nick Rossenbach, Ralf Schlüter:

Analyzing the Importance of Blank for CTC-Based Knowledge Distillation. - Seraphina Fong, Marco Matassoni, Alessio Brutti:

Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages.
Speech and Language Technology for Health Applications
- Yue Pan, Liwei Liu, Changxin Li, Xingyao Wang, Yili Xia, Hanyue Zhang, Ming Chu:

A Chinese Heart Failure Status Speech Database with Universal and Personalised Classification. - Harish Battula, Gauri Deshpande, Yagna Gudipalli, Sachin Patel:

Heart Rate as a Proxy Measure to Assess Human Confidence in Spoken Speech. - Jingping Nie, Tien Dung Tran, Karan Thakkar, Vasudha Kowtha, Jon Huang, Carlos Avendaño, Erdrin Azemi, Vikramjit Mitra:

Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation. - Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Priyabrata Mallick, Santanu Roy, Arun Balaji Buduru, Rajesh Sharma:

Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism. - Yizhou Chen, Xiyu Wu:

Perception of Emotional Speech by Individuals with High Borderline Personality Features. - Agata Sage, Zuzanna Miodonska, Michal Krecichwost, Ewa Kwasniok, Pawel Badura:

Visual features of the oral region in Polish sibilants produced by children with various sibilance patterns. - Roseline Polle, Agnes Norbury, Alexandra Livia Georgescu, Nicholas Cummins, Stefano Goria:

Meta-Learning Approaches for Speaker-Dependent Voice Fatigue Models. - Ning Wang, Bingyang Wen, Minghui Wu, Yang Sun, Zongru Shao, Haojie Zhou, K. P. Subbalakshmi:

Decoding Alzheimer's: Interpretable Visual and Logical Attention in Picture Description Tasks.
Responsible Speech Foundation Models + SUPERB Challenge
- Antonios Alexos, Raghuveer Peri, Sai Muralidhar Jayanthi, Metehan Cekic, Srikanth Vishnubhotla, Kyu J. Han, Srikanth Ronanki:

Defending Speech-enabled LLMs Against Adversarial Jailbreak Threats. - Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee:

Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach. - Dariia Puhach, Amir H. Payberah, Éva Székely:

Who Gets the Mic? Investigating Gender Bias in the Speaker Assignment of a Speech-LLM. - Mengzhe Geng, Patrick Littell, Aidan Pine, Robbie Jimerson, Gilles Boulianne, Vishwa Gupta, Rolando Coto-Solano, Anna Kazantseva, Marc Tessier, Delaney Lothian, Akwiratékha' Martin, Eric Joanis, Samuel Larkin, Roland Kuhn:

Evaluating Speech Foundation Models for Automatic Speech Recognition in the Low-Resource Kanyen'kéha Language. - Debarpan Bhattacharya, Apoorva Kulkarni, Sriram Ganapathy:

Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning. - Chun-Yi Kuan, Hung-yi Lee:

Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples. - Ke-Han Lu, Chun-Yi Kuan, Hung-yi Lee:

Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models. - Potsawee Manakul, Guangzhi Sun, Warit Sirichotedumrong, Kasima Tharnpipitchai, Kunat Pipatanakul:

Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models. - Qingzheng Wang, Jiancheng Sun, Yifan Peng, Shinji Watanabe:

Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC. - William Chen, Chutong Meng, Jiatong Shi, Martijn Bartelds, Shih-Heng Wang, Hsiu-Hsuan Wang, Rafael Mosquera, Sara Hincapie, Dan Jurafsky, Antonis Anastasopoulos, Hung-yi Lee, Karen Livescu, Shinji Watanabe:

The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties. - Tanel Alumäe, Artem Fedorchenko:

TalTech Systems for the Interspeech 2025 ML-SUPERB 2.0 Challenge.
Dysarthric Speech Assessment 1
- Tao Zhong, Mengzhe Geng, Shujie Hu, Guinan Li, Xunying Liu:

Regularized Federated Learning for Privacy-Preserving Dysarthric and Elderly Speech Recognition. - Yejin Jeon, Solee Im, Youngjae Kim, Gary Geunbae Lee:

Facilitating Personalized TTS for Dysarthric Speakers Using Knowledge Anchoring and Curriculum Learning. - Xueyuan Chen, Dongchao Yang, Wenxuan Wu, Minglin Wu, Jing Xu, Xixin Wu, Zhiyong Wu, Helen Meng:

DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model. - Shoutrik Das, Nishant Singh, Arjun Gangwar, S. Umesh:

Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching. - Ahmed Aboeitta, Ahmed Sharshar, Youssef Nafea, Shady Shehata:

Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches. - Chin-Jou Li, Eunjung Yeo, Kwanghee Choi, Paula Andrea Pérez-Toro, Masao Someki, Rohan Kumar Das, Zhengjun Yue, Juan Rafael Orozco-Arroyave, Elmar Nöth, David R. Mortensen:

Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages. - Yan Xiong, Visar Berisha, Julie Liss, Chaitali Chakrabarti:

Mitigating Overfitting During Speech Foundation Model Fine-tuning: Applications to Dysarthric Speech Detection. - Seohyun Park, Chitralekha Gupta, Michelle Kah Yian Kwan, Xinhui Fung, Alexander Wenjun Yip, Suranga Nanayakkara:

Towards Temporally Explainable Dysarthric Speech Clarity Assessment.
Show and Tell 2: Speech Synthesis
- Vishal Gourav, Phanindra Mankale:

Code Mix TTS: An Approach to Infer Human Like Speech for Multi-Lingual Input Texts. - Binh Nguyen, Thai Le:

Turing's Echo: Investigating Linguistic Sensitivity of Deepfake Voice Detection via Gamification. - Namhyun Cho, Sunmin Kim, Minsu Kang, Seolhee Lee, Choonghyeon Lee, Yangsun Lee:

Unleashing the Inner Monster: Demonstrating High-Fidelity Human to Non-Human Voice Conversion. - Victor Shepardson, Jonathan Reus, Thor Magnusson:

Tungnaá In Live Performance: An Implementation Of Interactive Artistic Text-To-Voice. - Shree Harsha Bokkahalli Satish, Gustav Eje Henter, Éva Székely:

Hear Me Out: Interactive evaluation and bias discovery platform for speech-to-speech conversational AI. - Takayuki Arai:

Vocal-tract model with two directions: Static design for a dummy head and dynamic design for a speaking machine.
Databases and Progress in Methodology
- Arnav Rustagi, Satvik Bajpai, Nimrat Kaur, Siddharth:

Dhvani: A Weakly-supervised Phonemic Error Detection and Personalized Feedback System for Hindi. - Alexandra Fort, Francis Tyers:

Evaluating Wav2Vec2-Bert for Computer-Assisted Pronunciation Training for isiZulu. - Lubos Marcinek, Jonas Beskow, Joakim Gustafson:

Towards Adaptable and Intelligible Speech Synthesis in Noisy Environments. - Lidea Shahidi, Erdem Baha Topbas, Thu Ngan Dang

, Tobias Goehring:
Harnessing Text-to-Speech Voice Cloning Models for Improved Audiological Speech Assessment. - Xuan Shi, Yubin Zhang, Yijing Lu, Marcus Ma, Tiantian Feng, Asterios Toutios, Haley Hsu, Louis Goldstein, Shrikanth Narayanan:

75-Speaker Annot-16: A benchmark dataset for speech articulatory rt-MRI annotation with articulator contours and phonetic alignment. - Greta Tuckute, Klemen Kotar, Evelina Fedorenko, Daniel Yamins:

Representing Speech Through Autoregressive Prediction of Cochlear Tokens. - Chanwoo Park, Anna Seo Gyeong Choi, Sunghye Cho, Chanwoo Kim:

Reasoning-Based Approach with Chain-of-Thought for Alzheimer's Detection Using Speech and Large Language Models. - Linda Bakkouche, Charles McGhee, Emily Lau, Stephanie Cooper, Xinbing Luo, Madeleine Rees, Kai Alter, Brechtje Post, Julia Schwarz:

Finding the Human Voice in AI: Insights on the Perception of AI-Voice Clones from Naturalness and Similarity Ratings. - Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu:

Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora.
Novel Architectures for ASR
- Enes Yavuz Ugan, Ngoc-Quan Pham, Alexander Waibel:

Weight Factorization and Centralization for Continual Learning in Speech Recognition. - Chenxu Guo, Jiachen Lian, Xuanru Zhou, Jinming Zhang, Shuhe Li, Zongli Ye, Peter Park, Anaisha Das, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary A. Miller, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli:

Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection. - I-Ting Hsieh, Chung-Hsien Wu:

Dysarthric Speech Recognition Using Curriculum Learning and Multi-stream Architecture. - Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Chyi-Jiunn Lin, Shinji Watanabe:

DYNAC: Dynamic Vocabulary-based Non-Autoregressive Contextualization for Speech Recognition. - Hojun Jin, Eunsoo Hong, Ziwon Hyung, Sungjun Lim, Seungjin Lee, Keunseok Cho:

Beyond Hard Sharing: Efficient Multi-Task Speech-to-Text Modeling with Supervised Mixture of Experts. - Yifan Peng, Muhammad Shakeel, Yui Sudo, William Chen, Jinchuan Tian, Chyi-Jiunn Lin, Shinji Watanabe:

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning.
Deepfake Detection
- Kwok Chin Yuen, Jia Qi Yip, Zhen Qiu, Chi-Hung Chi, Kwok-Yan Lam:

Bona fide Cross Testing Reveals Weak Spot in Audio Deepfake Detection Systems. - Yassine El Kheir, Tim Polzehl, Sebastian Möller:

BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention. - Neta Glazer, David Chernin, Idan Achituve, Sharon Gannot, Ethan Fetaya:

Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes. - Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Adriana Stan, Aditya Tirumala Bukkapatnam, Karla Pizzi, Alexander Wagner, Philip Sperl:

Replay Attacks Against Audio Deepfake Detection. - Seung-bin Kim, Hyun-seo Shin, Jungwoo Heo, Chan-yeong Lim, Kyo-Won Koo, Jisoo Son, Sanghyun Hong, Souhwan Jung, Ha-Jin Yu:

Enhancing Audio Deepfake Detection by Improving Representation Similarity of Bonafide Speech. - Mingru Yang, Yanmei Gu, Qianhua He, Yanxiong Li, Peirong Zhang, Yongqiang Chen, Zhiming Wang, Huijia Zhu, Jian Liu, Weiqiang Wang:

Generalizable Audio Deepfake Detection via Hierarchical Structure Learning and Feature Whitening in Poincaré sphere.
Tools for Speech Analysis
- Kun Jin, Siva Penke, Srinivasa Algubelli:

VoiceNet: Multilingual On-Device Phoneme-To-Audio Alignment. - Maya Dewhurst, Jack Collins, Justin J. H. Lo, Roy Alderton, Sam Kirkham:

Nosey: Open-Source Hardware for Acoustic Nasalance. - James Tanner, Morgan Sonderegger, Jane Stuart-Smith, Jeff Mielke, Tyler Kendall:

Automatic classification of stop realisation with wav2vec2.0.
Text Processing and Evaluation for Speech Synthesis 1
- Siqi Sun, Korin Richmond:

Acquiring Pronunciation from Speech Audio via Multi-task Learning. - Sujoy Roychowdhury, Ranjani H. G., Sumit Soman, Nishtha Paul, Subhadip Bandyopadhyay, Siddhanth Iyengar:

Intelligibility of Text-to-Speech Systems for Mathematical Expressions. - Praveen Srinivasa Varadhan, Sherry Thomas, Sai Teja M. S., Suvrat Bhooshan, Mitesh M. Khapra:

The State Of TTS: A Case Study with Human Fooling Rates. - Jinzuomu Zhong, Suyuan Liu, Dan Wells, Korin Richmond:

Pairwise Evaluation of Accent Similarity in Speech Synthesis. - Harm Lameris, Joakim Gustafsson, Éva Székely:

VoiceQualityVC: A Voice Conversion System for Studying the Perceptual Effects of Voice Quality in Speech. - Michael Kuhlmann, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach:

Towards Frame-level Quality Predictions of Synthetic Speech.
Segmental and Tonal Units
- C. T. Justine Hui, Jenice Kuzhikombil, Isabella Shields, Hiraia Haami-Wells, Catherine I. Watson, Peter J. Keegan:

Perception of Long and Short Vowel Contrast in Te Reo Māori in Clean and Everyday Listening Environments. - Patrik Hrabánek, Michaela Watkins, Silke Hamann:

The function of creaky voice in South Korean: A perception study. - Mingxi Lu, Ran Tao, Yujia Tian:

Talker Normalization in Chinese Bilinguals: A Comparative Study. - Terumichi Ariga:

Coping with segmental-prosodic incongruity in spoken word recognition in Japanese. - Saskia Wepner, Lucas Eckert, Gernot Kubin, Barbara Schuppler:

What the Filler? Both ASR Systems and Humans Struggle More With Other Kinds of Disfluencies Than With Filler Particles.
Speech Quality Assessment
- Danilo de Oliveira, Julius Richter, Jean-Marie Lemercier, Simon Welker, Timo Gerkmann:

Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech. - Wafaa Wardah, Robert P. Spang, Vincent Barriac, Jan Reimes, Anna Llagostera, Jens Berger, Sebastian Möller:

SQ-AST: A Transformer-Based Model for Speech Quality Prediction. - Imran E. Kibria, Donald S. Williamson:

AttentiveMOS: A Lightweight Attention-Only Model for Speech Quality Prediction. - Yu-Fei Shi, Yang Ai, Zhen-Hua Ling:

Universal Preference-Score-based Pairwise Speech Quality Assessment. - Enjamamul Hoq, Nikhil Gupta, Danielle Omondi, Ifeoma Nwogu:

FUSE-MOS: Fusion of Speech Embeddings for MOS Prediction with Uncertainty Quantification. - Wen-Chin Huang, Erica Cooper, Tomoki Toda:

SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit.
Speech Enhancement
- Haici Yang, Gordon Wichern, Ryo Aihara, Yoshiki Masuyama, Sameer Khurana, François G. Germain, Jonathan Le Roux:

Investigating continuous autoregressive generative speech enhancement. - Venkatesh Parvathala, K. Sri Rama Murty:

Dynamic Layer Gating for Speech Enhancement. - Saisamarth Rajesh Phaye, Milos Cernak, Andrew Harper:

Model as Loss: A Self-Consistent Training Paradigm. - Avishkar Behera, Riya Ann Easow, Venkatesh Parvathala, K. Sri Rama Murty:

Test-Time Training for Speech Enhancement. - Seungu Han, Sungho Lee, Juheon Lee, Kyogu Lee:

Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement. - Venkatesh Parvathala, Ramesh Gundluru, Sreekanth Sankala, K. Sri Rama Murty:

Exploiting Bispectral Features for Single-Channel Speech Enhancement.
Language Learning and Assessment
- Olli Kuparinen:

Automatic Dialectal Transcription: An Evaluation on Finnish and Norwegian. - Wieke Harmsen, Roeland van Hout, Catia Cucchiarini, Helmer Strik:

Can ASR generate valid measures of child reading fluency? - Chowdam Venkata Thirumala Kumar, Chiranjeevi Yarra:

SGED-Probe: Probing E2E ASR decoder and aligner for spoken grammar error detection under three speaking practice conditions. - Aditya Kamlesh Parikh, Cristian Tejedor García, Catia Cucchiarini, Helmer Strik:

Evaluating Logit-Based GOP Scores for Mispronunciation Detection. - Yassine El Kheir, Omnia Ibrahim, Amit Meghanani, Nada Almarwani, Hawau Olamide Toyin, Sadeen Alharbi, Modar Alfadly, Lamya Alkanhal, Ibrahim Selim, Shehab Elbatal, Salima Mdhaffar, Thomas Hain, Yasser Hifny, Mostafa Shahin, Ahmed Ali:

Towards a Unified Benchmark for Arabic Pronunciation Assessment: Qur'anic Recitation as Case Study. - Wen-Wei Hsieh, Hao-Wei Chi, Kuan-Chen Wang, Ping-Cheng Yeh, Te-Hsin Liu, Chen-Yu Chiang:

OMPAL: Bridging Speech and Learning with an Open-Source Mandarin Pronunciation Assessment Corpus for Global Learners. - Haopeng Geng, Daisuke Saito, Nobuaki Minematsu:

A Perception-Based L2 Speech Intelligibility Indicator: Leveraging a Rater's Shadowing and Sequence-to-sequence Voice Conversion. - Sehyun Oh, Sunhee Kim, Minhwa Chung:

Multimodal and Multitask Learning for Predicting Multiple Scores in L2 English Speech. - Jingran Xie, Xiang Li, Hui Wang, Yue Yu, Yang Xiang, Xixin Wu, Zhiyong Wu:

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving. - Nhan Phan, Mikko Kuronen, Maria Kautonen, Riikka Ullakonoja, Anna von Zansen, Yaroslav Getman, Ekaterina Voskoboinik, Tamás Grósz, Mikko Kurimo:

Mispronunciation Detection Without L2 Pronunciation Dataset in Low-Resource Setting: A Case Study in Finland Swedish.
Speech Synthesis Paradigms and Methods 1
- Hyun Joon Park, Jeongmin Liu, Jin Sob Kim, Jeong Yeol Yang, Sung Won Han, Eunwoo Song:

RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching. - Qixi Zheng, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiaofei Wang, Kai Yu, Xie Chen:

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling. - Changfeng Gao, Zhihao Du, Shiliang Zhang:

Differentiable Reward Optimization for LLM based TTS system. - Zhipeng Li, Xiaofen Xing, Jingyuan Xing, Hangrui Hu, Heng Lu, Xiangmin Xu:

Long-Context Speech Synthesis with Context-Aware Memory. - Yike Zhang, Yiming Li, Jie Chen, Qinghua Wu, Songjun Cao, Long Ma:

Monotonic Attention for Robust Text-to-Speech Synthesis in Large Language Model Frameworks. - Ye-Xin Lu, Hui-Peng Du, Fei Liu, Yang Ai, Zhen-Hua Ling:

Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising. - Frank Zalkow, Paolo Sani, Kishor Kayyar Lakshminarayana, Emanuël A. P. Habets, Nicola Pia, Christian Dittmar:

Bridging the Training-Inference Gap in TTS: Training Strategies for Robust Generative Postprocessing for Low-Resource Speakers. - Chunhui Lu, Xue Wen, Liming Song, Junkwang Oh:

Robust Neural Codec Language Modeling with Phoneme Position Prediction for Zero-Shot TTS.
Spatial Audio and Acoustics 2
- Roland Hartanto, Sakriani Sakti, Koichi Shinoda:

SepVAC: Multitask Learning of Speaker Separation, Speaker Localization, Microphone Array Localization, and Room Acoustic Parameter Estimation in Various Acoustic Conditions. - Junhui Zhao, Hang Chen, Qing Wang, Jun Du, Yanhui Tu, Feng Ma:

TA-RIR: Topology-Aware Neural Modeling of Acoustic Propagation for Room Impulse Response Synthesis. - Hyun-Soo Kim, Da-Hee Yang, Joon-Hyuk Chang:

Spatially Weighted Contrastive Learning for Robust Sound Source Localization. - Yiyuan Yang, Shitong Xu, Niki Trigoni, Andrew Markham:

Efficient and Microphone-Fault-Tolerant 3D Sound Source Localization. - De Hu, Shuyao Liu, Yanrong He:

Joint Reference Microphone Selection and Filter Order Determination in Multi-channel Active Noise Control. - Liang Tao, Maoshen Jia, Yonggang Hu:

Direct-path Relative Harmonic Coefficients Detection for Multi-source Direction-of-Arrival Estimation in Reverberant Environments. - Junsheng Hu, Shaojie Li, Qintuya Si, De Hu:

D-GAT: Dual Graph Attention Network for Global HRTF Interpolation. - Mateusz Guzik, Giulio Cengarle, Daniel Arteaga:

Deep learning based spatial aliasing reduction in beamforming for audio capture. - Xiaoming Zhang, Ke-Yue Zhang, Taiping Yao, Songjun Cao, Shouhong Ding, Long Ma:

SonarGuard2: Ultrasonic Face Liveness Detection Based on Adaptive Doppler Effect Feature Extraction.
Text Processing and Evaluation for Speech Synthesis 2
- Hien Ohnaka, Yuma Shirahata, Byeongseon Park, Ryuichi Yamamoto:

Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning. - Noe Berger, Siqi Sun, Korin Richmond:

Non-Standard Accent TTS Support via Large Multi-Accent Frontend Pronunciation Knowledge Transfer. - Timothy Shin Heng Mak, King Yiu Suen, Albert Y. S. Lam:

Speech-guided Grapheme-to-Phoneme Conversion for Cantonese Text-to-Speech. - Rui Hu, Xiaolong Lin, Jiawang Liu, Shixi Huang, Zhenpeng Zhan:

Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation. - Sébastien Le Maguer, Gwénolé Lecorvé, Damien Lolive, Naomi Harte, Juraj Simko:

Enabling the replicability of speech synthesis perceptual evaluations. - Natacha Miniconi, Meysam Shamsi, Anthony Larcher:

When The MOS Predictor Asks For Training Annotation In Cross Lingual/Domain Adaptation. - Ryo Setoguchi, Yoshiko Arimoto:

Assessment of the synthetic quality and controllability of laughing onset in speech-laugh synthesis.
General Topics in ASR
- Nick Rossenbach, Benedikt Hilmes, Leon Brackmann, Moritz Gunz, Ralf Schlüter:

Running Conventional Automatic Speech Recognition on Memristor Hardware: A Simulated Approach. - Ke Hu, Krishna C. Puvvada, Elena Rastorgueva, Zhehuai Chen, He Huang, Shuoyang Ding, Kunal Dhawan, Hainan Xu, Jagadeesh Balam, Boris Ginsburg:

Word Level Timestamp Generation for Automatic Speech Recognition and Translation. - Ju Lin, Yiteng Huang, Ming Sun, Frank Seide, Florian Metze:

Directional Speech Recognition with Full-Duplex Capability. - Jiajun He, Naoki Sawada, Koichi Miyazaki, Tomoki Toda:

CMT-LLM: Contextual Multi-Talker ASR Utilizing Large Language Models. - Hongfei Xue, Yufeng Tang, Jun Zhang, Xuelong Geng, Lei Xie:

Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty.
Acoustic Event Detection and Classification
- James Taylor, Wolfgang Mack:

Improving Audio Classification by Transitioning from Zero- to Few-Shot. - Kohei Uehara, Ryoichi Takashima, Tetsuya Takiguchi:

Zero-Shot Learning for Acoustic Event Classification Using an Attribute Vector and Conditional GAN. - Lipeng Dai, Qing Wang, Jie Zhang, Shengyu Peng, Yu Guan, Wu Guo:

Leveraging Multi-Level Features of ATST with Conformer-Based Dual-Branch Network for Sound Event Detection. - Tatsuya Komatsu, Hokuto Munakata, Yuchi Ishikawa:

Leveraging Unlabeled Audio for Audio-Text Contrastive Learning via Audio-Composed Text Features. - Yuchi Ishikawa, Shota Nakada, Hokuto Munakata, Kazuhiro Saito, Tatsuya Komatsu, Yoshimitsu Aoki:

Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos. - Yusuke Fujita, Tomoya Mizumoto, Atsushi Kojima, Lianbo Liu, Yui Sudo:

AC/DC: LLM-based Audio Comprehension via Dialogue Continuation. - Yawei Wang, Qiaoling Zhang, Yi Zhang, Junyao Hu:

Anomalous Sound Detection Based Feature Fusion and Dual-path Non-linear Independent Components Estimation. - Nan Jiang, Yan Song, Qing Gu, Haoyu Song, Lirong Dai, Ian McLoughlin:

An Effective Anomalous Sound Detection Method Based on Global and Local Attribute Mining. - Long-Vu Hoang, Tuan Nguyen, Huy Dat Tran:

Acoustic scattering AI for non-invasive object classifications: A case study on hair assessment.
Keyword Spotting and Retrieval
- Anup Singh, Kris Demuynck, Vipul Arora:

Language-Agnostic Speech Tokenizer for Spoken Term Detection with Efficient Retrieval. - Akanksha Singh, Yi-Ping Phoebe Chen, Vipul Arora:

H-QuEST: Accelerating Query-by-Example Spoken Term Detection with Hierarchical Indexing. - Ruofan Hu, Yan Xia, Minjie Hong, Jieming Zhu, Bo Chen, Xiaoda Yang, Minghui Fang, Tao Jin:

Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval. - Youngmoon Jung, Yong-Hyeok Lee, Myunghun Jung, Jaeyoung Roh, Chang Woo Han, Hoon-Young Cho:

Adversarial Deep Metric Learning for Cross-Modal Audio-Text Alignment in Open-Vocabulary Keyword Spotting. - Minghui Fang, Shengpeng Ji, Jialong Zuo, Xize Cheng, Wenrui Liu, Xiaoda Yang, Ruofan Hu, Jieming Zhu, Zhou Zhao:

GTA: Towards Generative Text-To-Audio Retrieval via Multi-Scale Tokenizer. - Changin Choi, Sungjun Lim, Wonjong Rhee:

Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning. - Ruochu Yang, Milind Rao, Harshavardhan Sundar, Anirudh Raju, Aparna Khare, Srinath Tankasala, Di He, Venkatesh Ravichandran:

On Retrieval of Long Audios with Complex Text Queries. - Jin-Gyo Lim, Seong-Eun Kim:

SIDC-KWS: Efficient Spiking Inception-Dilated Conformer with Self-Attention for Keyword Spotting. - Dzmitry Saladukha, Ivan Koriabkin, Kanstantsin Artsiom, Aliaksei Rak, Nikita Ryzhikov:

Multichannel Keyword Spotting for Noisy Conditions. - Pai Zhu, Quan Wang, Dhruuv Agarwal, Kurt Partridge:

LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting. - Harry Zhang, Kurt Partridge, Pai Zhu, Neng Chen, Hyun Jin Park, Dhruuv Agarwal, Quan Wang:

GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples. - Firoj Alam, Md. Arid Hasan, Shammur Absar Chowdhury:

SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs.
Multimodal Systems
- Sun-Kyung Lee, Jong-Hwan Kim:

CAMER: Contribution-Aware Multimodal Emotion Recognition. - Jiajun He, Jinyi Mi, Tomoki Toda:

GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints. - Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Abu Osama Siddiqui, Sarthak Jain, Priyabrata Mallick, Jaya Sai Kiran Patibandla, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma:

SNIFR : Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer. - Zehua Liu, Xiaolou Li, Chen Chen, Lantian Li, Dong Wang:

CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge. - Abdul Hannan, Muhammad Arslan Manzoor, Shah Nawaz, Muhammad Irzam Liaqat, Markus Schedl, Mubashir Noman:

PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association. - Ke Hu, Ehsan Hosseini-Asl, Chen Chen, Edresson Casanova, Subhankar Ghosh, Piotr Zelasko, Zhehuai Chen, Jason Li, Jagadeesh Balam, Boris Ginsburg:

Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model. - Ziqian Wang, Xianjun Xia, Xinfa Zhu, Lei Xie:

U-SAM: An Audio Language Model for Unified Speech, Audio, and Music Understanding. - Yun Tang, Eesung Kim, Vijendra Raj Apsingekar:

Enhanced Hybrid Transducer and Attention Encoder Decoder with Text Data. - Yi Wang, Oli Danyi Liu, Peter Bell:

The role of audio-visual integration in the time course of phonetic encoding in self-supervised speech models.
Dysarthric Speech Assessment 2
- Éva Székely, Péter Mihajlik, Máté Soma Kádár, László Tóth:

Voice Reconstruction through Large-Scale TTS Models: Comparing Zero-Shot and Fine-tuning Approaches to Personalise TTS in Assistive Communication. - Dimme de Groot, Tanvina Patel, Devendra Kayande, Odette Scharenborg, Zhengjun Yue:

Objective and Subjective Evaluation of Diffusion-Based Speech Enhancement for Dysarthric Speech. - Minseop Kim, Minsu Han, Seokyoung Hong, Myoung-wan Koo:

Data Augmentation using Speech Synthesis for Speaker-Independent Dysarthria Severity Classification. - Anuprabha M, Krishna Gurugubelli, Anil Kumar Vuppala:

Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS. - Jingting Li, Keyi Feng, Xinran Zhao, Yan Wang, Su-Jing Wang:

Synthetic Dysarthric Speech: A Supplement, Not a Substitute for Authentic Data in Dysarthric Speech Recognition. - Karl El Hajal, Enno Hermann, Sevada Hovsepyan, Mathew Magimai-Doss:

Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech.
Dialect Identification in Different Languages
- Lorenz Gutscher, Michael Pucher:

Audio-Based Classification and Geographic Regression of Austrian Dialects. - Saurabh Kumar, Amartyaveer, Prasanta Kumar Ghosh:

Jointly Improving Dialect Identification and ASR in Indian Languages using Multimodal Feature Fusion. - Haroun Elleuch, Salima Mdhaffar, Yannick Estève, Fethi Bougares:

ADI-20: Arabic Dialect Identification dataset and models. - Lea Fischbach, Akbar Karimi, Caroline Kleen, Alfred Lameli, Lucie Flek:

Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion. - Phoebe Parsons, Heming Strømholt Bremnes, Knut Kvale, Torbjørn Svendsen, Giampiero Salvi:

Effects of Prosodic Information on Dialect Classification Using Whisper Features. - Badr M. Abdullah, Matthew Baas, Bernd Möbius, Dietrich Klakow:

Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification.
Connecting Speech Science and Speech Technology for Children's Speech
- Xulin Fan, Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain:

Band-Split Self-supervised Mamba for Infant-centered Audio Analysis. - Nina R. Benway, Saba Tabatabaee, Benjamin Munson, Jonathan Preston, Carol Y. Espy-Wilson:

Subtyping Speech Errors in Childhood Speech Sound Disorders with Acoustic-to-Articulatory Speech Inversion. - Amanda Eads, Heather Kabakoff, Nina Benway, Elaine Hitchcock, Jonathan L. Preston, Tara McAllister:

PERCEPT-US: A Multimodal American English Child Speech Corpus Specialized for Articulatory Feedback. - Ajinkya Kulkarni, Francisco Teixeira, Enno Hermann, Thomas Rolland, Isabel Trancoso, Mathew Magimai-Doss:

Children's Voice Privacy: First Steps and Emerging Challenges. - Saba Tabatabaee, Jing Liu, Carol Y. Espy-Wilson:

FT-Boosted SV: Towards Noise Robust Speaker Verification for English Speaking Classroom Environments. - Zhonghao Shi, Xuan Shi, Anfeng Xu, Tiantian Feng, Harshvardhan Srivastava, Shrikanth Narayanan, Maja J. Mataric:

Examining Test-Time Adaptation for Personalized Child Speech Recognition. - Theo Zhang, Madurya Suresh, Anne Warlaumont, Kasia Hitczenko, Alejandrina Cristià, Margaret Cychosz:

Employing self-supervised learning models for cross-linguistic child speech maturity classification. - Ankita, Shambhavi, Syed Shahnawazuddin:

On Enhancing the Performance of Children's ASR Task in Limited Data Scenario. - Tiantian Feng, Anfeng Xu, Xuan Shi, Somer Bishop, Shrikanth Narayanan:

Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling. - Anfeng Xu, Tiantian Feng, So Hyun Kim, Somer Bishop, Catherine Lord, Shrikanth Narayanan:

Large Language Models based ASR Error Correction for Child Conversations. - Tarek Kunze, Marianne Métais, Hadrien Titeux, Lucas Elbert, Joseph Coffey, Emmanuel Dupoux, Alejandrina Cristià, Marvin Lavechin:

Challenges in Automated Processing of Speech from Child Wearables: The Case of Voice Type Classifier. - Lingyun Gao, Cristian Tejedor García, Catia Cucchiarini, Helmer Strik:

Improving Child Speech Recognition and Reading Mistake Detection by Using Prompts. - Jazmín Vidal, Luciana Ferrer, Juan Esteban Kamienkowski, Pablo Riera:

Improving Automatic Speech Recognition for Children's Reading Assessment with Disfluency-aware Language Models. - Sneha Raman, Preeti Rao:

Oral Reading Errors by Grade 3 Children in Indian Schools: A Hindi-English Perspective. - Christopher Gebauer, Lars Rumberg, Lars Köhn, Hanna Ehlert, Edith Beaulac, Jörn Ostermann:

Grammatical Error Detection on Spontaneous Children's Speech Using Iterative Pseudo Labeling. - Koharu Horii, Naohiro Tawara, Atsunori Ogawa, Shoko Araki:

Why is children's ASR so difficult? Analyzing children's phonological error patterns using SSL-based phoneme recognizers. - Darline Monika Marx, Marco Matassoni, Alessio Brutti:

Automatic detection of speech sound disorders in German-speaking children: augmenting the data with typically developed speech. - Edem Ahadzi, Vishwanath Pratap Singh, Tomi Kinnunen, Ville Hautamäki:

Continuous Learning for Children's ASR: Overcoming Catastrophic Forgetting with Elastic Weight Consolidation and Synaptic Intelligence. - Thomas Rolland, Alberto Abad:

Exploring Shared-Weight Mechanisms in Transformer and Conformer Architectures for Automatic Speech Recognition. - Karen Rosero, Ali N. Salman, Shreeram Suresh Chandra, Berrak Sisman, Cortney Van't Slot, Alex A. Kane, Rami R. Hallac, Carlos Busso:

Advancing Pediatric ASR: The Role of Voice Generation in Disordered Speech. - Natarajan Balaji Shankar, Zilai Wang, Kaiyuan Zhang, Mohan Shi, Abeer Alwan:

CHSER: A Dataset and Case Study on Generative Speech Error Correction for Child ASR. - Vishwanath Pratap Singh, Md. Sahidullah, Tomi Kinnunen:

Causal Structure Discovery for Error Diagnostics of Children's ASR.
Brain and Cognition
- Omer Moussa, Mariya Toneva:

Brain-tuned Speech Models Better Reflect Speech Processing Stages in the Brain. - Rini A. Sharon, Hema A. Murthy:

Enhancing Syllabic Recognition via Speech-EEG Phase Analysis and Non-Activity State Modeling. - Saravanakumar Duraisamy, Maurice Rekrut, Luis A. Leiva:

Functional Connectivity and Hilbert-Based Features for Covert Speech EEG Variability Analysis and Classification. - Siavash Shams, Richard J. Antonello, Gavin Mischler, Stephan Bickel, Ashesh D. Mehta, Nima Mesgarani:

Neuro2Semantic: A Transfer Learning Framework for Semantic Reconstruction of Continuous Language from Human Intracranial EEG. - Gabriel Ivucic, Saurav Pahuja, Dashanka De Silva, Tanja Schultz:

Selective Auditory Attention Decoding in Naturalistic Conversations Using EEG-Based Speech Envelope Tracking in Multi-Speaker Environments. - Mohammed Salah Al-Radhi, Géza Németh, Branislav Gerazov:

MiSTR: Multi-Modal iEEG-to-Speech Synthesis with Transformer-Based Prosody Prediction and Neural Phase Reconstruction.
Regional, Social and Diachronic Variation
- Gustavo Silveira, Aviad Albert, Martine Grice:

Probing Prosodic Differences Between Two Regional Varieties of Brazilian Portuguese. - Gilly Marchini, Jeremy Steffman:

Data-driven approaches to pitch modelling in two Mexican Spanish ethnolects: K-means Clustering & GAMMs. - Anisia Popescu, Lori Lamel, Marc Evrard, Ioana Vasilescu:

Tracking /r/ Deletion: Forced Alignment of Pronunciation Variants and Sociophonetic Insights into Post-Obstruent Final /r/ in French. - Lilian von Bressensdorf, Pia Greca, Jonathan Harrington:

Agent-based modelling, sound change, and metaphony in Southern Italian varieties of Italo-Romance. - John McGahay:

Modeling Vowel System Typology Using Iterated Confusion Minimization. - Bingliang Zhao, Xiyu Wu:

Investigating Glottal Stop Coda Loss During Sound Change of Checked Syllables Based on Speech-EGG Voice Offset Alignment.
Speaker Extraction 2
- Thomas Serre, Mathieu Fontaine, Eric Benhaim, Slim Essid:

MTSE: Multi-Target Speaker Extraction for Conversation Scenarios. - Daniel-José Alcala Padilla, Nils L. Westhausen, Swati Vivekananthan, Bernd T. Meyer:

Location-Aware Target Speaker Extraction for Hearing Aids. - Shengkui Zhao, Zexu Pan, Bin Ma:

ClearerVoice-Studio: Bridging Advanced Speech Processing Research and Practical Deployment. - Cheng Yu, Vahid Ahmadi Kalkhorani, Buye Xu, DeLiang Wang:

Online AV-CrossNet: a Causal and Efficient Audiovisual System for Speech Enhancement and Target Speaker Extraction. - Jakob Kienegger, Timo Gerkmann:

Steering Deep Non-Linear Spatially Selective Filters for Weakly Guided Extraction of Moving Speakers in Dynamic Scenarios.
Multimodal Emotion Recognition
- Haoqin Sun, Jingguang Tian, Jiaming Zhou, Hui Wang, Jiabei He, Shiwan Zhao, Xiangyu Kong, Desheng Hu, Xinkang Xu, Xinhui Hu, Yong Qin:

RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval. - Shreeram Suresh Chandra, Lucas Goncalves, Junchen Lu, Carlos Busso, Berrak Sisman:

EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast. - Georgios Chochlakis, Turab Iqbal, Woo Hyun Kang, Zhaocheng Huang:

Modality-Agnostic Multimodal Emotion Recognition using a Contrastive Masked Autoencoder. - Maxim Markitantov, Elena Ryumina, Heysem Kaya, Alexey Karpov:

Multi-Modal Multi-Task Affective States Recognition Based on Label Encoder Fusion.
Conversation, Communication and Interaction 2
- Mikey Elmers, Koji Inoue, Divesh Lala, Tatsuya Kawahara:

Triadic Multi-party Voice Activity Projection for Turn-taking in Spoken Dialogue Systems.