


25th Interspeech 2024: Kos, Greece
- Itshak Lapidot, Sharon Gannot:
25th Annual Conference of the International Speech Communication Association, Interspeech 2024, Kos, Greece, September 1-5, 2024. ISCA 2024
Keynote 1: ISCA Medallist
- Isabel Trancoso:
Towards Responsible Speech Processing.
L2 Speech, Bilingualism and Code-Switching
- Sarah Wesolek, Piotr Gulgowski, Joanna Blaszczak, Marzena Zygis:
The influence of L2 accent strength and different error types on personality trait ratings.
- Jie Chi, Electra Wallington, Peter Bell:
Characterizing code-switching: Applying Linguistic Principles for Metric Assessment and Development.
- Wei Xue, Ivan Yuen, Bernd Möbius:
Towards a better understanding of receptive multilingualism: listening conditions and priming effects.
- Debasish Ray Mohapatra, Victor Zappi, Sidney Fels:
2.5D Vocal Tract Modeling: Bridging Low-Dimensional Efficiency with 3D Accuracy.
Speaker Diarization 1
- Anurag Chowdhury, Abhinav Misra, Mark C. Fuhs, Monika Woszczyna:
Investigating Confidence Estimation Measures for Speaker Diarization.
- Xiang Li, Vivek Govindan, Rohit Paturi, Sundararajan Srinivasan:
Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization.
- Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang:
On the Success and Limitations of Auxiliary Network Based Word-Level End-to-End Neural Speaker Diarization.
- Marc Härkönen, Samuel J. Broughton, Lahiru Samarakoon:
EEND-M2F: Masked-attention mask transformers for speaker diarization.
- Yongkang Yin, Xu Li, Ying Shan, Yuexian Zou:
AFL-Net: Integrating Audio, Facial, and Lip Modalities with a Two-step Cross-attention for Robust Speaker Diarization in the Wild.
- Arunav Arya, Murtiza Ali, Karan Nathwani:
Exploiting Wavelet Scattering Transform for an Unsupervised Speaker Diarization in Deep Neural Network Framework.
Speech and Audio Analysis and Representations
- Hang Zhao, Yifei Xin, Zhesong Yu, Bilei Zhu, Lu Lu, Zejun Ma:
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning.
- Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Masahiro Yasuda, Shunsuke Tsubaki, Keisuke Imoto:
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation.
- Yusuke Fujita, Tatsuya Komatsu:
Audio Fingerprinting with Holographic Reduced Representations.
- David Meyer, Eitan Abecassis, Clara Fernandez-Labrador, Christopher Schroers:
RAST: A Reference-Audio Synchronization Tool for Dubbed Content.
- Xuefei Li, Hao Huang, Ying Hu, Liang He, Jiabao Zhang, Yuyi Wang:
YOLOPitch: A Time-Frequency Dual-Branch YOLO Model for Pitch Estimation.
- Asad Ullah, Alessandro Ragano, Andrew Hines:
Reduce, Reuse, Recycle: Is Perturbed Data Better than Other Language Augmentation for Low Resource Self-Supervised Speech Models.
- Jaden Pieper, Stephen Voran:
AlignNet: Learning dataset score alignment functions to enable better training of speech quality estimators.
Acoustic Event Detection and Classification 2
- Dawei Liang, Alice Zhang, David Harwath, Edison Thomaz:
Improving Audio Classification with Low-Sampled Microphone Input: An Empirical Study Using Model Self-Distillation.
- Da Mu, Zhicheng Zhang, Haobo Yue:
MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection.
- Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park:
Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection.
- Tuan Vu Ho, Kota Dohi, Yohei Kawaguchi:
Stream-based Active Learning for Anomalous Sound Detection in Machine Condition Monitoring.
- Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan:
AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection.
- Zeyu Xie, Baihan Li, Xuenan Xu, Zheng Liang, Kai Yu, Mengyue Wu:
FakeSound: Deepfake General Audio Detection.
- Shabnam Ghaffarzadegan, Luca Bondi, Wei-Cheng Lin, Abinaya Kumar, Ho-Hsiang Wu, Hans-Georg Horst, Samarjit Das:
Sound of Traffic: A Dataset for Acoustic Traffic Identification and Counting.
Detection and Classification of Bioacoustic Signals
- Sahil Kumar, Jialu Li, Youshan Zhang:
Vision Transformer Segmentation for Visual Bird Sound Denoising.
- Xin Jing, Luyang Zhang, Jiangjian Xie, Alexander Gebhard, Alice Baird, Björn W. Schuller:
DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition.
- Jules Cauzinille, Benoît Favre, Ricard Marxer, Dena J. Clink, Abdul Hamid Ahmad, Arnaud Rey:
Investigating self-supervised speech models' ability to classify animal vocalizations: The case of gibbon's vocal signatures.
- Xihang Qiu, Lixian Zhu, Zikai Song, Zeyu Chen, Haojie Zhang, Kun Qian, Ye Zhang, Bin Hu, Yoshiharu Yamamoto, Björn W. Schuller:
Study Selectively: An Adaptive Knowledge Distillation based on a Voting Network for Heart Sound Classification.
- Jie Lin, Xiuping Yang, Li Xiao, Xinhong Li, Weiyan Yi, Yuhong Yang, Weiping Tu, Xiong Chen:
SimuSOE: A Simulated Snoring Dataset for Obstructive Sleep Apnea-Hypopnea Syndrome Evaluation during Wakefulness.
Acoustic Echo Cancellation
- Premanand Nayak, Kamini Sabu, M. Ali Basha Shaik:
Multi-mic Echo Cancellation Coalesced with Beamforming for Real World Adverse Acoustic Conditions.
- Vahid Khanagha, Dimitris Koutsaidis, Kaustubh Kalgaonkar, Sriram Srinivasan:
Interference Aware Training Target for DNN based joint Acoustic Echo Cancellation and Noise Suppression.
- Yi Gao, Xiang Su:
Low Complexity Echo Delay Estimator Based on Binarized Feature Matching.
- Ye Ni, Cong Pang, Chengwei Huang, Cairong Zou:
MSA-DPCRN: A Multi-Scale Asymmetric Dual-Path Convolution Recurrent Network with Attentional Feature Fusion for Acoustic Echo Cancellation.
- Ofer Schwartz, Sharon Gannot:
Efficient Joint Beamforming and Acoustic Echo Cancellation Structure for Conference Call Scenarios.
- Fei Zhao, Jinjiang Liu, Xueliang Zhang:
SDAEC: Signal Decoupling for Advancing Acoustic Echo Cancellation.
Speech Synthesis: Voice Conversion 1
- Kentaro Seki, Shinnosuke Takamichi, Norihiro Takamune, Yuki Saito, Kanami Imamura, Hiroshi Saruwatari:
Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals.
- Alan Baade, Puyuan Peng, David Harwath:
Neural Codec Language Models for Disentangled and Textless Voice Conversion.
- Max Morrison, Cameron Churchwell, Nathan Pruyne, Bryan Pardo:
Fine-Grained and Interpretable Neural Speech Editing.
- Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo:
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation.
- Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi:
DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion.
- Tianhua Qi, Shiyan Wang, Cheng Lu, Yan Zhao, Yuan Zong, Wenming Zheng:
Towards Realistic Emotional Voice Conversion using Controllable Emotional Intensity.
Neural Network Architectures for ASR 2
- Yu Nakagome, Michael Hentschel:
InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions.
- Qinglin Meng, Min Liu, Kaixun Huang, Kun Wei, Lei Xie, Zongfeng Quan, Weihong Deng, Quan Lu, Ning Jiang, Guoqing Zhao:
SEQ-former: A context-enhanced and efficient automatic speech recognition framework.
- Robert Flynn, Anton Ragni:
How Much Context Does My Attention-Based ASR System Need?
- Vincenzo Norman Vitale, Loredana Schettino, Francesco Cutugno:
Rich speech signal: exploring and exploiting end-to-end automatic speech recognizers' ability to model hesitation phenomena.
- Tian-Hao Zhang, Xinyuan Qian, Feng Chen, Xu-Cheng Yin:
Transmitted and Aggregated Self-Attention for Automatic Speech Recognition.
- Darshan Prabhu, Yifan Peng, Preethi Jyothi, Shinji Watanabe:
MULTI-CONVFORMER: Extending Conformer with Multiple Convolution Kernels.
- Koichi Miyazaki, Yoshiki Masuyama, Masato Murata:
Exploring the Capability of Mamba in Speech Applications.
- Genshun Wan, Mengzhi Wang, Tingzhi Mao, Hang Chen, Zhongfu Ye:
Lightweight Transducer Based on Frame-Level Criterion.
- Ankit Gupta, George Saon, Brian Kingsbury:
Exploring the limits of decoder-only models trained on public speech recognition corpora.
- Xun Gong, Anqi Lv, Zhiming Wang, Yanmin Qian:
Contextual Biasing Speech Recognition in Speech-enhanced Large Language Model.
Decoding Algorithms
- Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jin, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu:
Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask.
- Kun Zou, Fengyun Tan, Ziyang Zhuang, Chenfeng Miao, Tao Wei, Shaodan Zhai, Zijian Li, Wei Hu, Shaojun Wang, Jing Xiao:
E-Paraformer: A Faster and Better Parallel Transformer for Non-autoregressive End-to-End Mandarin Speech Recognition.
- Martino Ciaperoni, Athanasios Katsamanis, Aristides Gionis, Panagiotis Karras:
Beam-search SIEVE for low-memory speech recognition.
- Daniel Galvez, Vladimir Bataev, Hainan Xu, Tim Kaldewey:
Speed of Light Exact Greedy Decoding for RNN-T Speech Recognition Models on GPU.
- Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara Sainath, Yanzhang He, Pedro Moreno Mengibar:
Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm.
- Tatsunari Takagi, Yukoh Wakabayashi, Atsunori Ogawa, Norihide Kitaoka:
Text-only Domain Adaptation for CTC-based Speech Recognition through Substitution of Implicit Linguistic Information in the Search Space.
Pronunciation Assessment
- Xintong Wang, Mingqian Shi, Ye Wang:
Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis.
- Yu-Wen Chen, Zhou Yu, Julia Hirschberg:
MultiPA: A Multi-task Speech Pronunciation Assessment Model for Open Response Scenarios.
- Xinwei Cao, Zijian Fan, Torbjørn Svendsen, Giampiero Salvi:
A Framework for Phoneme-Level Pronunciation Assessment Using CTC.
- Mostafa Shahin, Beena Ahmed:
Phonological-Level Mispronunciation Detection and Diagnosis.
- Heejin Do, Wonjun Lee, Gary Geunbae Lee:
Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment.
- Nhan Phan, Anna von Zansen, Maria Kautonen, Ekaterina Voskoboinik, Tamás Grósz, Raili Hildén, Mikko Kurimo:
Automated content assessment and feedback for Finnish L2 learners in a picture description speaking task.
Spoken Language Processing
- Zhenyu Wang, Shuyu Kong, Li Wan, Biqiao Zhang, Yiteng Huang, Mumin Jin, Ming Sun, Xin Lei, Zhaojun Yang:
Query-by-Example Keyword Spotting Using Spectral-Temporal Graph Attentive Pooling and Multi-Task Learning.
- Youngmoon Jung, Seungjin Lee, Joon-Young Yang, Jaeyoung Roh, Chang Woo Han, Hoonyoung Cho:
Relational Proxy Loss for Audio-Text based Keyword Spotting.
- Sichen Jin, Youngmoon Jung, Seungjin Lee, Jaeyoung Roh, Changwoo Han, Hoonyoung Cho:
CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting.
- Haoyu Li, Baochen Yang, Yu Xi, Linfeng Yu, Tian Tan, Hao Li, Kai Yu:
Text-aware Speech Separation for Multi-talker Keyword Spotting.
- Hao Yen, Pin-Jui Ku, Sabato Marco Siniscalchi, Chin-Hui Lee:
Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition.
- Raul Monteiro:
Adding User Feedback To Enhance CB-Whisper.
- Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe:
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.
Spoken Machine Translation 2
- Nan Chen, Yonghe Wang, Feilong Bao:
Parameter-Efficient Adapter Based on Pre-trained Models for Speech Translation.
- Badr M. Abdullah, Mohammed Maqsood Shaik, Dietrich Klakow:
Wave to Interlingua: Analyzing Representations of Multilingual Speech Transformers for Spoken Language Translation.
- Nan Chen, Yonghe Wang, Feilong Bao:
Knowledge-Preserving Pluggable Modules for Multilingual Speech Translation Tasks.
- Rastislav Rabatin, Frank Seide, Ernie Chang:
Navigating the Minefield of MT Beam Search in Cascaded Streaming Speech Translation.
- Peidong Wang, Jian Xue, Jinyu Li, Jun-Kun Chen, Aswin Shanmugam Subramanian:
Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation.
- Dan Oneata, Herman Kamper:
Translating speech with just images.
- Sameer Khurana, Chiori Hori, Antoine Laurent, Gordon Wichern, Jonathan Le Roux:
ZeroST: Zero-Shot Speech Translation.
Biosignal-enabled Spoken Communication
- Jinyu Li, Leonardo Lancia:
A multimodal approach to study the nature of coordinative patterns underlying speech rhythm.
- Peter Wu, Ryan Kaveh, Raghav Nautiyal, Christine Zhang, Albert Guo, Anvitha Kachinthaya, Tavish Mishra, Bohan Yu, Alan W. Black, Rikky Muller, Gopala Krishna Anumanchipalli:
Towards EMG-to-Speech with Necklace Form Factor.
- Chris Bras, Tanvina Patel, Odette Scharenborg:
Using articulated speech EEG signals for imagined speech decoding.
- Jinuk Kwon, David Harwath, Debadatta Dash, Paul Ferrari, Jun Wang:
Direct Speech Synthesis from Non-Invasive, Neuromagnetic Signals.
- Yudong Yang, Rongfeng Su, Rukiye Ruzi, Manwa L. Ng, Shaofeng Zhao, Nan Yan, Lan Wang:
Optical Flow Guided Tongue Trajectory Generation for Diffusion-based Acoustic to Articulatory Inversion.
- Rishi Jain, Bohan Yu, Peter Wu, Tejas S. Prabhune, Gopala Anumanchipalli:
Multimodal Segmentation for Vocal Tract Modeling.
- Jesuraj Bandekar, Sathvik Udupa, Prasanta Kumar Ghosh:
Articulatory synthesis using representations learnt through phonetic label-aware contrastive loss.
- Yujie Yan, Xiran Xu, Haolin Zhu, Pei Tian, Zhongshu Ge, Xihong Wu, Jing Chen:
Auditory Attention Decoding in Four-Talker Environment with EEG.
- Zijie Lin, Tianyu He, Siqi Cai, Haizhou Li:
ASA: An Auditory Spatial Attention Dataset with Multiple Speaking Locations.
- Saurav Pahuja, Gabriel Ivucic, Pascal Himmelmann, Siqi Cai, Tanja Schultz, Haizhou Li:
Leveraging Graphic and Convolutional Neural Networks for Auditory Attention Detection with EEG.
Individual and Social Factors in Phonetics
- Tillmann Pistor, Adrian Leemann:
Echoes of Implicit Bias: Exploring Aesthetics and Social Meanings of Swiss German Dialect Features.
- Vivian Guo Li:
In search of structure and correspondence in intra-speaker trial-to-trial variability.
- Irene Smith, Morgan Sonderegger, Spade Consortium:
Modelled Multivariate Overlap: A method for measuring vowel merger.
- Keiko Ochi, Koji Inoue, Divesh Lala, Tatsuya Kawahara:
Entrainment Analysis and Prosody Prediction of Subsequent Interlocutor's Backchannels in Dialogue.
- James Tanner, Morgan Sonderegger, Jane Stuart-Smith, Tyler Kendall, Jeff Mielke, Robin Dodsworth, Erik Thomas:
Exploring the anatomy of articulation rate in spontaneous English speech: relationships between utterance length effects and social factors.
- Katelyn Taylor, Amelia Jane Gully, Helena Daffern:
Familiar and Unfamiliar Speaker Identification in Speech and Singing.
Paralinguistics
- Luis Felipe Parra-Gallego, Tilak Purohit, Bogdan Vlasenko, Juan Rafael Orozco-Arroyave, Mathew Magimai-Doss:
Cross-transfer Knowledge between Speech and Text Encoders to Evaluate Customer Satisfaction.
- Manila Kodali, Sudarsana Reddy Kadiri, Paavo Alku:
Fine-tuning of Pre-trained Models for Classification of Vocal Intensity Category from Speech Signals.
- Alexander Kathan, Martin Bürger, Andreas Triantafyllopoulos, Sabrina Milkus, Jonas Hohmann, Pauline Muderlak, Jürgen Schottdorf, Richard Musil, Björn W. Schuller, Shahin Amiriparian:
Real-world PTSD Recognition: A Cross-corpus and Cross-linguistic Evaluation.
- Debasmita Bhattacharya, Eleanor Lin, Run Chen, Julia Hirschberg:
Switching Tongues, Sharing Hearts: Identifying the Relationship between Empathy and Code-switching in Speech.
Speaker Recognition: Adversarial and Spoofing Attacks
- Eros Rosello, Angel M. Gomez, Iván López-Espejo, Antonio M. Peinado, Juan M. Martín-Doñas:
Anti-spoofing Ensembling Model: Dynamic Weight Allocation in Ensemble Models for Improved Voice Biometrics Security.
- Lin Zhang, Xin Wang, Erica Cooper, Mireia Díez, Federico Landini, Nicholas W. D. Evans, Junichi Yamagishi:
Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio.
- Haochen Wu, Wu Guo, Zhentao Zhang, Wenting Zhao, Shengyu Peng, Jie Zhang:
Spoofing Speech Detection by Modeling Local Spectro-Temporal and Long-term Dependency.
- Jingze Lu, Yuxiang Zhang, Zhuo Li, Zengqiang Shang, Wenchao Wang, Pengyuan Zhang:
Improving Copy-Synthesis Anti-Spoofing Training Method with Rhythm and Speaker Perturbation.
- Yip Keng Kan, Ke Xu, Hao Li, Jie Shi:
VoiceDefense: Protecting Automatic Speaker Verification Models Against Black-box Adversarial Attacks.
- Xuanjun Chen, Jiawei Du, Haibin Wu, Jyh-Shing Roger Jang, Hung-yi Lee:
Neural Codec-based Adversarial Sample Detection for Speaker Verification.
- Sizhou Chen, Yibo Bai, Jiadi Yao, Xiao-Lei Zhang, Xuelong Li:
Textual-Driven Adversarial Purification for Speaker Verification.
- Zhuhai Li, Jie Zhang, Wu Guo, Haochen Wu:
Boosting the Transferability of Adversarial Examples with Gradient-Aligned Ensemble Attack for Speaker Recognition.
- Duc-Tuan Truong, Ruijie Tao, Tuan Nguyen, Hieu-Thi Luong, Kong Aik Lee, Eng Siong Chng:
Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection.
Audio Event Detection and Classification 1
- Tiantian Feng, Dimitrios Dimitriadis, Shrikanth S. Narayanan:
Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
- Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang, Bin Wang:
Scaling up masked audio encoder learning for general audio classification.
- Sarthak Yadav, Zheng-Hua Tan:
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations.
- Pengfei Cai, Yan Song, Kang Li, Haoyu Song, Ian McLoughlin:
MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection.
- Janek Ebbers, François G. Germain, Gordon Wichern, Jonathan Le Roux:
Sound Event Bounding Boxes.
- Yanxiong Li, Jiaxin Tan, Guoqing Chen, Jialong Li, Yongjie Si, Qianhua He:
Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network.
Source Separation 2
- Hassan Taherian, Vahid Ahmadi Kalkhorani, Ashutosh Pandey, Daniel Wong, Buye Xu, DeLiang Wang:
Towards Explainable Monaural Speaker Separation with Auditory-based Training.
- Iva Ewert, Marvin Borsdorf, Haizhou Li, Tanja Schultz:
Does the Lombard Effect Matter in Speech Separation? Introducing the Lombard-GRID-2mix Dataset.
- Zexu Pan, Gordon Wichern, François G. Germain, Kohei Saijo, Jonathan Le Roux:
PARIS: Pseudo-AutoRegressIve Siamese Training for Online Speech Separation.
- Yiru Zhang, Linyu Yao, Qun Yang:
OR-TSE: An Overlap-Robust Speaker Encoder for Target Speech Extraction.
- Tsun-An Hsieh, Heeyoul Choi, Minje Kim:
Multimodal Representation Loss Between Timed Text and Audio for Regularized Speech Separation.
- Jingru Lin, Meng Ge, Junyi Ao, Liqun Deng, Haizhou Li:
SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech.
- Yiwen Wang, Xihong Wu:
TSE-PI: Target Sound Extraction under Reverberant Environments with Pitch Information.
- Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, Jonathan Le Roux:
Enhanced Reverberation as Supervision for Unsupervised Speech Separation.
Noise Reduction, Dereverberation, and Echo Cancellation
- Fei Zhao, Chenggang Zhang, Shulin He, Jinjiang Liu, Xueliang Zhang:
Deep Echo Path Modeling for Acoustic Echo Cancellation.
- Hongmei Guo, Yijiang Chen, Xiaolei Zhang, Xuelong Li:
Graph Attention Based Multi-Channel U-Net for Speech Dereverberation With Ad-Hoc Microphone Arrays.
- Louis Bahrman, Mathieu Fontaine, Jonathan Le Roux, Gaël Richard:
Speech dereverberation constrained on room impulse response characteristics.
- Kuang Yuan, Shuo Han, Swarun Kumar, Bhiksha Raj:
DeWinder: Single-Channel Wind Noise Reduction using Ultrasound Sensing.
- Alexander Barnhill, Elmar Nöth, Andreas K. Maier, Christian Bergler:
ANIMAL-CLEAN - A Deep Denoising Toolkit for Animal-Independent Signal Enhancement.
- Premanand Nayak, M. Ali Basha Shaik:
Elucidating Clock-drift Using Real-world Audios In Wireless Mode For Time-offset Insensitive End-to-End Asynchronous Acoustic Echo Cancellation.
- Shilin Wang, Haixin Guan, Yanhua Long:
QMixCAT: Unsupervised Speech Enhancement Using Quality-guided Signal Mixing and Competitive Alternating Model Training.
Computationally-Efficient Speech Enhancement
- Hanbin Bae, Pavel Andreev, Azat Saginbaev, Nicholas Babaev, Won-Jun Lee, Hosang Sung, Hoon-Young Cho:
Speech Boosting: Low-Latency Live Speech Enhancement for TWS Earbuds.
- Behnam Gholami, Mostafa El-Khamy, Kee-Bong Song:
Knowledge Distillation for Tiny Speech Enhancement with Latent Feature Augmentation.
- Yuewei Zhang, Huanbin Zou, Jie Zhu:
Sub-PNWR: Speech Enhancement Based on Signal Sub-Band Splitting and Pseudo Noisy Waveform Reconstruction Loss.
- Zugang Zhao, Jinghong Zhang, Yonghui Liu, Jianbing Liu, Kai Niu, Zhiqiang He:
Streamlining Speech Enhancement DNNs: an Automated Pruning Method Based on Dependency Graph with Advanced Regularized Loss Strategies.
- Zehua Zhang, Xuyi Zhuang, Yukun Qian, Mingjiang Wang:
Lightweight Dynamic Sparse Transformer for Monaural Speech Enhancement.
- Zizhen Lin, Xiaoting Chen, Junyu Wang:
MUSE: Flexible Voiceprint Receptive Fields and Multi-Path Fusion Enhanced Taylor Transformer for U-Net-based Speech Enhancement.
- Longbiao Cheng, Ashutosh Pandey, Buye Xu, Tobi Delbruck, Shih-Chii Liu:
Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement.
Zero-shot TTS
- Jinlong Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li:
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model.
- Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, Jinzhu Li, Sheng Zhao, Jinyu Li, Naoyuki Kanda:
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS.
- Kenichi Fujita, Takanori Ashihara, Marc Delcroix, Yusuke Ijima:
Lightweight Zero-shot Text-to-Speech with Mixture of Adapters.
- Vikentii Pankov, Valeria Pronina, Alexander Kuzmin, Maksim Borisov, Nikita Usoltsev, Xingshan Zeng, Alexander Golubkov, Nikolai Ermolenko, Aleksandra Shirshova, Yulia Matveeva:
DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness.
Noise Robustness, Far-Field, and Multi-Talker ASR
- Zengrui Jin, Yifan Yang, Mohan Shi, Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Lingwei Meng, Long Lin, Yong Xu, Shi-Xiong Zhang, Daniel Povey:
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization.
- Xujiang Xing, Mingxing Xu, Thomas Fang Zheng:
A Joint Noise Disentanglement and Adversarial Training Framework for Robust Speaker Verification.
- Ying Shi, Lantian Li, Shi Yin, Dong Wang, Jiqing Han:
Serialized Output Training by Learned Dominance.
- Xianrui Zheng, Guangzhi Sun, Chao Zhang, Philip C. Woodland:
SOT Triggered Neural Clustering for Speaker Attributed ASR.
- Yoshiaki Bando, Tomohiko Nakamura, Shinji Watanabe:
Neural Blind Source Separation and Diarization for Distant Speech Recognition.
- Ryo Masumura, Naoki Makishima, Tomohiro Tanaka, Mana Ihori, Naotaka Kawata, Shota Orihashi, Kazutoshi Shinoda, Taiga Yamane, Saki Mizuno, Keita Suzuki, Satoshi Suzuki, Nobukatsu Hojo, Takafumi Moriya, Atsushi Ando:
Unified Multi-Talker ASR with and without Target-speaker Enrollment.
Contextual Biasing and Adaptation
- Aviv Shamsian, Aviv Navon, Neta Glazer, Gill Hetz, Joseph Keshet:
Keyword-Guided Adaptation of Automatic Speech Recognition.
- Nguyen Manh Tien Anh, Thach Ho Sy:
Improving Speech Recognition with Prompt-based Contextualized ASR and LLM-based Re-predictor.
- Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen:
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer.
- Chengxu Yang, Lin Zheng, Sanli Tian, Gaofeng Cheng, Sujie Xiao, Ta Li:
Contextual Biasing with Confidence-based Homophone Detector for Mandarin End-to-End Speech Recognition.
- Ruizhe Huang, Mahsa Yarmohammadi, Sanjeev Khudanpur, Daniel Povey:
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation.
- Andrei Andrusenko, Aleksandr Laptev, Vladimir Bataev, Vitaly Lavrukhin, Boris Ginsburg:
Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter.
- Xizi Wei, Stephen McGregor:
Prompt Tuning for Speech Recognition on Unknown Spoken Name Entities.
- Junzhe Liu, Jianwei Yu, Xie Chen:
Improved Factorized Neural Transducer Model For Text-only Domain Adaptation.
- Pin-Yen Liu, Jen-Tzung Chien:
Modality Translation Learning for Joint Speech-Text Model.
- Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng:
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR.
- Atsushi Ando, Takafumi Moriya, Shota Horiguchi, Ryo Masumura:
Factor-Conditioned Speaking-Style Captioning.
- Yerbolat Khassanov, Zhipeng Chen, Tianfeng Chen, Tze Yuang Chong, Wei Li, Jun Zhang, Lu Lu, Yuxuan Wang:
Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR.
- Bolaji Yusuf, Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran:
Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models.
- Junghun Kim, Ka Hyun Park, Hoyoung Yoon, U Kang:
Domain-Aware Data Selection for Speech Classification via Meta-Reweighting.
Spoken Language Understanding
- Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model.
- Dejan Porjazovski, Anssi Moisio, Mikko Kurimo:
Out-of-distribution generalisation in spoken language understanding.
- Gaëlle Laperrière, Sahar Ghannay, Bassam Jabaian, Yannick Estève:
A dual task learning approach to fine-tune a multilingual semantic speech encoder for Spoken Language Understanding.
- Beomseok Lee, Ioan Calapodescu, Marco Gaido, Matteo Negri, Laurent Besacier:
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond.
- Yuang Li, Jiawei Yu, Min Zhang, Mengxin Ren, Yanqing Zhao, Xiaofeng Zhao, Shimin Tao, Jinsong Su, Hao Yang:
Using Large Language Model for End-to-End Chinese ASR and NER.
- Alkis Koudounas, Flavio Giobergia, Eliana Pastor, Elena Baralis:
A Contrastive Learning Approach to Mitigate Bias in Speech Models.
Spoken Machine Translation 1
- Chao-Wei Huang, Hui Lu, Hongyu Gong, Hirofumi Inaguma, Ilia Kulikov, Ruslan Mavlyutov, Sravya Popuri:
Investigating Decoder-only Large Language Models for Speech-to-text Translation.
- Nameer Hirschkind, Xiao Yu, Mahesh Kumar Nandwana, Joseph Liu, Eloi du Bois, Dao Le, Nicolas Thiebaut, Colin Sinclair, Kyle Spence, Charles Shang, Zoë Abrams, Morgan McGuire:
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation.
- Nan Chen, Yonghe Wang, Feilong Bao:
Sign Value Constraint Decomposition for Efficient 1-Bit Quantization of Speech Translation Tasks.
- Jaesong Lee, Soyoon Kim, Hanbyul Kim, Joon Son Chung:
Lightweight Audio Segmentation for Long-form Speech Translation.
- Haotian Tan, Sakriani Sakti:
Contrastive Feedback Mechanism for Simultaneous Speech Translation.
- Cécile Macaire, Chloé Dion, Didier Schwab, Benjamin Lecouteux, Emmanuelle Esperança-Rodier:
Towards Speech-to-Pictograms Translation.
Hearing Disorders
- Seonwoo Lee, Sunhee Kim, Minhwa Chung:
Automatic Assessment of Speech Production Skills for Children with Cochlear Implants Using Wav2Vec2.0 Acoustic Embeddings.
- Youngjin Ahn, Jungwoo Park, Sangha Park, Jonghyun Choi, Kee-Eung Kim:
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization.
- Mark A. Huckvale, Gaston Hilkhuysen:
Evaluating a 3-factor listener model for prediction of speech intelligibility to hearing-impaired listeners.
- Sophie Fagniart, Brigitte Charlier, Véronique Delvaux, Bernard Harmegnies, Anne Huberlant, Myriam Piccaluga, Kathy Huet:
Production of fricative consonants in French-speaking children with cochlear implants and typical hearing: acoustic and phonological analyses.
- Toshio Irino, Shintaro Doan, Minami Ishikawa:
Signal processing algorithm effective for sound quality of hearing loss simulators.
- Yixiang Niu, Ning Chen, Hongqing Zhu, Zhiying Zhu, Guangqiang Li, Yibo Chen:
Auditory Spatial Attention Detection Based on Feature Disentanglement and Brain Connectivity-Informed Graph Neural Networks.
- Jessica Monaghan, Arun Sebastian, Nicky Chong-White, Vicky Zhang, Vijayalakshmi Easwar, Pádraig Kitterick:
Automatic Detection of Hearing Loss from Children's Speech using wav2vec 2.0 Features.
Speech Disorders 2
- Vrushank Changawala, Frank Rudzicz:
Whister: Using Whisper's representations for Stuttering detection.
- Yan Xiong, Visar Berisha, Julie Liss, Chaitali Chakrabarti:
Improving Speech-Based Dysarthria Detection using Multi-task Learning with Gradient Projection.
- Guanlin Chen, Yun Jin:
Cascaded Transfer Learning Strategy for Cross-Domain Alzheimer's Disease Recognition through Spontaneous Speech.
- Loukas Ilias, Dimitris Askounis:
A Cross-Attention Layer coupled with Multimodal Fusion Methods for Recognizing Depression from Spontaneous Speech.
- Si-Ioi Ng, Lingfeng Xu, Kimberly D. Mueller, Julie Liss, Visar Berisha:
Segmental and Suprasegmental Speech Foundation Models for Classifying Cognitive Risk Factors: Evaluating Out-of-the-Box Performance.
- Katerina Papadimitriou, Gerasimos Potamianos:
Multimodal Continuous Fingerspelling Recognition via Visual Alignment Learning.
- Tomás Arias-Vergara, Paula Andrea Pérez-Toro, Xiaofeng Liu, Fangxu Xing, Maureen Stone, Jiachen Zhuo, Jerry L. Prince, Maria Schuster, Elmar Nöth, Jonghye Woo, Andreas K. Maier:
Contrastive Learning Approach for Assessment of Phonological Precision in Patients with Tongue Cancer Using MRI Data.
- Haojie Zhang, Tao Zhang, Ganjun Liu, Dehui Fu, Xiaohui Hou, Ying Lv:
DysArinVox: DYSphonia & DYSarthria mandARIN speech corpus.
- Xuanru Zhou, Anshul Kashyap, Steve Li, Ayati Sharma, Brittany Morin, David Baquirin, Jet Vonk, Zoe Ezzes, Zachary Miller, Maria Luisa Gorno-Tempini, Jiachen Lian, Gopala Anumanchipalli:
YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection.
- Gábor Gosztolya, Veronika Svindt, Judit Bóna, Ildikó Hoffmann:
Automatic Longitudinal Investigation of Multiple Sclerosis Subjects.
TAUKADIAL Challenge: Speech-Based Cognitive Assessment in Chinese and English (Special Session)
- Saturnino Luz, Sofia de la Fuente Garcia, Fasih Haider, Davida Fromm, Brian MacWhinney, Alyssa Lanzi, Ya-Ning Chang, Chia-Ju Chou, Yi-Chien Liu:
Connected Speech-Based Cognitive Assessment in Chinese and English.
- David Ortiz-Perez, José García Rodríguez, David Tomás:
Cognitive Insights Across Languages: Enhancing Multimodal Interview Analysis.
- Gábor Gosztolya, László Tóth:
Combining Acoustic Feature Sets for Detecting Mild Cognitive Impairment in the Interspeech'24 TAUKADIAL Challenge.
- Junwen Duan, Fangyuan Wei, Hong-Dong Li, Jin Liu:
Pre-trained Feature Fusion and Matching for Mild Cognitive Impairment Detection.
- Benjamin Barrera-Altuna, Daeun Lee, Zaima Zarnaz, Jinyoung Han, Seungbae Kim:
The Interspeech 2024 TAUKADIAL Challenge: Multilingual Mild Cognitive Impairment Detection with Multimodal Approach.
- Anna Favaro, Tianyu Cao, Najim Dehak, Laureano Moro-Velázquez:
Leveraging Universal Speech Representations for Detecting and Assessing the Severity of Mild Cognitive Impairment Across Languages.
- Bao Hoang, Yijiang Pang, Hiroko H. Dodge, Jiayu Zhou:
Translingual Language Markers for Cognitive Assessment from Spontaneous Speech.
- Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Philipp Klumpp, Tobias Weise, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas K. Maier:
Multilingual Speech and Language Analysis for the Assessment of Mild Cognitive Impairment: Outcomes from the Taukadial Challenge.
Show and Tell 1
- Takayuki Arai, Ryohei Suzuki, Chandler Earp, Shinya Tsuji, Keiko Ochi:
Production of phrases by mechanical models of the human vocal tract. - Vishal Gourav, Ankit Tyagi, Phanindra Mankale:
Faster Vocoder: a multi threading approach to achieve low latency during TTS Inference. - Aanchan Mohan, Monideep Chakraborti, Katelyn Eng, Nailia Kushaeva, Mirjana Prpa, Jordan Lewis, Tianyi Zhang, Vince Geisler, Carol Geisler:
A powerful and modern AAC composition tool for impaired speakers. - Grzegorz P. Mika, Konrad Zielinski, Pawel Cyrta, Marek Grzelec:
VoxFlow AI: wearable voice converter for atypical speech. - Sai Akarsh C, Vamshiraghusimha Narasinga, Anil Kumar Vuppala:
Stress transfer in speech-to-speech machine translation. - Takuma Okamoto, Yamato Ohtani, Hisashi Kawai:
Mobile PresenTra: NICT fast neural text-to-speech system on smartphones with incremental inference of MS-FC-HiFi-GAN for low-latency synthesis. - Alex Peiró Lilja, José Giraldo, Martí Llopart-Font, Carme Armentano-Oller, Baybars Külebi, Mireia Farrús:
Multi-speaker and multi-dialectal Catalan TTS models for video gaming. - Juliana Francis, Éva Székely, Joakim Gustafson:
ConnecTone: a modular AAC system prototype with contextual generative text prediction and style-adaptive conversational TTS. - Mahdin Rohmatillah, Bryan Gautama Ngo, Willianto Sulaiman, Po-Chuan Chen, Jen-Tzung Chien:
Reliable dialogue system for facilitating student-counselor communication. - Harm Lameris, Joakim Gustafson, Éva Székely:
CreakVC: a voice conversion tool for modulating creaky voice. - Yu-Sheng Tsao, Yung-Chang Hsu, Jiun-Ting Li, Siang-Hong Weng, Tien-Hong Lo, Berlin Chen:
EZTalking: English assessment platform for teachers and students.
Keynote 2
- Shoko Araki:
Frontier of Frontend for Conversational Speech Processing.
Phonetics and Phonology of Second Language Acquisition
- Paige Tuttösí, H. Henny Yeung, Yue Wang, Fenqi Wang, Guillaume Denis, Jean-Julien Aucouturier, Angelica Lim:
Mmm whatcha say? Uncovering distal and proximal context effects in first and second-language word perception using psychophysical reverse correlation. - Anisia Popescu, Lori Lamel, Ioana Vasilescu, Laurence Devillers:
Automatic Speech Recognition with parallel L1 and L2 acoustic phone models to evaluate /l/ allophony in L2 English speech production. - Kevin Huang, Jack Goldberg, Louis Goldstein, Shrikanth Narayanan:
Analysis of articulatory setting for L1 and L2 English speakers using MRI data. - Ioana Colgiu, Laura Spinu, Rajiv Rao, Yasaman Rafat:
Bilingual Rhotic Production Patterns: A Generational Comparison of Spanish-English Bilingual Speakers in Canada. - Sylvain Coulange, Tsuneo Kato, Solange Rossato, Monica Masperi:
Exploring Impact of Pausing and Lexical Stress Patterns on L2 English Comprehensibility in Real Time. - Qi Wu:
Mandarin T3 Production by Chinese and Japanese Native Speakers.
Corpora-based Approaches in Automatic Emotion Recognition
- Sumit Ranjan, Rupayan Chakraborty, Sunil Kumar Kopparapu:
Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition. - Pravin Mote, Berrak Sisman, Carlos Busso:
Unsupervised Domain Adaptation for Speech Emotion Recognition using K-Nearest Neighbors Voice Conversion. - Jincen Wang, Yan Zhao, Cheng Lu, Hailun Lian, Hongli Chang, Yuan Zong, Wenming Zheng:
Confidence-aware Hypothesis Transfer Networks for Source-Free Cross-Corpus Speech Emotion Recognition. - Yuxuan Xi, Yan Song, Lirong Dai, Haoyu Song, Ian McLoughlin:
An Effective Local Prototypical Mapping Network for Speech Emotion Recognition. - Yuan Gao, Hao Shi, Chenhui Chu, Tatsuya Kawahara:
Speech Emotion Recognition with Multi-level Acoustic and Semantic Information Extraction and Interaction.
Analysis of Speakers States and Traits
- Oliver Niebuhr, Nafiseh Taghva:
How rhythm metrics are linked to produced and perceived speaker charisma. - Zhu Li, Xiyuan Gao, Yuqing Zhang, Shekhar Nayak, Matt Coler:
A Functional Trade-off between Prosodic and Semantic Cues in Conveying Sarcasm. - John Murzaku, Adil Soubki, Owen Rambow:
Multimodal Belief Prediction. - Run Chen, Haozhe Chen, Anushka Kulkarni, Eleanor Lin, Linda Pang, Divya Tadimeti, Jun Shin, Julia Hirschberg:
Detecting Empathy in Speech. - Dehua Tao, Tan Lee, Harold Chui, Sarah Luk:
Learning Representation of Therapist Empathy in Counseling Conversation Using Siamese Hierarchical Attention Network. - Han Kunmei:
Modelling Lexical Characteristics of the Healthy Aging Population: A Corpus-Based Study. - Maurice Gerczuk, Shahin Amiriparian, Justina Lutz, Wolfgang Strube, Irina Papazova, Alkomiet Hasan, Björn W. Schuller:
Exploring Gender-Specific Speech Patterns in Automatic Suicide Risk Assessment.
Spoofing and Deepfake Detection
- Nicholas Klein, Tianxiang Chen, Hemlata Tak, Ricardo Casal, Elie Khoury:
Source Tracing of Audio Deepfake Systems. - Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li:
How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio? - Xin Wang, Tomi Kinnunen, Kong Aik Lee, Paul-Gauthier Noé, Junichi Yamagishi:
Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis. - Oguzhan Baser, Kaan Kale, Sandeep P. Chinchali:
SecureSpectra: Safeguarding Digital Identity from Deep Fake Threats via Intelligent Signatures. - Menglu Li, Xiao-Ping Zhang:
Interpretable Temporal Class Activation Representation for Audio Spoofing Detection. - Zirui Ge, Xinzhou Xu, Haiyan Guo, Tingting Wang, Zhen Yang, Björn W. Schuller:
DGPN: A Dual Graph Prototypical Network for Few-Shot Speech Spoofing Algorithm Recognition.
Audio Captioning, Tagging, and Audio-Text Retrieval
- Jianyuan Sun, Wenwu Wang, Mark D. Plumbley:
PFCA-Net: Pyramid Feature Fusion and Cross Content Attention Network for Automated Audio Captioning. - Jizhong Liu, Gang Li, Junbo Zhang, Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Yujun Wang, Bin Wang:
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding. - Yifei Xin, Zhihong Zhu, Xuxin Cheng, Xusheng Yang, Yuexian Zou:
Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation. - Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang, Bin Wang:
Streaming Audio Transformers for Online Audio Tagging. - Aryan Chaudhary, Arshdeep Singh, Vinayak Abrol, Mark D. Plumbley:
Efficient CNNs with Quaternion Transformations and Pruning for Audio Tagging. - Xin Jing, Andreas Triantafyllopoulos, Björn W. Schuller:
ParaCLAP - Towards a general language-audio model for computational paralinguistic tasks. - Xuenan Xu, Haohe Liu, Mengyue Wu, Wenwu Wang, Mark D. Plumbley:
Efficient Audio Captioning with Encoder-Level Knowledge Distillation.
Generative Speech Enhancement
- Robin Scheibler, Yusuke Fujita, Yuma Shirahata, Tatsuya Komatsu:
Universal Score-based Speech Enhancement with High Content Preservation. - Haici Yang, Jiaqi Su, Minje Kim, Zeyu Jin:
Genhancer: High-Fidelity Speech Enhancement via Generative Modeling on Discrete Codec Tokens. - Ante Jukic, Roman Korostik, Jagadeesh Balam, Boris Ginsburg:
Schrödinger Bridge for Generative Speech Enhancement. - Thanapat Trachu, Chawan Piansaddhayanon, Ekapol Chuangsuwanich:
Thunder: Unified Regression-Diffusion Speech Enhancement with a Single Reverse Step using Brownian Bridge. - Yiyuan Yang, Niki Trigoni, Andrew Markham:
Pre-training Feature Guided Diffusion Model for Speech Enhancement. - Dail Kim, Da-Hee Yang, Donghyun Kim, Joon-Hyuk Chang, Jeonghwan Choi, Moa Lee, Jaemo Yang, Han-gil Moon:
Guided conditioning with predictive network on score-based diffusion model for speech enhancement.
Speech Synthesis: Evaluation
- Chun Yin, Tai-Shih Chi, Yu Tsao, Hsin-Min Wang:
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models. - Srija Anand, Praveen Srinivasa Varadhan, Ashwin Sankar, Giri Raju, Mitesh M. Khapra:
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies. - Jens Edlund, Christina Tånnander, Sébastien Le Maguer, Petra Wagner:
Assessing the impact of contextual framing on subjective TTS quality. - Adaeze Adigwe, Sarenne Wallbridge, Simon King:
What do people hear? Listeners' Perception of Conversational Speech. - Hui Wang, Shiwan Zhao, Jiaming Zhou, Xiguang Zheng, Haoqin Sun, Xuechen Wang, Yong Qin:
Uncertainty-Aware Mean Opinion Score Prediction. - Félix Saget, Meysam Shamsi, Marie Tahon:
Lifelong Learning MOS Prediction for Synthetic Speech Quality Evaluation.
Multilingual ASR
- Kwok Chin Yuen, Jia Qi Yip, Eng Siong Chng:
Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems. - Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets. - Andrés Piñeiro Martín, Carmen García-Mateo, Laura Docío Fernández, Maria del Carmen Lopez Perez, Georg Rehm:
Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition. - A. F. M. Saif, Lisha Chen, Xiaodong Cui, Songtao Lu, Brian Kingsbury, Tianyi Chen:
M2ASR: Multilingual Multi-task Automatic Speech Recognition via Multi-objective Optimization. - Song Li, Yongbin You, Xuezhi Wang, Zhengkun Tian, Ke Ding, Guanglu Wan:
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research. - Brady Houston, Omid Sadjadi, Zejiang Hou, Srikanth Vishnubhotla, Kyu J. Han:
Improving Multilingual ASR Robustness to Errors in Language Input.
General Topics in ASR
- Jiwon Suh, Injae Na, Woohwan Jung:
Improving Domain-Specific ASR with LLM-Generated Contextual Descriptions. - Yuang Li, Min Zhang, Chang Su, Yinglu Li, Xiaosong Qiao, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Shimin Tao, Hao Yang:
A Multitask Training Approach to Enhance Whisper with Open-Vocabulary Keyword Spotting. - Mario Zusag, Laurin Wagner, Bernhard Thallinger:
CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions. - Péter Mihajlik, Yan Meng, Mate S. Kadar, Julian Linke, Barbara Schuppler, Katalin Mády:
On Disfluency and Non-lexical Sound Labeling for End-to-end Automatic Speech Recognition. - Dena F. Mujtaba, Nihar R. Mahapatra, Megan Arney, J. Scott Yaruss, Caryn Herring, Jia Bin:
Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation. - Hao Tan, Xiaochen Liu, Huan Zhang, Junjian Zhang, Yaguan Qian, Zhaoquan Gu:
DualPure: An Efficient Adversarial Purification Method for Speech Command Recognition. - Jan Lehecka, Josef V. Psutka, Lubos Smídl, Pavel Ircing, Josef Psutka:
A Comparative Analysis of Bilingual and Trilingual Wav2Vec Models for Automatic Speech Recognition in Multilingual Oral History Archives. - Antón de la Fuente, Dan Jurafsky:
A layer-wise analysis of Mandarin and English suprasegmentals in SSL speech models. - Spyretta Leivaditi, Tatsunari Matsushima, Matt Coler, Shekhar Nayak, Vass Verkhodanova:
Fine-Tuning Strategies for Dutch Dysarthric Speech Recognition: Evaluating the Impact of Healthy, Disease-Specific, and Speaker-Specific Data. - I-Ting Hsieh, Chung-Hsien Wu:
Dysarthric Speech Recognition Using Curriculum Learning and Articulatory Feature Embedding. - Shiyao Wang, Shiwan Zhao, Jiaming Zhou, Aobo Kong, Yong Qin:
Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation. - Naijun Zheng, Xucheng Wan, Kai Liu, Ziqing Du, Huan Zhou:
An efficient text augmentation approach for contextualized Mandarin speech recognition. - Sheng Li, Chen Chen, Kwok Chin Yuen, Chenhui Chu, Eng Siong Chng, Hisashi Kawai:
Investigating ASR Error Correction with Large Language Model and Multilingual 1-best Hypotheses. - Lun Wang, Om Thakkar, Zhong Meng, Nicole Rafidi, Rohit Prabhavalkar, Arun Narayanan:
Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping.
Spoken Language Understanding
- Emmy Phung, Harsh Deshpande, Ahmad Emami, Kanishk Singh:
AR-NLU: A Framework for Enhancing Natural Language Understanding Model Robustness against ASR Errors. - Mohan Li, Simon Keizer, Rama Doddipatla:
Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding. - Tuyen Tran, Khanh Le, Ngoc Dang Nguyen, Minh Vu, Huyen Ngo, Woomyoung Park, Thi Thu Trang Nguyen:
VN-SLU: A Vietnamese Spoken Language Understanding Dataset. - Shunsuke Kando, Yusuke Miyao, Jason Naradowsky, Shinnosuke Takamichi:
Textless Dependency Parsing by Labeled Sequence Prediction. - Yaoyao Yue, Michael Proctor, Luping Zhou, Rijul Gupta, Tharinda Piyadasa, Amelia Gully, Kirrie Ballard, Craig T. Jin:
Towards Speech Classification from Acoustic and Vocal Tract data in Real-time MRI. - Alexander Johnson, Peter Plantinga, Pheobe Sun, Swaroop Gadiyaram, Abenezer Girma, Ahmad Emami:
Efficient SQA from Long Audio Contexts: A Policy-driven Approach.
Speech and Multimodal Resources
- Jan Pesán, Vojtech Jurík, Martin Karafiát, Jan Cernocký:
BESST Dataset: A Multimodal Resource for Speech-based Stress Detection and Analysis. - Arnon Turetzky, Or Tal, Yael Segal, Yehoshua Dissen, Ella Zeldes, Amit Roth, Eyal Cohen, Yosi Shrem, Bronya Roni Chernyak, Olga Seleznova, Joseph Keshet, Yossi Adi:
HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing. - Wenbin Wang, Yang Song, Sanjay Jha:
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech. - Yuexuan Kong, Viet-Anh Tran, Romain Hennequin:
STraDa: A Singer Traits Dataset. - Katharina Anderer, Andreas Reich, Matthias Wölfel:
MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features. - Kim Sung-Bin, Lee Chae-Yeon, Gihun Son, Oh Hyun-Bin, Janghoon Ju, Suekyeong Nam, Tae-Hyun Oh:
MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset. - Irina-Elena Veliche, Zhuangqun Huang, Vineeth Ayyat Kochaniyan, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer:
Towards measuring fairness in speech recognition: Fair-Speech dataset. - Yi Lu, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Zhiyong Wang, Xin Qi, Xuefei Liu, Yongwei Li, Yukun Liu, Xiaopeng Wang, Shuchen Shi:
Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio. - Mohamed Osman, Daniel Z. Kaplan, Tamer Nadeem:
SER Evals: In-domain and Out-of-domain benchmarking for speech emotion recognition.
Pathological Speech Analysis 1
- Vidar Freyr Gudmundsson, Keve Márton Gönczi, Malin Svensson Lundmark, Donna Erickson, Oliver Niebuhr:
The MARRYS helmet: A new device for researching and training "jaw dancing". - Moreno La Quatra, Maria Francesca Turco, Torbjørn Svendsen, Giampiero Salvi, Juan Rafael Orozco-Arroyave, Sabato Marco Siniscalchi:
Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions. - Andreas Triantafyllopoulos, Anton Batliner, Wolfgang Mayr, Markus Fendler, Florian B. Pokorny, Maurice Gerczuk, Shahin Amiriparian, Thomas M. Berghaus, Björn W. Schuller:
Sustained Vowels for Pre- vs Post-Treatment COPD Classification. - Mahdi Amiri, Ina Kodrasi:
Adversarial Robustness Analysis in Automatic Pathological Speech Detection Approaches. - Gahye Kim, Yunjung Eom, Selina S. Sung, Seunghee Ha, Tae-Jin Yoon, Jungmin So:
Automatic Children Speech Sound Disorder Detection with Age and Speaker Bias Mitigation.
Speech and Language in Health: from Remote Monitoring to Medical Conversations - 1 (Special Session)
- Mojtaba Kadkhodaie Elyaderani, John Glover, Thomas Schaaf:
Reference-Free Estimation of the Quality of Clinical Notes Generated from Doctor-Patient Conversations. - Jihyun Mun, Sunhee Kim, Minhwa Chung:
Developing an End-to-End Framework for Predicting the Social Communication Severity Scores of Children with Autism Spectrum Disorder. - Vladimir Despotovic, Abir Elbéji, Petr V. Nazarov, Guy Fagherazzi:
Multimodal Fusion for Vocal Biomarkers Using Vector Cross-Attention. - Stefano Goria, Roseline Polle, Salvatore Fara, Nicholas Cummins:
Revealing Confounding Biases: A Novel Benchmarking Approach for Aggregate-Level Performance Metrics in Health Assessments. - Anaïs Rameau, Satrajit Ghosh, Alexandros Sigaras, Olivier Elemento, Jean-Christophe Bélisle-Pipon, Vardit Ravitsky, Maria Powell, Alistair Johnson, David Dorr, Philip R. O. Payne, Micah Boyer, Stephanie Watts, Ruth Bahr, Frank Rudzicz, Jordan Lerner-Ellis, Shaheen Awan, Don Bolser, Yael Bensoussan:
Developing Multi-Disorder Voice Protocols: A team science approach involving clinical expertise, bioethics, standards, and DEI. - Sri Harsha Dumpala, Katerina Dikaios, Abraham Nunes, Frank Rudzicz, Rudolf Uher, Sageev Oore:
Self-Supervised Embeddings for Detecting Individual Symptoms of Depression. - Daryush D. Mehta, Jarrad H. Van Stan, Hamzeh Ghasemzadeh, Robert E. Hillman:
Comparing ambulatory voice measures during daily life with brief laboratory assessments in speakers with and without vocal hyperfunction. - Jennifer Williams, Eike Schneiders, Henry Card, Tina Seabrooke, Beatrice Pakenham-Walsh, Tayyaba Azim, Lucy Valls-Reed, Ganesh Vigneswaran, John Robert Bautista, Rohan Chandra, Arya Farahi:
Predicting Acute Pain Levels Implicitly from Vocal Features. - Kubilay Can Demir, Belén Lojo Rodríguez, Tobias Weise, Andreas K. Maier, Seung Hee Yang:
Towards Intelligent Speech Assistants in Operating Rooms: A Multimodal Model for Surgical Workflow Analysis. - Gowtham Premananth, Yashish M. Siriwardena, Philip Resnik, Sonia Bansal, Deanna L. Kelly, Carol Y. Espy-Wilson:
A Multimodal Framework for the Assessment of the Schizophrenia Spectrum.
Speech and Brain
- Yuzhe Wang, Anna Favaro, Thomas Thebaud, Jesús Villalba, Najim Dehak, Laureano Moro-Velázquez:
Exploring the Complementary Nature of Speech and Eye Movements for Profiling Neurological Disorders. - Hengyu Li, Kangdi Mei, Zhaoci Liu, Yang Ai, Liping Chen, Jie Zhang, Zhenhua Ling:
Refining Self-supervised Learnt Speech Representation using Brain Activations. - Yuejiao Wang, Xianmin Gong, Lingwei Meng, Xixin Wu, Helen Meng:
Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder. - Kumar Neelabh, Vishnu Sreekumar:
From Sound to Meaning in the Auditory Cortex: A Neuronal Representation and Classification Analysis. - Sheng Feng, Heyang Liu, Yu Wang, Yanfeng Wang:
Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models. - Jihwan Lee, Aditya Kommineni, Tiantian Feng, Kleanthis Avramidis, Xuan Shi, Sudarsana Reddy Kadiri, Shrikanth Narayanan:
Toward Fully-End-to-End Listened Speech Decoding from EEG Signals.
Innovative Methods in Phonetics and Phonology
- Emily P. Ahn, Eleanor Chodroff, Myriam Lapierre, Gina-Anne Levow:
The Use of Phone Categories and Cross-Language Modeling for Phone Alignment of Panãra. - Sneha Ray Barman, Shakuntala Mahanta, Neeraj Kumar Sharma:
Deciphering Assamese Vowel Harmony with Featural InfoWaveGAN. - Harsha Veena Tadavarthy, Austin Jones, Margaret E. L. Renwick:
Phonological Feature Detection for US English using the Phonet Library. - Constantijn Kaland, Jeremy Steffman, Jennifer Cole:
K-means and hierarchical clustering of f0 contours. - Rotem Rousso, Eyal Cohen, Joseph Keshet, Eleanor Chodroff:
Tradition or Innovation: A Comparison of Modern ASR Methods for Forced Alignment. - Lila Kim, Cédric Gendrot:
Using wav2vec 2.0 for phonetic classification tasks: methodological aspects. - Michael Lambropoulos, Frantz Clermont, Shunichi Ishihara:
The sub-band cepstrum as a tool for locating local spectral regions of phonetic sensitivity: A first attempt with multi-speaker vowel data. - Woo-Jin Chung, Hong-Goo Kang:
Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator. - Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Paula Andrea Pérez-Toro, Maria Schuster, Elmar Nöth, Björn Heismann, Andreas K. Maier, Seung Hee Yang:
Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech. - Anna Oura, Hideaki Kikuchi, Tetsunori Kobayashi:
Preprocessing for acoustic-to-articulatory inversion using real-time MRI movies of Japanese speech.
Voice, Tones and F0
- Chenyu Li, Jalal Al-Tamimi:
Impact of the tonal factor on diphthong realizations in Standard Mandarin with Generalized Additive Mixed Models. - Xiaowang Liu, Jinsong Zhang:
A Study on the Information Mechanism of the 3rd Tone Sandhi Rule in Mandarin Disyllabic Words. - Melanie Weirich, Daniel Duran, Stefanie Jannedy:
Gender and age based f0-variation in the German Plapper Corpus. - Chenzi Xu, Jessica Wormald, Paul Foulkes, Philip Harrison, Vincent Hughes, Poppy Welch, Finnian Kelly, David van der Vloed:
Voice quality in telephone speech: Comparing acoustic measures between VoIP telephone and high-quality recordings. - Iona Gessinger, Bistra Andreeva, Benjamin R. Cowan:
The Use of Modifiers and f0 in Remote Referential Communication with Human and Computer Partners.
Emotion Recognition: Resources and Benchmarks
- Ziyang Ma, Mingjie Chen, Hezhao Zhang, Zhisheng Zheng, Wenxi Chen, Xiquan Li, Jiaxin Ye, Xie Chen, Thomas Hain:
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark. - Andreas Triantafyllopoulos, Anton Batliner, Simon David Noel Rampp, Manuel Milling, Björn W. Schuller:
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition. - Adham Ibrahim, Shady Shehata, Ajinkya Kulkarni, Mukhtar Mohamed, Muhammad Abdul-Mageed:
What Does it Take to Generalize SER Model Across Datasets? A Comprehensive Benchmark. - Abinay Reddy Naini, Lucas Goncalves, Mary A. Kohler, Donita Robinson, Elizabeth Richerson, Carlos Busso:
WHiSER: White House Tapes Speech Emotion Recognition Corpus. - Siddique Latif, Raja Jurdak, Björn W. Schuller:
Evaluating Transformer-Enhanced Deep Reinforcement Learning for Speech Emotion Recognition. - Jincen Wang, Yan Zhao, Cheng Lu, Chuangao Tang, Sunan Li, Yuan Zong, Wenming Zheng:
Boosting Cross-Corpus Speech Emotion Recognition using CycleGAN with Contrastive Learning.
Speaker and Language Identification and Diarization
- Bilal Rahou, Hervé Bredin:
Multi-latency look-ahead for streaming speaker segmentation. - Christoph Boeddeker, Tobias Cord-Landwehr, Reinhold Haeb-Umbach:
Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment. - Théo Mariotte, Anthony Larcher, Silvio Montrésor, Jean-Hugh Thomas:
ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings. - Gabriel Pirlogeanu, Octavian Pascu, Alexandru-Lucian Georgescu, Horia Cucu:
Hybrid-Diarization System with Overlap Post-Processing for the DISPLACE 2024 Challenge. - Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K. T., S. R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy:
The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments. - Joonas Kalda, Tanel Alumäe, Martin Lebourdais, Hervé Bredin, Séverin Baroudi, Ricard Marxer:
TalTech-IRIT-LIS Speaker and Language Diarization Systems for DISPLACE 2024. - Yaqian Hao, Chenguang Hu, Yingying Gao, Shilei Zhang, Junlan Feng:
Exploring Energy-Based Models for Out-of-Distribution Detection in Dialect Identification. - Martina Valente, Fabio Brugnara, Giovanni Morrone, Enrico Zovato, Leonardo Badino:
Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech. - Rohit Paturi, Xiang Li, Sundararajan Srinivasan:
AG-LSEC: Audio Grounded Lexical Speaker Error Correction. - Hang Su, Yuxiang Kong, Lichun Fan, Peng Gao, Yujun Wang, Zhiyong Wu:
Speaker Change Detection with Weighted-sum Knowledge Distillation based on Self-supervised Pre-trained Models. - Naoki Makishima, Naotaka Kawata, Mana Ihori, Tomohiro Tanaka, Shota Orihashi, Atsushi Ando, Ryo Masumura:
SOMSRED: Sequential Output Modeling for Joint Multi-talker Overlapped Speech Recognition and Speaker Diarization. - Hokuto Munakata, Ryo Terashima, Yusuke Fujita:
Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework.
Audio-Text Retrieval
- Yifei Xin, Xuxin Cheng, Zhihong Zhu, Xusheng Yang, Yuexian Zou:
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval. - Zhiyong Yan, Heinrich Dinkel, Yongqing Wang, Jizhong Liu, Junbo Zhang, Yujun Wang, Bin Wang:
Bridging Language Gaps in Audio-Text Retrieval. - Soham Deshmukh, Rita Singh, Bhiksha Raj:
Domain Adaptation for Contrastive Audio-Language Models. - Francesco Paissan, Elisabetta Farella:
tinyCLAP: Distilling Contrastive Language-Audio Pretrained Models. - June-Woo Kim, Miika Toikkanen, Yera Choi, Seoung-Eun Moon, Ho-Young Jung:
BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification. - Yuwu Tang, Ziang Ma, Haitao Zhang:
Enhanced Feature Learning with Normalized Knowledge Distillation for Audio Tagging.
Speech Enhancement
- Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie:
RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention. - Xi Liu, John H. L. Hansen:
DNN-based monaural speech enhancement using alternate analysis windows for phase and magnitude modification. - Li Li, Shogo Seki:
Improved Remixing Process for Domain Adaptation-Based Speech Enhancement by Mitigating Data Imbalance in Signal-to-Noise Ratio. - Yixuan Zhang, Hao Zhang, Meng Yu, Dong Yu:
Neural Network Augmented Kalman Filter for Robust Acoustic Howling Suppression. - Jizhen Li, Xinmeng Xu, Weiping Tu, Yuhong Yang, Rong Zhu:
Improving Speech Enhancement by Integrating Inter-Channel and Band Features with Dual-branch Conformer. - Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li:
An Exploration of Length Generalization in Transformer-Based Speech Enhancement. - Haixin Guan, Wei Dai, Guangyong Wang, Xiaobin Tan, Peng Li, Jiaen Liang:
Reducing Speech Distortion and Artifacts for Speech Enhancement by Loss Function. - Candy Olivia Mawalim, Shogo Okada, Masashi Unoki:
Are Recent Deep Learning-Based Speech Enhancement Methods Ready to Confront Real-World Noisy Environments? - Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian:
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement.
Speech Coding
- Jinghong Zhang, Zugang Zhao, Yonghui Liu, Jianbing Liu, Zhiqiang He, Kai Niu:
TD-PLC: A Semantic-Aware Speech Encoding for Improved Packet Loss Concealment. - Zihan Zhang, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie:
BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation. - Kishan Gupta, Nicola Pia, Srikanth Korse, Andreas Brendel, Guillaume Fuchs, Markus Multrus:
On Improving Error Resilience of Neural End-to-End Speech Coders. - Thomas Muller, Stéphane Ragot, Laetitia Gros, Pierrick Philippe, Pascal Scalart:
Speech quality evaluation of neural audio codecs. - Yang Ai, Ye-Xin Lu, Xiao-Hang Jiang, Zheng-Yan Sheng, Rui-Chen Zheng, Zhen-Hua Ling:
A Low-Bitrate Neural Audio Codec Framework with Bandwidth Reduction and Recovery for High-Sampling-Rate Waveforms. - Haibin Wu, Yuan Tseng, Hung-yi Lee:
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems.
Speech Synthesis: Expressivity and Emotion
- Zehua Kcriss Li, Meiying Melissa Chen, Yi Zhong, Pinxin Liu, Zhiyao Duan:
GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis. - Donghyun Seong, Hoyoung Lee, Joon-Hyuk Chang:
TSP-TTS: Text-based Style Predictor with Residual Vector Quantization for Expressive Text-to-Speech. - Weiqin Li, Peiji Yang, Yicheng Zhong, Yixuan Zhou, Zhisheng Wang, Zhiyong Wu, Xixin Wu, Helen Meng:
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models. - Dake Guo, Xinfa Zhu, Liumeng Xue, Yongmao Zhang, Wenjie Tian, Lei Xie:
Text-aware and Context-aware Expressive Audiobook Speech Synthesis. - Thomas Bott, Florian Lux, Ngoc Thang Vu:
Controlling Emotion in Text-to-Speech with Natural Language Prompts. - Jinlong Xue, Yayue Deng, Yingming Gao, Ya Li:
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining. - Pavan Kalyan, Preeti Rao, Preethi Jyothi, Pushpak Bhattacharyya:
Emotion Arithmetic: Emotional Speech Synthesis via Weight Space Interpolation. - Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee:
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech. - Xuyuan Li, Zengqiang Shang, Peiyang Shi, Hua Hua, Ta Li, Pengyuan Zhang:
Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder. - Chin-Yun Yu, György Fazekas:
Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis.
Speech Synthesis: Tools and Data
- Yuki Saito, Takuto Igarashi, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, Hiroshi Saruwatari:
SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark. - Praveen Srinivasa Varadhan, Ashwin Sankar, Giri Raju, Mitesh M. Khapra:
Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings. - Min Ma, Yuma Koizumi, Shigeki Karita, Heiga Zen, Jason Riesa, Haruko Ishikawa, Michiel Bacchiani:
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks. - Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, Binbin Zhang, Lei Xie:
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark. - Qian Yang, Jialong Zuo, Zhe Su, Ziyue Jiang, Mingze Li, Zhou Zhao, Feiyang Chen, Zhefeng Wang, Baoxing Huai:
MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis. - Masaya Kawamura, Ryuichi Yamamoto, Yuma Shirahata, Takuya Hasumi, Kentaro Tachibana:
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning. - Sewade Ogun, Abraham Toluwase Owodunni, Tobi Olatunji, Eniola Alese, Babatunde Oladimeji, Tejumade Afonja, Kayode Olaleye, Naome A. Etori, Tosin P. Adewumi:
1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis. - Osamu Take, Shinnosuke Takamichi, Kentaro Seki, Yoshiaki Bando, Hiroshi Saruwatari:
SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis.
Speech Synthesis: Singing Voice Synthesis
- Semin Kim, Myeonghun Jeong, Hyeonseung Lee, Minchan Kim, Byoung Jin Choi, Nam Soo Kim:
MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance. - Takuma Okamoto, Yamato Ohtani, Sota Shimizu, Tomoki Toda, Hisashi Kawai:
Challenge of Singing Voice Synthesis Using Only Text-To-Speech Corpus With FIRNet Source-Filter Neural Vocoder. - Taewoo Kim, Choongsang Cho, Young Han Lee:
Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis. - Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe:
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing. - Ji-Sang Hwang, HyeongRae Noh, Yoonseok Hong, Insoo Oh:
X-Singer: Code-Mixed Singing Voice Synthesis via Cross-Lingual Learning. - Shuochen Gao, Shun Lei, Fan Zhuo, Hangyu Liu, Feng Liu, Boshi Tang, Qiaochu Huang, Shiyin Kang, Zhiyong Wu:
An End-to-End Approach for Chord-Conditioned Song Generation.
LLM in ASR
- Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Neeraj Gaur, Zhong Meng:
Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions. - Frank Seide, Yangyang Shi, Morrie Doulaty, Yashesh Gaur, Junteng Jia, Chunyang Wu:
Speech ReaLLM - Real-time Speech Recognition with Multimodal Language Models by Teaching the Flow of Time. - Yangze Li, Xiong Wang, Songjun Cao, Yike Zhang, Long Ma, Lei Xie:
A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition. - Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang:
Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models.
Vision and Speech
- Jongsuk Kim, Jiwon Shin, Junmo Kim:
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning. - Sreyan Ghosh, Sonal Kumar, Ashish Seth, Purva Chiniya, Utkarsh Tyagi, Ramani Duraiswami, Dinesh Manocha:
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition. - Guinan Li, Jiajun Deng, Youjun Chen, Mengzhe Geng, Shujie Hu, Zhe Li, Zengrui Jin, Tianzi Wang, Xurong Xie, Helen Meng, Xunying Liu:
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition. - Chen Chen, Zehua Liu, Xiaolou Li, Lantian Li, Dong Wang:
CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge.
Spoken Document Summarization
- Margaret Kroll, Kelsey Kraus:
Optimizing the role of human evaluation in LLM-based spoken document summarization systems. - Sangwon Ryu, Heejin Do, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok:
Key-Element-Informed sLLM Tuning for Document Summarization. - Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Masato Mimura, Takatomo Kano, Atsunori Ogawa, Marc Delcroix:
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation. - Hengchao Shang, Zongyao Li, Jiaxin Guo, Shaojun Li, Zhiqiang Rao, Yuanchang Luo, Daimeng Wei, Hao Yang:
An End-to-End Speech Summarization Using Large Language Model. - Wonjune Kang, Deb Roy:
Prompting Large Language Models with Audio for General-Purpose Speech Summarization. - Khai Le-Duc, Khai-Nguyen Nguyen, Long Vo-Dang, Truong-Son Hy:
Real-time Speech Summarization for Medical Conversations.
Speech and Language in Health: from Remote Monitoring to Medical Conversations - 2 (Special Sessions)
- Daniel Escobar-Grisales, Cristian David Ríos-Urrego, Ilja Baumann, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet, Adolfo M. García, Juan Rafael Orozco-Arroyave:
It's Time to Take Action: Acoustic Modeling of Motor Verbs to Detect Parkinson's Disease. - Malo Maisonneuve, Corinne Fredouille, Muriel Lalain, Alain Ghio, Virginie Woisard:
Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models. - Catarina Botelho, John Mendonça, Anna Pompili, Tanja Schultz, Alberto Abad, Isabel Trancoso:
Macro-descriptors for Alzheimer's disease detection using large language models. - Franziska Braun, Sebastian P. Bayerl, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer:
Infusing Acoustic Pause Context into Text-Based Dementia Assessment. - Oliver Roesler, Jackson Liscombe, Michael Neumann, Hardik Kothare, Abhishek Hosamath, Lakshmi Arbatti, Doug Habberstad, Christiane Suendermann-Oeft, Meredith Bartlett, Cathy Zhang, Nikhil Sukhdev, Kolja Wilms, Anusha Badathala, Sandrine Istas, Steve Ruhmel, Bryan Hansen, Madeline Hannan, David Henley, Arthur Wallace, Ira Shoulson, David Suendermann-Oeft, Vikram Ramanarayanan:
Towards Scalable Remote Assessment of Mild Cognitive Impairment Via Multimodal Dialog. - Mara Barberis, Pieter De Clercq, Bastiaan Tamm, Hugo Van hamme, Maaike Vandermosten:
Automatic recognition and detection of aphasic natural speech. - Giulia Sanguedolce, Sophie Brook, Dragos C. Gruia, Patrick A. Naylor, Fatemeh Geranmayeh:
When Whisper Listens to Aphasia: Advancing Robust Post-Stroke Speech Recognition. - Liming Wang, Yuan Gong, Nauman Dawalatabad, Marco Vilela, Katerina Placek, Brian Tracey, Yishu Gong, Alan Premasiri, Fernando Vieira, James R. Glass:
Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer. - Hardik Kothare, Michael Neumann, Cathy Zhang, Jackson Liscombe, Jordi W. J. van Unnik, Lianne C. M. Botman, Leonard H. van den Berg, Ruben P. A van Eijk, Vikram Ramanarayanan:
How Consistent are Speech-Based Biomarkers in Remote Tracking of ALS Disease Progression Across Languages? A Case Study of English and Dutch. - Anika A. Spiesberger, Andreas Triantafyllopoulos, Alexander Kathan, Anastasia Semertzidou, Caterina Gawrilow, Tilman Reinelt, Wolfgang A. Rauch, Björn W. Schuller:
"So ... my child ..." - How Child ADHD Influences the Way Parents Talk. - Judith Dineley, Ewan Carr, Lauren L. White, Catriona Lucas, Zahia Rahman, Tian Pan, Faith Matcham, Johnny Downs, Richard J. B. Dobson, Thomas F. Quatieri, Nicholas Cummins:
Variability of speech timing features across repeated recordings: a comparison of open-source extraction techniques. - Yanis Labrak, Adel Moumen, Richard Dufour, Mickael Rouvier:
Zero-Shot End-To-End Spoken Question Answering In Medical Domain. - Yicong Jiang, Tianzi Wang, Xurong Xie, Juan Liu, Wei Sun, Nan Yan, Hui Chen, Lan Wang, Xunying Liu, Feng Tian:
Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition.
Show and Tell 2
- Kesavaraj V, Charan Devarkonda, Vamshiraghusimha Narasinga, Anil Kumar Vuppala:
Custom wake word detection. - Song Chen, Mandar Gogate, Kia Dashtipour, Jasper Kirton-Wingate, Adeel Hussain, Faiyaz Doctor, Tughrul Arslan, Amir Hussain:
Edged based audio-visual speech enhancement demonstrator. - Arif Reza Anway, Bryony Buck, Mandar Gogate, Kia Dashtipour, Michael Akeroyd, Amir Hussain:
Real-Time Gaze-directed speech enhancement for audio-visual hearing-aids. - Abhishek Kumar, Srikanth Konjeti, Jithendra Vepa:
Detection of background agents speech in contact centers. - Bramhendra Koilakuntla, Prajesh Rana, Paras Ahuja, Srikanth Konjeti, Jithendra Vepa:
Leveraging large language models for post-transcription correction in contact centers. - Leonie Schade, Nico Dallmann, Olcay Türk, Stefan Lazarov, Petra Wagner:
Understanding "understanding": presenting a richly annotated multimodal corpus of dyadic interaction. - João Vítor Possamai de Menezes, Arne-Lukas Fietkau, Tom Diener, Steffen Kürbis, Peter Birkholz:
A demonstrator for articulation-based command word recognition. - Nigel G. Ward, Andres Segura:
Pragmatically similar utterance finder demonstration. - Kai Liu, Ziqing Du, Huan Zhou, Xucheng Wan, Naijun Zheng:
Real-time scheme for rapid extraction of speaker embeddings in challenging recording conditions. - Szu-Yu Chen, Tien-Hong Lo, Yao-Ting Sung, Ching-Yu Tseng, Berlin Chen:
TEEMI: a speaking practice tool for L2 English learners.
Prosody
- Na Hu, Hugo Schnack, Amalia Arvaniti:
Automatic pitch accent classification through image classification. - Tianqi Geng, Hui Feng:
Form and Function in Prosodic Representation: In the Case of 'ma' in Tianjin Mandarin. - Joyshree Chakraborty, Leena Dihingia, Priyankoo Sarmah, Rohit Sinha:
On Comparing Time- and Frequency-Domain Rhythm Measures in Classifying Assamese Dialects. - Chiara Riegger, Tina Bögel, George Walkden:
The prosody of the verbal prefix ge-: historical and experimental evidence. - Hongchen Wu, Jiwon Yun:
Influences of Morphosyntax and Semantics on the Intonation of Mandarin Chinese Wh-indeterminates. - Benazir Mumtaz, Miriam Butt:
Urdu Alternative Questions: A Hat Pattern.
Foundational Models for Deepfake and Spoofed Speech Detection
- Hoan My Tran, David Guennec, Philippe Martin, Aghilas Sini, Damien Lolive, Arnaud Delhay, Pierre-François Marteau:
Spoofed Speech Detection with a Focus on Speaker Embedding. - Juan M. Martín-Doñas
, Aitor Álvarez, Eros Rosello, Angel M. Gomez, Antonio M. Peinado:
Exploring Self-supervised Embeddings and Synthetic Data Augmentation for Robust Audio Deepfake Detection. - Zihan Pan, Tianchi Liu, Hardik B. Sailor, Qiongqiong Wang:
Attentive Merging of Hidden Embeddings from Pre-trained Speech Model for Anti-spoofing Detection. - Haochen Wu, Wu Guo, Shengyu Peng, Zhuhai Li, Jie Zhang:
Adapter Learning from Pre-trained Model for Robust Spoof Speech Detection. - Kexu Liu, Yuanxin Wang, Shengchen Li, Xi Shao:
Speech Formants Integration for Generalized Detection of Synthetic Speech Spoofing Attacks. - Thien-Phuc Doan, Long Nguyen-Vu, Kihun Hong, Souhwan Jung:
Balance, Multiple Augmentation, and Re-synthesis: A Triad Training Strategy for Enhanced Audio Deepfake Detection.
Speaker Recognition 1
- Shengyu Peng, Wu Guo, Haochen Wu, Zuoliang Li, Jie Zhang:
Fine-tune Pre-Trained Models with Multi-Level Feature Fusion for Speaker Verification. - En-Lun Yu, Kuan-Hsun Ho, Jeih-weih Hung, Shih-Chieh Huang, Berlin Chen:
Speaker Conditional Sinc-Extractor for Personal VAD. - Shiu-Hsiang Liou, Po-Cheng Chan, Chia-Ping Chen, Tzu-Chieh Lin, Chung-Li Lu, Yu-Han Cheng, Hsiang-Feng Chuang, Wei-Yu Chen:
Enhancing ECAPA-TDNN with Feature Processing Module and Attention Mechanism for Speaker Verification. - Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu:
MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms. - Kihyun Nam, Hee-Soo Heo, Jee-Weon Jung, Joon Son Chung:
Disentangled Representation Learning for Environment-agnostic Speaker Recognition. - Ladislav Mosner, Romain Serizel, Lukás Burget, Oldrich Plchot, Emmanuel Vincent, Junyi Peng, Jan Cernocký:
Multi-Channel Extension of Pre-trained Models for Speaker Verification. - Yishuang Li, Wenhao Guan, Hukai Huang, Shiyu Miao, Qi Su, Lin Li, Qingyang Hong:
Efficient Integrated Features Based on Pre-trained Models for Speaker Verification. - Tianhao Wang, Lantian Li, Dong Wang:
SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition. - Wei-Lin Xie, Yu-Xuan Xi, Yan Song, Jian-Tao Zhang, Hao-Yu Song, Ian McLoughlin:
DB-PMAE: Dual-Branch Prototypical Masked AutoEncoder with locality for domain robust speaker verification. - Matthew Maciejewski, Dominik Klement, Ruizhe Huang, Matthew Wiesner, Sanjeev Khudanpur:
Evaluating the Santa Barbara Corpus: Challenges of the Breadth of Conversational Spoken Language. - Zhenyu Zhou, Shibiao Xu, Shi Yin, Lantian Li, Dong Wang:
A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition.
Source Separation 1
- Helin Wang, Jesús Villalba, Laureano Moro-Velázquez, Jiarui Hai, Thomas Thebaud, Najim Dehak:
Noise-robust Speech Separation with Fast Generative Correction. - Roland Hartanto, Sakriani Sakti, Koichi Shinoda:
MSDET: Multitask Speaker Separation and Direction-of-Arrival Estimation Training. - Jacob Kealey, John R. Hershey, François Grondin:
Unsupervised Improved MVDR Beamforming for Sound Enhancement. - Ke Chen, Jiaqi Su, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Zeyu Jin:
Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation. - Jihyun Kim, Stijn Kindt, Nilesh Madhu, Hong-Goo Kang:
Enhanced Deep Speech Separation in Clustered Ad Hoc Distributed Microphone Environments. - Jia Qi Yip, Shengkui Zhao, Dianwen Ng, Eng Siong Chng, Bin Ma:
Towards Audio Codec-based Speech Separation.
Audio-Visual and Generative Speech Enhancement
- Zhengxiao Li, Nakamasa Inoue:
Locally Aligned Rectified Flow Model for Speech Enhancement Towards Single-Step Diffusion. - Pu Wang, Junhui Li, Jialu Li, Liangdong Guo, Youshan Zhang:
Diffusion Gaussian Mixture Audio Denoise. - Bunlong Lay, Timo Gerkmann:
An Analysis of the Variance of Diffusion-based Speech Enhancement. - Chaeyoung Jung, Suyeon Lee, Ji-Hoon Kim, Joon Son Chung:
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching. - Honglie Chen, Rodrigo Mira, Stavros Petridis, Maja Pantic:
RT-LA-VocE: Real-Time Low-SNR Audio-Visual Speech Enhancement. - Junhui Li, Pu Wang, Jialu Li, Youshan Zhang:
Complex Image-Generative Diffusion Transformer for Audio Denoising. - Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng:
Noise-aware Speech Enhancement using Diffusion Probabilistic Model.
Speech Privacy and Bandwidth Expansion
- Mohammad Hassan Vali, Tom Bäckström:
Privacy PORCUPINE: Anonymization of Speaker Attributes Using Occurrence Normalization for Space-Filling Vector Quantization. - Mayank Kumar Singh, Naoya Takahashi, Wei-Hsiang Liao, Yuki Mitsufuji:
SilentCipher: Deep Audio Watermarking. - Cunhang Fan, Shunbo Dong, Jun Xue, Yujie Chen, Jiangyan Yi, Zhao Lv:
Frequency-mix Knowledge Distillation for Fake Speech Detection. - Nicolas M. Müller, Piotr Kawa, Shen Hu, Matthias Neu, Jennifer Williams, Philip Sperl, Konstantin Böttinger:
A New Approach to Voice Authenticity. - Junzuo Zhou, Jiangyan Yi, Tao Wang, Jianhua Tao, Ye Bai, Chu Yuan Zhang, Yong Ren, Zhengqi Wen:
TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking. - Liwei Liu, Huihui Wei, Dongya Liu, Zhonghua Fu:
HarmoNet: Partial DeepFake Detection Network based on Multi-scale HarmoF0 Feature Fusion. - Denise Moussa, Sandra Bergmann, Christian Riess:
Unmasking Neural Codecs: Forensic Identification of AI-compressed Speech. - Yin-Tse Lin, Shreya G. Upadhyay, Bo-Hao Su, Chi-Chun Lee:
SWiBE: A Parameterized Stochastic Diffusion Process for Noise-Robust Bandwidth Expansion. - Ye-Xin Lu, Yang Ai, Zheng-Yan Sheng, Zhen-Hua Ling:
MultiStage Speech Bandwidth Extension with Flexible Sampling Rate Control. - Xu Li, Qirui Wang, Xiaoyu Liu:
MaskSR: Masked Language Model for Full-band Speech Restoration.
Speech Synthesis: Prosody
- Yuliya Korotkova, Ilya Kalinovskiy, Tatiana Vakhrusheva:
Word-level Text Markup for Prosody Control in Speech Synthesis. - Shivam Mehta, Harm Lameris, Rajiv Punmiya, Jonas Beskow, Éva Székely, Gustav Eje Henter:
Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech. - Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Jinyu Li, Sheng Zhao, Naoyuki Kanda:
Total-Duration-Aware Duration Modeling for Text-to-Speech Systems. - Himanshu Maurya, Atli Sigurgeirsson:
A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer. - Yuepeng Jiang, Tao Li, Fengyu Yang, Lei Xie, Meng Meng, Yujun Wang:
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling. - Jinzuomu Zhong, Yang Li, Hui Huang, Korin Richmond, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu:
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of Speech-Silence and Word-Punctuation.
Accented Speech, Prosodic Features, Dialect, Emotion, Sound Classification
- Darshan Prabhu, Abhishek Gupta, Omkar Nitsure, Preethi Jyothi, Sriram Ganapathy:
Improving Self-supervised Pre-training using Accent-Specific Codebooks. - Tejumade Afonja, Tobi Olatunji, Sewade Ogun, Naome A. Etori, Abraham Toluwase Owodunni, Moshood Yekini:
Performant ASR Models for Medical Entities in Accented Speech. - Tahir Javed, Janki Nawale, Sakshi Joshi, Eldho Ittan George, Kaushal Santosh Bhogale, Deovrat Mehendale, Mitesh M. Khapra:
LAHAJA: A Robust Multi-accent Benchmark for Evaluating Hindi ASR Systems. - Haechan Kim, Junho Myung, Seoyoung Kim, Sungpah Lee, Dongyeop Kang, Juho Kim:
LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech. - Jiayan Lin, Shenghui Lu, Hukai Huang, Wenhao Guan, Binbin Xu, Hui Bu, Qingyang Hong, Lin Li:
MinSpeech: A Corpus of Southern Min Dialect for Automatic Speech Recognition. - Ying Hu, Huamin Yang, Hao Huang, Liang He:
Cross-modal Features Interaction-and-Aggregation Network with Self-consistency Training for Speech Emotion Recognition. - Arnav Goel, Medha Hira, Anubha Gupta:
Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning. - Hazim T. Bukhari, Soham Deshmukh, Hira Dhamyal, Bhiksha Raj, Rita Singh:
SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios. - Martijn Bentum, Louis ten Bosch, Tom Lentz:
The Processing of Stress in End-to-End Automatic Speech Recognition Models. - Tuan Nguyen, Huy Dat Tran:
LingWav2Vec2: Linguistic-augmented wav2vec 2.0 for Vietnamese Mispronunciation Detection. - Rhiannon Mogridge, Anton Ragni:
Learning from memory-based models. - Meiling Chen, Pengjie Liu, Heng Yang, Haofeng Wang:
Towards End-to-End Unified Recognition for Mandarin and Cantonese.
Neural Network Adaptation
- Thomas Rolland, Alberto Abad:
Shared-Adapters: A Novel Transformer-based Parameter Efficient Transfer Learning Approach For Children's Automatic Speech Recognition. - Zhouyuan Huo, Dongseong Hwang, Gan Song, Khe Chai Sim, Weiran Wang:
AdaRA: Adaptive Rank Allocation of Residual Adapters for Speech Foundation Model. - Kyuhong Shim, Jinkyu Lee, Hyunjae Kim:
Leveraging Adapter for Parameter-Efficient ASR Encoder. - Ji-Hun Kang, Jae-Hong Lee, Mun-Hak Lee, Joon-Hyuk Chang:
Whisper Multilingual Downstream Task Tuning Using Task Vectors. - Shaojun Li, Daimeng Wei, Hengchao Shang, Jiaxin Guo, Zongyao Li, Zhanglin Wu, Zhiqiang Rao, Yuanchang Luo, Xianghui He, Hao Yang:
Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR. - Jinming Chen, Jingyi Fang, Yuanzhong Zheng, Yaoxuan Wang, Haojun Fei:
Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition.
ASR and LLMs
- Ji Won Yoon, Beom Jun Woo, Nam Soo Kim:
HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition. - Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang, Xie Chen:
MaLa-ASR: Multimedia-Assisted LLM-Based ASR. - HyunJung Choi, Muyeol Choi, Yohan Lim, Minkyu Lee, Seon Hui Kim, Seung Yun, Donghyun Kim, Sang-Hun Kim:
Spoken-to-written text conversion with Large Language Model. - Zhiqi Ai, Zhiyong Chen, Shugong Xu:
MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting. - Andrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne, Rogério Feris, James Glass:
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation. - K. R. Prajwal, Triantafyllos Afouras, Andrew Zisserman:
Speech Recognition Models are Strong Lip-readers.
Pathological Speech Analysis 3
- Ilja Baumann, Dominik Wagner, Maria Schuster, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet:
Towards Self-Attention Understanding for Automatic Articulatory Processes Analysis in Cleft Lip and Palate Speech. - Yin-Long Liu, Rui Feng, Jia-Hong Yuan, Zhen-Hua Ling:
Clever Hans Effect Found in Automatic Detection of Alzheimer's Disease through Speech. - Yeh-Sheng Lin, Shu-Chuan Tseng, Jyh-Shing Roger Jang:
Leveraging Phonemic Transcription and Whisper toward Clinically Significant Indices for Automatic Child Speech Assessment. - Shaoxiang Dang, Tetsuya Matsumoto, Yoshinori Takeuchi, Takashi Tsuboi, Yasuhiro Tanaka, Daisuke Nakatsubo, Satoshi Maesawa, Ryuta Saito, Masahisa Katsuno, Hiroaki Kudo:
Developing vocal system impaired patient-aimed voice quality assessment approach using ASR representation-included multiple features. - Wei-Tung Hsu, Chin-Po Chen, Yun-Shao Lin, Chi-Chun Lee:
A Cluster-based Personalized Federated Learning Strategy for End-to-End ASR of Dementia Patients. - Stefan Kalabakov, Monica González Machorro, Florian Eyben, Björn W. Schuller, Bert Arnrich:
A Comparative Analysis of Federated Learning for Speech-Based Cognitive Decline Detection. - Michael Neumann, Hardik Kothare, Jackson Liscombe, Emma C. L. Leschly, Oliver Roesler, Vikram Ramanarayanan:
Multimodal Digital Biomarkers for Longitudinal Tracking of Speech Impairment Severity in ALS: An Investigation of Clinically Important Differences.
Speech Disorders 3
- Ming Gao, Hang Chen, Jun Du, Xin Xu, Hongxiao Guo, Hui Bu, Jianxing Yang, Ming Li, Chin-Hui Lee:
Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design. - Neil Kumar Shah, Shirish S. Karande, Vineet Gandhi:
Towards Improving NAM-to-Speech Synthesis Intelligibility using Self-Supervised Speech Models. - Seyun Um, Doyeon Kim, Hong-Goo Kang:
PARAN: Variational Autoencoder-based End-to-End Articulation-to-Speech System for Speech Intelligibility. - Si Chen, Bruce Xiao Wang, Yitian Hong, Fang Zhou, Angel Chan, Po-yi Tang, Bin Li, Chunyi Wen, James Cheung, Yan Liu, Zhuoming Chen:
Acoustic changes in speech prosody produced by children with autism after robot-assisted speech training. - Xiuwen Zheng, Bornali Phukon, Mark Hasegawa-Johnson:
Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility. - Pan-Pan Jiang, Jimmy Tobin, Katrin Tomanek, Robert L. MacDonald, Katie Seaver, Richard Cave, Marilyn A. Ladewig, Rus Heywood, Jordan R. Green:
Learnings from curating a trustworthy, well-annotated, and useful dataset of disordered English speech. - Wing-Zin Leung, Mattias Cross, Anton Ragni, Stefan Goetze:
Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis. - Gábor Gosztolya, Mercedes Vetráb, Veronika Svindt, Judit Bóna, Ildikó Hoffmann:
Wav2vec 2.0 Embeddings Are No Swiss Army Knife - A Case Study for Multiple Sclerosis.
Speech Recognition with Large Pretrained Speech Models for Under-represented Languages (Special Session)
- Yi-Jen Shih, David Harwath:
Interface Design for Self-Supervised Speech Models. - Yaoxun Xu, Shi-Xiong Zhang, Jianwei Yu, Zhiyong Wu, Dong Yu:
Comparing Discrete and Continuous Space LLMs for Speech Recognition. - Jinpeng Li, Yu Pu, Qi Sun, Wei-Qiang Zhang:
Improving Whisper's Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text. - Kaushal Santosh Bhogale, Deovrat Mehendale, Niharika Parasa, Sathish Kumar Reddy G, Tahir Javed, Pratyush Kumar, Mitesh M. Khapra:
Empowering Low-Resource Language ASR via Large-Scale Pseudo Labeling. - Zhengyang Li, Patrick Blumenberg, Jing Liu, Thomas Graave, Timo Lohrenz, Siegfried Kunzmann, Tim Fingscheidt:
Interleaved Audio/Audiovisual Transfer Learning for AV-ASR in Low-Resourced Languages. - Sathvik Udupa, Jesuraj Bandekar, Saurabh Kumar, Deekshitha G, Sandhya Badiger, Abhayjeet Singh, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan K. M., Raoul Nanavati, Prasanta Kumar Ghosh:
Adapter pre-training for improved speech recognition in unseen domains using low resource adapter tuning of self-supervised models. - Tianyi Xu, Kaixun Huang, Pengcheng Guo, Yu Zhou, Longtao Huang, Hui Xue, Lei Xie:
Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper. - Yaroslav Getman, Tamás Grósz, Katri Hiovain-Asikainen, Mikko Kurimo:
Exploring adaptation techniques of large speech foundation models for low-resource ASR: a case study on Northern Sámi. - Mengjie Qian, Siyuan Tang, Rao Ma, Kate M. Knill, Mark J. F. Gales:
Learn and Don't Forget: Adding a New Language to ASR Foundation Models.
Speech Processing Using Discrete Speech Units (Special Session)
- Yuning Wu, Chunlei Zhang, Jiatong Shi, Yuxun Tang, Shan Yang, Qin Jin:
TokSing: Singing Voice Synthesis based on Discrete Tokens. - Pooneh Mousavi, Jarod Duret, Salah Zaiem, Luca Della Libera, Artem Ploujnikov, Cem Subakan, Mirco Ravanelli:
How Should We Extract Discrete Audio Tokens from Self-Supervised Models? - Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin:
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units. - Yuxun Tang, Yuning Wu, Jiatong Shi, Qin Jin:
SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models. - Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Sun, Shinji Watanabe:
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model. - Kunal Dhawan, Nithin Rao Koluguri, Ante Jukic, Ryan Langman, Jagadeesh Balam, Boris Ginsburg:
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations.
Keynote 3
- Elmar Nöth:
Analysis of Pathological Speech - Pitfalls along the Way.
Databases and Progress in Methodology
- Junseok Ahn, Youkyum Kim, Yeunju Choi, Doyeop Kwak, Ji-Hoon Kim, Seongkyu Mun, Joon Son Chung:
VoxSim: A perceptual voice similarity dataset. - Mewlude Nijat, Chen Chen, Dong Wang, Askar Hamdulla:
UY/CH-CHILD - A Public Chinese L2 Speech Database of Uyghur Children. - Prakash Kumar, Ye Tian, Yongwan Lim, Sophia X. Cui, Christina Hagedorn, Dani Byrd, Uttam K. Sinha, Shrikanth Narayanan, Krishna S. Nayak:
State-of-the-art speech production MRI protocol for new 0.55 Tesla scanners. - Mingyue Shi, Huali Zhou, Qinglin Meng, Nengheng Zheng:
DBD-CI: Doubling the Band Density for Bilateral Cochlear Implants. - Huihang Zhong, Yanlu Xie, ZiJin Yao:
Leveraging Large Language Models to Refine Automatic Feedback Generation at Articulatory Level in Computer Aided Pronunciation Training. - Bin Zhao, Mingxuan Huang, Chenlu Ma, Jinyi Xue, Aijun Li, Kunyu Xu:
Decoding Human Language Acquisition: EEG Evidence for Predictive Probabilistic Statistics in Word Segmentation.
Articulation, Convergence and Perception
- Jérémy Giroud, Jessica Lei, Kirsty Phillips, Matthew H. Davis:
Behavioral evidence for higher speech rate convergence following natural than artificial time altered speech. - Qingye Shen, Leonardo Lancia, Noël Nguyen:
A novel experimental design for the study of listener-to-listener convergence in phoneme categorization. - Hao Li, Yuan Fang, Xueliang Zhang, Fei Chen, Guanglai Gao:
Cross-Attention-Guided WaveNet for EEG-to-MEL Spectrogram Reconstruction. - Nicolò Loddo, Francisca Pessanha, Almila Akdag Salah:
What if HAL breathed? Enhancing Empathy in Human-AI Interactions with Breathing Speech Synthesis. - Malin Svensson Lundmark:
Magnitude and timing of acceleration peaks in stressed and unstressed syllables.
Speech Emotion Recognition
- Shahin Amiriparian, Filip Packan, Maurice Gerczuk, Björn W. Schuller:
ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets. - Fabian Ritter Gutierrez, Kuan-Po Huang, Jeremy H. M. Wong, Dianwen Ng, Hung-yi Lee, Nancy F. Chen, Eng Siong Chng:
Dataset-Distillation Generative Model for Speech Emotion Recognition. - Jialong Mai, Xiaofen Xing, Weidong Chen, Xiangmin Xu:
DropFormer: A Dynamic Noise-Dropping Transformer for Speech Emotion Recognition. - Minxue Niu, Mimansa Jaiswal, Emily Mower Provost:
From Text to Emotion: Unveiling the Emotion Annotation Capabilities of LLMs.
Self-Supervised Models in Speaker Recognition
- Ju-ho Kim, Hee-Soo Heo, Bong-Jin Lee, Youngki Kwon, Minjae Lee, Ha-Jin Yu:
Self-supervised speaker verification with relational mask prediction. - Victor Miara, Théo Lepage, Réda Dehak:
Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models. - Chan-yeong Lim, Hyun-seo Shin, Ju-ho Kim, Jungwoo Heo, Kyo-Won Koo, Seung-bin Kim, Ha-Jin Yu:
Improving Noise Robustness in Self-supervised Pre-trained Model for Speaker Verification. - Abderrahim Fathan, Xiaolin Zhu, Jahangir Alam:
On the impact of several regularization techniques on label noise robustness of self-supervised speaker verification systems. - Zhe Li, Man-Wai Mak, Hung-yi Lee, Helen Meng:
Parameter-efficient Fine-tuning of Speaker-Aware Dynamic Prompts for Speaker Verification. - Yiyang Zhao, Shuai Wang, Guangzhi Sun, Zehua Chen, Chao Zhang, Mingxing Xu, Thomas Fang Zheng:
Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models.
Speech Quality Assessment
- Cheng-Hung Hu, Yusuke Yasuda, Tomoki Toda:
Embedding Learning for Preference-based Speech Quality Assessment. - Sathvik Udupa, Soumi Maiti, Prasanta Kumar Ghosh:
IndicMOS: Multilingual MOS Prediction for 7 Indian languages. - Dan Wells, Andrea Lorena Aldana Blanco, Cassia Valentini-Botinhao, Erica Cooper, Aidan Pine, Junichi Yamagishi, Korin Richmond:
Experimental evaluation of MOS, AB and BWS listening test designs. - Bao Thang Ta, Minh Tu Le, Van Hai Do, Huynh Thi Thanh Binh:
Enhancing No-Reference Speech Quality Assessment with Pairwise, Triplet Ranking Losses, and ASR Pretraining.
Privacy and Security in Speech Communication 1
- Nicolas M. Müller, Nicholas W. D. Evans, Hemlata Tak, Philip Sperl, Konstantin Böttinger:
Harder or Different? Understanding Generalization of Audio Deepfake Detection. - Hideyuki Oiso, Yuto Matsunaga, Kazuya Kakizaki, Taiki Miyagawa:
Prompt Tuning for Audio Deepfake Detection: Computationally Efficient Test-time Domain Adaptation with Limited Target Dataset. - David Looney, Nikolay D. Gaubitch:
Robust spread spectrum speech watermarking using linear prediction and deep spectral shaping. - Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Zhao Lv, Cunhang Fan:
RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection. - Ailin Liu, Pepijn Vunderink, Jose Vargas Quiros, Chirag Raman, Hayley Hung:
How Private is Low-Frequency Speech Audio in the Wild? An Analysis of Verbal Intelligibility by Humans and Machines. - Ching-Yu Yang, Shreya G. Upadhyay, Ya-Tse Wu, Bo-Hao Su, Chi-Chun Lee:
RW-VoiceShield: Raw Waveform-based Adversarial Attack on One-shot Voice Conversion.
Speech Synthesis: Voice Conversion 2
- Aleksei Gusev, Anastasia Avdeeva:
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech. - Ji Sub Um, Hoirin Kim:
Utilizing Adaptive Global Response Normalization and Cluster-Based Pseudo Labels for Zero-Shot Voice Conversion. - Linhan Ma, Xinfa Zhu, Yuanjun Lv, Zhichao Wang, Ziqian Wang, Wendi He, Hongbin Zhou, Lei Xie:
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy. - Takuto Igarashi, Yuki Saito, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, Hiroshi Saruwatari:
Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment. - Hiroki Kanagawa, Takafumi Moriya, Yusuke Ijima:
Pre-training Neural Transducer-based Streaming Voice Conversion for Faster Convergence and Alignment-free Training. - Le Xu, Jiangyan Yi, Tao Wang, Yong Ren, Rongxiu Zhong, Zhengqi Wen, Jianhua Tao:
Residual Speaker Representation for One-Shot Voice Conversion. - Nicolas Gengembre, Olivier Le Blouch, Cédric Gendrot:
Disentangling prosody and timbre embeddings via voice conversion. - Shihao Chen, Yu Gu, Jie Zhang, Na Li, Rilin Chen, Liping Chen, Lirong Dai:
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance.
Speech Synthesis: Text Processing
- Amit Roth, Arnon Turetzky, Yossi Adi:
A Language Modeling Approach to Diacritic-Free Hebrew TTS. - Avihu Dekel, Raul Fernandez:
Exploring the Benefits of Tokenization of Discrete Acoustic Units. - Markéta Rezácková, Daniel Tihelka, Jindrich Matousek:
Homograph Disambiguation with Text-to-Text Transfer Transformer. - Kiyoshi Kurihara, Masanori Sano:
Enhancing Japanese Text-to-Speech Accuracy with a Novel Combination Transformer-BERT-based G2P: Integrating Pronunciation Dictionaries and Accent Sandhi. - Yuma Shirahata, Byeongseon Park, Ryuichi Yamamoto, Kentaro Tachibana:
Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data. - Xingxing Yang:
G2PA: G2P with Aligned Audio for Mandarin Chinese. - Siqi Sun, Korin Richmond:
Learning Pronunciation from Other Accents via Pronunciation Knowledge Transfer. - Deepanshu Gupta, Javier Latorre:
Positional Description for Numerical Normalization. - Christina Tånnander, Shivam Mehta, Jonas Beskow, Jens Edlund:
Beyond graphemes and phonemes: continuous phonological features in neural text-to-speech synthesis.
Training Methods, Self-Supervised Learning, Adaptation
- Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Lu Yin, Qiao Xiao, Stavros Petridis, Shiwei Liu, Maja Pantic:
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization. - Amrutha Prasad, Srikanth R. Madikeri, Driss Khalil, Petr Motlícek, Christof Schüpbach:
Speech and Language Recognition with Low-rank Adaptation of Pretrained Models. - Kwangyoun Kim, Suwon Shon, Yi-Te Hsu, Prashant Sridhar, Karen Livescu, Shinji Watanabe:
Convolution-Augmented Parameter-Efficient Fine-Tuning for Speech Recognition. - Amit Meghanani, Thomas Hain:
LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks. - Robert Flynn, Anton Ragni:
Self-Train Before You Transcribe. - Steven Vander Eeckt, Hugo Van hamme:
Unsupervised Online Continual Learning for Automatic Speech Recognition. - Hao Shi, Tatsuya Kawahara:
Dual-path Adaptation of Pretrained Feature Extraction Module for Robust Automatic Speech Recognition. - Nahomi Kusunoki, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi:
Hierarchical Multi-Task Learning with CTC and Recursive Operation. - Keigo Hojo, Yukoh Wakabayashi, Kengo Ohta, Atsunori Ogawa, Norihide Kitaoka:
Boosting CTC-based ASR using inter-layer attention-based CTC loss. - Hyung Yong Kim, Byeong-Yeol Kim, Yunkyu Lim, Jihwan Park, Shukjae Choi, Yooncheol Ju, Jinseok Park, Youshin Lim, Seung Woo Yu, Hanbin Lee, Shinji Watanabe:
Self-training ASR Guided by Unsupervised ASR Teacher. - Yue Gu, Zhihao Du, Shiliang Zhang, Jiqing Han, Yongjun He:
Personality-memory Gated Adaptation: An Efficient Speaker Adaptation for Personalized End-to-end Automatic Speech Recognition. - George Joseph, Arun Baby:
Speaker Personalization for Automatic Speech Recognition using Weight-Decomposed Low-Rank Adaptation. - Jae-Hong Lee, Sang-Eon Lee, Dong-Hyun Kim, Do-Hee Kim, Joon-Hyuk Chang:
Online Subloop Search via Uncertainty Quantization for Efficient Test-Time Adaptation. - Vishwanath Pratap Singh, Federico Malato, Ville Hautamäki, Md. Sahidullah, Tomi Kinnunen:
ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2vec2.0 Based ASR. - Jeehye Lee, Hyeji Seo:
Online Knowledge Distillation of Decoder-Only Large Language Models for Efficient Speech Recognition.
Novel Architectures for ASR
- Tomoki Honda, Shinsuke Sakai, Tatsuya Kawahara:
Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer. - Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe:
Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting. - Virat Shejwalkar, Om Thakkar, Arun Narayanan:
Quantifying Unintended Memorization in BEST-RQ ASR Encoders. - Woo Hyun Kang, Srikanth Vishnubhotla, Rudolf Braun, Yogesh Virkar, Raghuveer Peri, Kyu J. Han:
SWAN: SubWord Alignment Network for HMM-free word timing estimation in end-to-end automatic speech recognition.
Multimodality and Foundation Models
- Ziyun Cui, Chang Lei, Wen Wu, Yinan Duan, Diyang Qu, Ji Wu, Runsen Chen, Chao Zhang:
Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models. - Mohammad Amaan Sayeed, Hanan Aldarmaki:
Spoken Word2Vec: Learning Skipgram Embeddings from Speech. - Pawel Bujnowski, Bartlomiej Kuzma, Bartlomiej Paziewski, Jacek Rutkowski, Joanna Marhula, Zuzanna Bordzicka, Piotr Andruszkiewicz:
SAMSEMO: New dataset for multilingual and multimodal emotion recognition. - Bonian Jia, Huiyao Chen, Yueheng Sun, Meishan Zhang, Min Zhang:
LLM-Driven Multimodal Opinion Expression Identification. - Xiaolou Li, Zehua Liu, Chen Chen, Lantian Li, Li Guo, Dong Wang:
Zero-Shot Fake Video Detection by Audio-Visual Consistency. - Han EunGi, Oh Hyun-Bin, Kim Sung-Bin, Corentin Nivelet Etcheberry, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh:
Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert.
Spoken Dialogue Systems and Conversational Analysis 1
- Matthew McNeill, Rivka Levitan:
Autoregressive cross-interlocutor attention scores meaningfully capture conversational dynamics. - Conor Atkins, Ian D. Wood, Mohamed Ali Kâafar, Hassan Asghar, Nardine Basta, Michal Kepkowski:
ConvoCache: Smart Re-Use of Chatbot Responses. - Livia Qian, Gabriel Skantze:
Joint Learning of Context and Feedback Embeddings in Spoken Dialogue. - Neha Sahipjohn, Ashishkumar Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Rajiv Ratn Shah:
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing. - Siyang Wang, Éva Székely, Joakim Gustafson:
Contextual Interactive Evaluation of TTS Models in Dialogue Systems. - Min-Han Shih
, Ho-Lam Chung, Yu-Chi Pai, Ming-Hao Hsu, Guan-Ting Lin, Shang-Wen Li, Hung-yi Lee:
GSQA: An End-to-End Model for Generative Spoken Question Answering.
Speech Technology
- Mattias Nilsson, Riccardo Miccini, Clement Laroche, Tobias Piechowiak, Friedemann Zenke:
Resource-Efficient Speech Quality Prediction through Quantization Aware Training and Binary Activation Maps. - Maryam Naderi, Enno Hermann, Alexandre Nanchen, Sevada Hovsepyan, Mathew Magimai-Doss:
Towards interfacing large language models with ASR systems using confidence measures and prompting. - Zhong Meng, Zelin Wu, Rohit Prabhavalkar, Cal Peyser, Weiran Wang, Nanxin Chen, Tara N. Sainath, Bhuvana Ramabhadran:
Text Injection for Neural Contextual Biasing. - Minglin Wu, Jing Xu, Xixin Wu, Helen Meng:
Prompting Large Language Models with Mispronunciation Detection and Diagnosis Abilities. - Haitong Sun, Jaehyun Choi, Nobuaki Minematsu, Daisuke Saito:
Acceleration of Posteriorgram-based DTW by Distilling the Class-to-class Distances Encoded in the Classifier Used to Calculate Posteriors.