


ASRU 2023: Taipei, Taiwan
- IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023, Taipei, Taiwan, December 16-20, 2023. IEEE 2023, ISBN 979-8-3503-0689-7

- Shilong Wu, Jun Du, Mao-Kui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee:

Semi-Supervised Multi-Channel Speaker Diarization With Cross-Channel Attention. 1-8 - Da-Hee Yang, Joon-Hyuk Chang:

Towards Robust Packet Loss Concealment System With ASR-Guided Representations. 1-8 - Feng-Ting Liao, Yung-Chieh Chan, Yi-Chang Chen, Chan-Jan Hsu, Da-Shan Shiu:

Zero-Shot Domain-Sensitive Speech Recognition with Prompt-Conditioning Fine-Tuning. 1-8 - Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux:

Scenario-Aware Audio-Visual TF-Gridnet for Target Speech Extraction. 1-8 - Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David D. Cox, David Harwath, Yang Zhang, Karen Livescu, James R. Glass:

Audio-Visual Neural Syntax Acquisition. 1-8 - Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, Jan Honza Silovsky:

Importance of Smoothness Induced by Optimizers in Fl4Asr: Towards Understanding Federated Learning for End-To-End ASR. 1-8 - Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro:

QUICKVC: A Lightweight VITS-Based Any-to-Many Voice Conversion Model using ISTFT for Faster Conversion. 1-7 - Ivan Fung, Lahiru Samarakoon, Samuel J. Broughton:

Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning. 1-7 - Bahman Mirheidari, Ronan O'Malley, Daniel Blackburn, Heidi Christensen:

Identifying People with Mild Cognitive Impairment at Risk of Developing Dementia using Speech Analysis. 1-6 - Sathvik Udupa, Jesuraja Bandekar, Deekshitha G, Saurabh Kumar, Prasanta Kumar Ghosh, Sandhya Badiger, Abhayjeet Singh, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan K. M., Raoul Nanavati:

Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages. 1-8 - Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu:

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR. 1-8 - Mark Lindsey, Nathaniel R. Robinson, Francis Kubala, Richard M. Stern:

Reducing the Cost of Spoof Detection Labeling using Mixed-Strategy Active Learning and Pretrained Models. 1-7 - Alexandra Antonova:

Wiki-En-ASR-Adapt: Large-Scale Synthetic Dataset for English ASR Customization. 1-8 - Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe:

Findings of the 2023 ML-Superb Challenge: Pre-Training And Evaluation Over More Languages And Beyond. 1-8 - Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie:

HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS. 1-7 - Anusha Prakash, Srinivasan Umesh, Hema A. Murthy:

Towards Developing State-of-The-Art TTS Synthesisers for 13 Indian Languages with Signal Processing Aided Alignments. 1-8 - Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie:

Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition. 1-8 - Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth Gurunath Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastrow, Ivan Bulyko:

Low-Rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition. 1-8 - Yuewei Zhang, Huanbin Zou, Jie Zhu:

Vsanet: Real-Time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention. 1-8 - Chenglin Xu, Xiguang Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu:

KAQ: A Non-Intrusive Stacking Framework for Mean Opinion Score Prediction with Multi-Task Learning. 1-8 - Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin:

CTC Blank Triggered Dynamic Layer-Skipping for Efficient Ctc-Based Speech Recognition. 1-5 - Muhammad Umar Farooq, Rehan Ahmad, Thomas Hain:

MUST: A Multilingual Student-Teacher Learning Approach for Low-Resource Speech Recognition. 1-6 - Gan Song, Zelin Wu, Golan Pundak, Angad Chandorkar, Kandarp Joshi, Xavier Velez, Diamantino Caseiro, Ben Haynor, Weiran Wang, Nikhil Siddhartha, Pat Rondon, Khe Chai Sim:

Contextual Spelling Correction with Large Language Models. 1-8 - Yuke Li, Xinfa Zhu, Yi Lei, Hai Li, Junhui Liu, Danming Xie, Lei Xie:

Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis. 1-8 - Elaf Islam, Thomas Hain, Protima Nomo Sudro:

Simulation of Teacher-Learner Interaction in English Language Pronunciation Learning. 1-6 - Jae-Hong Lee, Do-Hee Kim, Joon-Hyuk Chang:

AWMC: Online Test-Time Adaptation Without Mode Collapse for Continual Adaptation. 1-8 - Roshan S. Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Siddhant Arora, Shinji Watanabe, Atsunori Ogawa, Marc Delcroix, Rita Singh, Bhiksha Raj:

Espnet-Summ: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems. 1-8 - Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli:

Av-Data2Vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations. 1-8 - Daniel Galvez, Tim Kaldewey:

GPU-Accelerated Wfst Beam Search Decoder for CTC-Based Speech Recognition. 1-7 - Ke Hu, Tara N. Sainath, Bo Li, Yu Zhang, Yong Cheng, Tao Wang, Yujing Zhang, Frederick Liu:

Improving Multilingual and Code-Switching ASR Using Large Language Model Generated Text. 1-7 - Xingyu Cai, David Qiu, Shaojin Ding, Dongseong Hwang, Weiran Wang, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He:

Efficient Cascaded Streaming ASR System Via Frame Rate Reduction. 1-8 - Alexander Blatt, Badr M. Abdullah, Dietrich Klakow:

Ending the Blind Flight: Analyzing the Impact of Acoustic and Lexical Factors on WAV2VEC 2.0 in Air-Traffic Control. 1-8 - Jarod Duret, Benjamin O'Brien, Yannick Estève, Titouan Parcollet:

Enhancing Expressivity Transfer in Textless Speech-to-Speech Translation. 1-8 - Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao:

TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch. 1-9 - Yiquan Zhou, Meng Chen, Yi Lei, Jihua Zhu, Weifeng Zhao:

VITS-Based Singing Voice Conversion System with DSPGAN Post-Processing for SVCC2023. 1-8 - Thomas Thebaud, Sonal Joshi, Henry Li, Martin Sustek, Jesús Villalba, Sanjeev Khudanpur, Najim Dehak:

Clustering Unsupervised Representations as Defense Against Poisoning Attacks on Speech Commands Classification System. 1-8 - Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro:

Using Joint Training Speaker Encoder With Consistency Loss to Achieve Cross-Lingual Voice Conversion and Expressive Voice Conversion. 1-8 - Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura:

After: Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition. 1-8 - Yan Huang, Piyush Behre, Guoli Ye, Shawn Chang, Yifan Gong:

Multi Transcription-Style Speech Transcription Using Attention-Based Encoder-Decoder Model. 1-6 - Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu:

PP-MET: A Real-World Personalized Prompt Based Meeting Transcription System. 1-8 - Pavel Denisov, Ngoc Thang Vu:

Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding. 1-8 - Mun-Hak Lee, Sang-Eon Lee, Ji-Eun Choi, Joon-Hyuk Chang:

Cross-Modal Learning for CTC-Based ASR: Leveraging CTC-Bertscore and Sequence-Level Training. 1-8 - Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki:

Generalized Zero-Shot Audio-to-Intent Classification. 1-8 - Rajeev Rajan, Noumida Abdul Kareem, Sreelakshmi S:

Paraconsistent Feature Analysis for the Competency Evaluation of Voice Impersonation. 1-7 - Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li:

Bisinger: Bilingual Singing Voice Synthesis. 1-8 - Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu:

Few-Shot Spoken Language Understanding Via Joint Speech-Text Models. 1-8 - Jiajun He, Zekun Yang, Tomoki Toda:

ED-CEC: Improving Rare word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction. 1-6 - Guodong Ma, Wenxuan Wang, Yuke Li, Yuting Yang, Binbin Du, Haoran Fu:

LAE-ST-MOE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-Switching ASR. 1-8 - Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, Junlan Feng:

Prompt Pool Based Class-Incremental Continual Learning for Dialog State Tracking. 1-8 - Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, Ming Li:

Haha-POD: An Attempt for Laughter-Based Non-Verbal Speaker Verification. 1-7 - Wei-Ping Huang, Sung-Feng Huang, Hung-Yi Lee:

Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization. 1-8 - Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg:

LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-End ASR Models. 1-7 - Ji-Hwan Mo, Jae-Jin Jeon, Mun-Hak Lee, Joon-Hyuk Chang:

Knowledge Distillation From Offline to Streaming Transducer: Towards Accurate and Fast Streaming Model by Matching Alignments. 1-7 - Tanel Alumäe, Jiaming Kong, Daniil Robnikov:

Dialect Adaptation and Data Augmentation for Low-Resource ASR: Taltech Systems for the Madasr 2023 Challenge. 1-7 - William Ravenscroft, Stefan Goetze, Thomas Hain:

On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments. 1-7 - Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie:

MBTFNET: Multi-Band Temporal-Frequency Neural Network for Singing Voice Enhancement. 1-8 - Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda:

The Singing Voice Conversion Challenge 2023. 1-8 - Chun-Yi Kuan, Chen-An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-Yiin Chang, Hung-Yi Lee:

Towards General-Purpose Text-Instruction-Guided Voice Conversion. 1-8 - Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke:

Generative Speech Recognition Error Correction With Large Language Models and Task-Activating Prompting. 1-8 - Hsin-Tien Chiang, Kuo-Hsuan Hung, Szu-Wei Fu, Heng-Cheng Kuo, Ming-Hsueh Tsai, Yu Tsao:

Study on the Correlation Between Objective Evaluations and Subjective Speech Quality and Intelligibility. 1-7 - Sibo Tong, Philip Harding, Simon Wiesler:

Hierarchical Attention-Based Contextual Biasing For Personalized Speech Recognition Using Neural Transducers. 1-8 - Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-Weon Jung, Soumi Maiti, Shinji Watanabe:

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data. 1-8 - Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung:

Consistency Based Unsupervised Self-Training for ASR Personalisation. 1-8 - Jeremy Heng Meng Wong, Huayun Zhang, Nancy F. Chen:

Variational Gaussian Process Data Uncertainty. 1-8 - Lahiru Samarakoon, Samuel J. Broughton, Marc Härkönen, Ivan Fung:

Transformer Attractors for Robust and Efficient End-To-End Neural Diarization. 1-8 - Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai:

Cross-Modal Alignment With Optimal Transport For CTC-Based ASR. 1-7 - Kailai Shen, Diqun Yan, Li Dong, Ying Ren, Xiaoxun Wu, Jing Hu:

SQAT-LD: SPeech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction. 1-6 - Chang Chen, Xun Gong, Yanmin Qian:

Efficient Text-Only Domain Adaptation For CTC-Based ASR. 1-7 - Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney:

Investigating The Effect of Language Models in Sequence Discriminative Training For Neural Transducers. 1-8 - Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur:

Learning From Flawed Data: Weakly Supervised Automatic Speech Recognition. 1-8 - Dongning Yang, Wei Wang, Yanmin Qian:

FAT-HuBERT: Front-End Adaptive Training of Hidden-Unit BERT For Distortion-Invariant Robust Speech Recognition. 1-8 - David Qiu, Shaojin Ding, Yanzhang He:

The Role of Feature Correlation on Quantized Neural Networks. 1-7 - Shaoxiong Lin, Chao Zhang, Yanmin Qian:

Improving Speech Enhancement Using Audio Tagging Knowledge From Pre-Trained Representations and Multi-Task Learning. 1-7 - Yoshiki Sato, Julián Villegas:

Spectral Tilt May Have a Smaller Impact on the Intelligibility of Speech in Noise. 1-5 - Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe:

Yodas: Youtube-Oriented Dataset for Audio and Speech. 1-8 - Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li:

FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimers Speech Detection. 1-6 - Hiroyoshi Yamasaki, Jérôme Louradour, Julie Hunter, Laurent Prévot:

Transcribing and Aligning Conversational Speech: A Hybrid Pipeline Applied to French Conversations. 1-6 - Ching-Feng Yeh, Po-Yao Huang, Vasu Sharma, Shang-Wen Li, Gargi Ghosh:

Flap: Fast Language-Audio Pre-Training. 1-8 - Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari:

COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control. 1-8 - Jiarui Hai, Yu-Jeh Liu, Mounya Elhilali:

Boosting Modality Representation With Pre-Trained Models and Multi-Task Training for Multimodal Sentiment Analysis. 1-8 - Armand Stricker, Patrick Paroubek:

Enhancing Task-Oriented Dialogues With Chitchat: A Comparative Study Based on Lexical Diversity And Divergence. 1-8 - Seongjin Park, Rutuja Ubale:

Multitask Learning Model with Text and Speech Representation for Fine-Grained Speech Scoring. 1-7 - Martin Sustek, Sonal Joshi, Henry Li, Thomas Thebaud, Jesús Villalba, Sanjeev Khudanpur, Najim Dehak:

Joint Energy-Based Model for Robust Speech Classification System Against Dirty-Label Backdoor Poisoning Attacks. 1-8 - Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku:

LV-CTC: Non-Autoregressive ASR With CTC and Latent Variable Models. 1-6 - Yu-Hsiang Wang, Huang-Yu Chen, Kai-Wei Chang, Winston H. Hsu, Hung-Yi Lee:

Minisuperb: Lightweight Benchmark for Self-Supervised Speech Models. 1-8 - Tzu-Quan Lin, Hung-Yi Lee, Hao Tang:

MelHuBERT: A Simplified Hubert on Mel Spectrograms. 1-8 - Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg:

Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition. 1-8 - Xintong Wang, Chang Zeng, Jun Chen, Chunhui Wang:

Crosssinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers. 1-6 - Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul K. Rubenstein, Lukas Zilka, Dian Yu, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu:

SLM: Bridge the Thin Gap Between Speech and Text Foundation Models. 1-8 - Ilja Baumann, Dominik Wagner, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet:

Detection of Vowel Errors in Children's Speech using Synthetic Phonetic Transcripts. 1-8 - Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu:

On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration. 1-8 - Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung:

Locality Enhanced Dynamic Biasing and Sampling Strategies For Contextual ASR. 1-8 - Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R. J. J. H. van Son, Tomoki Toda:

Improving Severity Preservation of Healthy-to-Pathological Voice Conversion With Global Style Tokens. 1-7 - Junchen Liu, Jesin James, Karan Nathwani:

Improved Multi-Modal Emotion Recognition Using Squeeze-and-Excitation Block in Cross-Modal Attention. 1-8 - Jun-Kun Chen, Jian Xue, Peidong Wang, Jing Pan, Jinyu Li:

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach. 1-7 - Dominik Wagner, Ilja Baumann, Sebastian P. Bayerl, Korbinian Riedhammer, Tobias Bocklet:

Speaker Adaptation for End-to-End Speech Recognition Systems in Noisy Environments. 1-6 - Jun-You Wang, Hung-Yi Lee, Jyh-Shing Roger Jang, Li Su:

Zero-Shot Singing Voice Synthesis from Musical Score. 1-8 - Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose:

Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition. 1-8 - Jerome R. Bellegarda:

Pareto Efficiency of Learning-Forgetting Trade-Off in Neural Language Model Adaptation. 1-8 - Daichi Hayakawa, Takehiko Kagoshima, Kenji Iwata, Norbert Braunschweiler, Rama Doddipatla:

Robust Recognition of Speaker Emotion With Difference Feature Extraction Using a Few Enrollment Utterances. 1-7 - Sara Papi, Peidong Wang, Jun-Kun Chen, Jian Xue, Jinyu Li, Yashesh Gaur:

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments. 1-8 - Jin Qiu, Lu Huang, Boyu Li, Jun Zhang, Lu Lu, Zejun Ma:

Improving Large-Scale Deep Biasing With Phoneme Features and Text-Only Data in Streaming Transducer. 1-8 - Zitha Sasindran, Harsha Yelchuri, T. V. Prabhakar, Supreeth Rao:

HEVAL: A New Hybrid Evaluation Metric for Automatic Speech Recognition Tasks. 1-7 - Marvin Lavechin, Marianne Métais, Hadrien Titeux, Alodie Boissonnet, Jade Copet, Morgane Rivière, Elika Bergelson, Alejandrina Cristià, Emmanuel Dupoux, Hervé Bredin:

Brouhaha: Multi-Task Training for Voice Activity Detection, Speech-to-Noise Ratio, and C50 Room Acoustics Estimation. 1-7 - Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie:

Salt: Distinguishable Speaker Anonymization Through Latent Space Transformation. 1-8 - Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu:

Acoustic Model Fusion For End-to-End Speech Recognition. 1-7 - Yusuke Shinohara, Shinji Watanabe:

Domain Adaptation by Data Distribution Matching Via Submodularity For Speech Recognition. 1-7 - Pasquale D'Alterio, Christian Hensel, Bashar Awwad Shiekh Hasan:

Can Unpaired Textual Data Replace Synthetic Speech in ASR Model Adaptation? 1-8 - Daniela A. Wiepert, Rene L. Utianski, Joseph R. Duffy, John L. Stricker, Leland Barnard, Keith A. Josephs, Jennifer L. Whitwell, David T. Jones, Hugo Botha:

Not All Errors Are Created Equal: Evaluating The Impact of Model and Speaker Factors on ASR Outcomes in Clinical Populations. 1-6 - Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa:

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction. 1-6 - Varun Krishna, Sriram Ganapathy:

Pseudo-Label Based Supervised Contrastive Loss for Robust Speech Representations. 1-8 - Yuewei Zhang, Huanbin Zou, Jie Zhu:

Magnitude-and-Phase-Aware Speech Enhancement With Parallel Sequence Modeling. 1-8 - Jen-Tzung Chien, Wei-Yu Sun:

Adversarial Augmentation For Adapter Learning. 1-7 - Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu:

Neuralecho: Hybrid of Full-Band and Sub-Band Recurrent Neural Network For Acoustic Echo Cancellation and Speech Enhancement. 1-8 - Can Cui, Imran A. Sheikh, Mostafa Sadeghi, Emmanuel Vincent:

End-to-End Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis. 1-8 - Sakriani Sakti, Benita Angela Titalim:

Leveraging the Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task. 1-8 - Kai-Wei Chang, Ming-Hsin Chen, Yun-Ping Lin, Jing Neng Hsu, Paul Kuo-Ming Huang, Chien-Yu Huang, Shang-Wen Li, Hung-Yi Lee:

Prompting and Adapter Tuning For Self-Supervised Encoder-Decoder Speech Model. 1-8 - Yuang Li, Yu Wu, Jinyu Li, Shujie Liu:

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition. 1-8 - Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian:

Toward Universal Speech Enhancement For Diverse Input Conditions. 1-6 - Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Iu-Tshian Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, Jiatong Shi:

Evaluating Self-Supervised Speech Models on a Taiwanese Hokkien Corpus. 1-7 - William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. 1-8 - Guru Prakash Arumugam, Shuo-Yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia:

Improved Long-Form Speech Recognition By Jointly Modeling The Primary And Non-Primary Speakers. 1-8 - Wonjun Lee, Gary Geunbae Lee, Yunsu Kim:

Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition And Phoneme To Grapheme Translation. 1-8 - Junteng Jia, Ke Li, Mani Malek, Kshitiz Malik, Jay Mahadeokar, Ozlem Kalinli, Frank Seide:

Joint Federated Learning and Personalization for on-Device ASR. 1-8 - Vanitha Devi R, Vasundhara:

Robust Logarithmic Champernowne Algorithm for Feedback Cancellation in Hearing aids. 1-5 - Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie:

Vits-Based Singing Voice Conversion Leveraging Whisper and Multi-Scale F0 Modeling. 1-8 - Sangeet Sagar, Mirco Ravanelli, Bernd Kiefer, Ivana Kruijff-Korbayová, Josef van Genabith:

Rescuespeech: A German Corpus for Speech Recognition in Search and Rescue Domain. 1-7 - Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James R. Glass:

Joint Audio and Speech Understanding. 1-8 - Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow:

Two-Pass Endpoint Detection for Speech Recognition. 1-8 - Hillary Ngai, Rohan Agrawal, Neeraj Gaur, W. Ronny Huang, Parisa Haghani, Pedro Moreno Mengibar:

Audio-Adapterfusion: A Task-Id-Free Approach for Efficient and Non-Destructive Multi-Task Speech Recognition. 1-8 - Robin Netzorg, Ajil Jalal, Luna McNulty, Gopala Krishna Anumanchipalli:

Permod: Perceptually Grounded Voice Modification With Latent Diffusion Models. 1-8 - Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang:

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition. 1-6 - Amit Meghanani, Thomas Hain:

Deriving Translational Acoustic Sub-Word Embeddings. 1-8 - Yusheng Tian, Wei Liu, Tan Lee:

Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data. 1-7 - Daniel Mann, Tina Raissi, Wilfried Michel, Ralf Schlüter, Hermann Ney:

End-To-End Training of a Neural HMM with Label and Transition Probabilities. 1-8 - Jenthe Thienpondt, Kris Demuynck:

ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings. 1-8 - Gil Keren:

A Token-Wise Beam Search Algorithm for RNN-T. 1-8 - Wangyou Zhang, Lei Yang, Yanmin Qian:

Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing. 1-6 - Abinay Reddy Naini, Shruthi Subramanium, Seong-Gyun Leem, Carlos Busso:

Combining Relative and Absolute Learning Formulations to Predict Emotional Attributes From Speech. 1-8 - Yanmei Gu, Jing Li, Jiayi Zhou, Zhiming Wang, Huijia Zhu:

Acoustics-Text Dual-Modal Joint Representation Learning for Cover Song Identification. 1-8 - Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie:

U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias. 1-8 - Yuanyuan Zhang, Aaricia Herygers, Tanvina Patel, Zhengjun Yue, Odette Scharenborg:

Exploring Data Augmentation in Bias Mitigation Against Non-Native-Accented Speech. 1-8 - Gene-Ping Yang, Hao Tang:

Towards Matching Phones and Speech Representations. 1-8 - Abderrahim Fathan, Jahangir Alam:

CAMSAT: Augmentation Mix and Self-Augmented Training Clustering for Self-Supervised Speaker Recognition. 1-8 - Lakshmi Rajendram Bashyam, Alexander Blatt, Dietrich Klakow:

Enabling Noisy Label Usage for Out-of-Airspace Data in Read-Back Error Detection. 1-8 - Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong:

Building High-Accuracy Multilingual ASR With Gated Language Experts and Curriculum Training. 1-7 - Jihyun Lee, Yejin Jeon, Wonjun Lee, Yunsu Kim, Gary Geunbae Lee:

Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking. 1-8 - Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie:

Sa-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR. 1-7 - Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi:

The Voicemos Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains. 1-7 - Jason Clarke, Yoshihiko Gotoh, Stefan Goetze:

Improving Audiovisual Active Speaker Detection in Egocentric Recordings with the Data-Efficient Image Transformer. 1-8 - Takuma Okamoto, Haruki Yamashita, Yamato Ohtani, Tomoki Toda, Hisashi Kawai:

WaveNeXt: ConvNeXt-Based Fast Neural Vocoder Without ISTFT layer. 1-8 - Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen:

E3 TTS: Easy End-to-End Diffusion-Based Text To Speech. 1-8 - Dennis Fucci, Marco Gaido, Matteo Negri, Mauro Cettolo, Luisa Bentivogli:

No Pitch Left Behind: Addressing Gender Unbalance In Automatic Speech Recognition Through Pitch Manipulation. 1-8 - Mohan Li, Catalin Zorila, Cong-Thanh Do, Rama Doddipatla:

Towards a Unified End-to-End Language Understanding System for Speech and Text Inputs. 1-8 - Geoffroy Vanderreydt, Amrutha Prasad, Driss Khalil, Srikanth R. Madikeri, Kris Demuynck, Petr Motlícek:

Parameter-Efficient Tuning with Adaptive Bottlenecks for Automatic Speech Recognition. 1-7 - Zihan Zhang, Jiayao Sun, Xianjun Xia, Ziqian Wang, Xiaopeng Yan, Yijian Xiao, Lei Xie:

An Exploration of Task-Decoupling on Two-Stage Neural Post Filter for Real-Time Personalized Acoustic Echo Cancellation. 1-7 - Jiachen Lian, Carly Feng, Naasir Farooqi, Steve Li, Anshul Kashyap, Cheol Jun Cho, Peter Wu, Robbie Netzorg, Tingle Li, Gopala Krishna Anumanchipalli:

Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection. 1-8 - Yongmao Zhang, Guanghou Liu, Yi Lei, Yunlin Chen, Hao Yin, Lei Xie, Zhifei Li:

Promptspeaker: Speaker Generation Based on Text Descriptions. 1-7 - Nicholas Sanders, Korin Richmond:

Invert-Classify: Recovering Discrete Prosody Inputs for Text-To-Speech. 1-7 - Hao Zhang, Meng Yu, Dong Yu:

Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings. 1-7 - Lillian Zhou, Yuxin Ding, Mingqing Chen, Harry Zhang, Rohit Prabhavalkar, Dhruv Guliani, Giovanni Motta, Rajiv Mathews:

The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections Through Federated Learning. 1-7 - Peng Shen, Xuguang Lu, Hisashi Kawai:

Generative Linguistic Representation for Spoken Language Identification. 1-8 - Bi-Cheng Yan, Hsin-Wei Wang, Yi-Cheng Wang, Jiun-Ting Li, Chi-Han Lin, Berlin Chen:

Preserving Phonemic Distinctions For Ordinal Regression: A Novel Loss Function For Automatic Pronunciation Assessment. 1-7 - Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen:

Fast-Hubert: an Efficient Training Framework for Self-Supervised Speech Representation Learning. 1-7 - Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda:

A Comparative Study of Voice Conversion Models With Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023. 1-6 - Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang:

Neuralkalman: A Learnable Kalman Filter for Acoustic Echo Cancellation. 1-7 - Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Kohei Matsuura, Takanori Ashihara, William Chen, Shinji Watanabe:

Summarize While Translating: Universal Model With Parallel Decoding for Summarization and Translation. 1-8 - Nick Rossenbach, Benedikt Hilmes, Ralf Schlüter:

On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition. 1-8 - Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth:

MASR: Multi-Label Aware Speech Representation. 1-8 - Yayun He, Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao:

Voiceextender: Short-Utterance Text-Independent Speaker Verification With Guided Diffusion Model. 1-8 - Ziyun Cui, Wen Wu, Wei-Qiang Zhang, Ji Wu, Chao Zhang:

Transferring Speech-Generic and Depression-Specific Knowledge for Alzheimer's Disease Detection. 1-8 - Quentin Meeus, Marie-Francine Moens, Hugo Van hamme:

Whisper-Slu: Extending a Pretrained Speech-to-Text Transformer for Low Resource Spoken Language Understanding. 1-6 - Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko:

Discriminative Speech Recognition Rescoring With Pre-Trained Language Models. 1-7 - Zili Qi, Xinhui Hu, Wangjin Zhou, Sheng Li, Hao Wu, Jian Lu, Xinkang Xu:

LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement. 1-6 - Hagen Soltau, Izhak Shafran, Alex Ottenwess, Joseph R. Duffy, Rene L. Utianski, Leland R. Barnard, John L. Stricker, Daniela A. Wiepert, David T. Jones, Hugo Botha:

Detecting Speech Abnormalities With a Perceiver-Based Sequence Classifier that Leverages a Universal Speech Model. 1-7 - Yingzhi Wang, Mirco Ravanelli, Alya Yacoubi:

Speech Emotion Diarization: Which Emotion Appears When? 1-7 - Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Dongjune Lee, Nam Soo Kim:

Transduce and Speak: Neural Transducer for Text-To-Speech with Semantic Token Prediction. 1-7 - Artit Suwanbandit, Jaturong Chitiyaphol, Sutthinan Chuenchom, Kanyarat Kwiecien, Husen Sawal, Ruslan Uthai, Orathai Sangpetch, Ekapol Chuangsuwanich:

Thai-Dialect: Low Resource Thai Dialectal Speech to Text Corpora. 1-8 - Chi-Chang Lee, Hong-Wei Chen, Chu-Song Chen, Hsin-Min Wang, Tsung-Te Liu, Yu Tsao:

LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models. 1-8 - Sebastião Quintas, Mathieu Balaguer, Julie Mauclair, Virginie Woisard, Julien Pinquier:

Can We Use Speaker Embeddings On Spontaneous Speech Obtained From Medical Conversations To Predict Intelligibility? 1-7 - Zhengyang Li, Thomas Graave, Jing Liu, Timo Lohrenz, Siegfried Kunzmann, Tim Fingscheidt:

Parameter-Efficient Cross-Language Transfer Learning for a Language-Modular Audiovisual Speech Recognition. 1-8 - Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe:

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference. 1-8 - Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah:

Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-Supervised Setting. 1-7 - Peikun Chen, Fan Yu, Yuhao Liang, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie:

BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition. 1-7 - Jian Xue, Peidong Wang, Jinyu Li, Eric Sun:

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability. 1-7 - Jun-You Wang, Chon-In Leong, Yu-Chen Lin, Li Su, Jyh-Shing Roger Jang:

Adapting Pretrained Speech Model for Mandarin Lyrics Transcription and Alignment. 1-8 - Zhaofeng Lin, Tanvina Patel, Odette Scharenborg:

Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation. 1-8 - Jeong-Hwan Choi, Jehyun Kyung, Ju-Seok Seong, Ye-Rin Jeoung, Joon-Hyuk Chang:

Extending Self-Distilled Self-Supervised Learning For Semi-Supervised Speaker Verification. 1-8 - Yosuke Higuchi, Andrew Rosenberg, Yuan Wang, Murali Karthick Baskar, Bhuvana Ramabhadran:

Mask-Conformer: Augmenting Conformer with Mask-Predict Decoder. 1-8 - Maliha Jahan, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak, Jesús Villalba:

Model-Based Fairness Metric for Speaker Verification. 1-7
