Yifan Peng 0003

Person information

affiliation: NVIDIA Corporation, Santa Clara, CA, USA
affiliation (PhD 2025): Carnegie Mellon University, Department of Electrical and Computer Engineering, Pittsburgh, PA, USA

2020 – today

2025
[c39]
  - https://dblp.org/rec/conf/aaai/WuLPWS025
Yihan Wu, Yichen Lu, Yifan Peng, Xihua Wang, Ruihua Song, Shinji Watanabe:
Enhancing Audiovisual Speech Recognition Through Bifocal Preference Optimization. AAAI 2025: 25516-25524
[c38]
  - https://dblp.org/rec/conf/iclr/SomekiPAMMSL025
Masao Someki, Yifan Peng, Siddhant Arora, Markus Müller, Athanasios Mouchtaris, Grant P. Strimel, Jing Liu, Shinji Watanabe:
Context-aware Dynamic Pruning for Speech Foundation Models. ICLR 2025
[c37]
  - https://dblp.org/rec/conf/naacl/PengPCZHDHWBG25
Yifan Peng, Krishna C. Puvvada, Zhehuai Chen, Piotr Zelasko, He Huang, Kunal Dhawan, Ke Hu, Shinji Watanabe, Jagadeesh Balam, Boris Ginsburg:
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning. NAACL (Long Papers) 2025: 5787-5802
[i45]
  - https://dblp.org/rec/journals/corr/abs-2502-10373
William Chen, Jinchuan Tian, Yifan Peng, Brian Yan, Chao-Han Huck Yang, Shinji Watanabe:
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models. CoRR abs/2502.10373 (2025)
[i44]
  - https://dblp.org/rec/journals/corr/abs-2502-15218
Jinchuan Tian, Jiatong Shi, William Chen, Siddhant Arora, Yoshiki Masuyama, Takashi Maekaku, Yihan Wu, Junyi Peng, Shikhar Bharadwaj, Yiwen Zhao, Samuele Cornell, Yifan Peng, Xiang Yue, Chao-Han Huck Yang, Graham Neubig, Shinji Watanabe:
ESPnet-SpeechLM: An Open Speech Language Model Toolkit. CoRR abs/2502.15218 (2025)
[i43]
  - https://dblp.org/rec/journals/corr/abs-2503-08533
Siddhant Arora, Yifan Peng, Jiatong Shi, Jinchuan Tian, William Chen, Shikhar Bharadwaj, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Shuichiro Shimizu, Vaibhav Srivastav, Shinji Watanabe:
ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems. CoRR abs/2503.08533 (2025)
[i42]
  - https://dblp.org/rec/journals/corr/abs-2504-08528
Siddhant Arora, Kai-Wei Chang, Chung-Ming Chien, Yifan Peng, Haibin Wu, Yossi Adi, Emmanuel Dupoux, Hung-Yi Lee, Karen Livescu, Shinji Watanabe:
On The Landscape of Spoken Language Models: A Comprehensive Survey. CoRR abs/2504.08528 (2025)
[i41]
  - https://dblp.org/rec/journals/corr/abs-2505-13404
Nithin Rao Koluguri, Monica Sekoyan, George Zelenfroynd, Sasha Meister, Shuoyang Ding, Sofia Kostandian, He Huang, Nikolay Karpov, Jagadeesh Balam, Vitaly Lavrukhin, Yifan Peng, Sara Papi, Marco Gaido, Alessio Brutti, Boris Ginsburg:
Granary: Speech Recognition and Translation Dataset in 25 European Languages. CoRR abs/2505.13404 (2025)
[i40]
  - https://dblp.org/rec/journals/corr/abs-2505-24200
Qingzheng Wang, Jiancheng Sun, Yifan Peng, Shinji Watanabe:
Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC. CoRR abs/2505.24200 (2025)
[i39]
  - https://dblp.org/rec/journals/corr/abs-2506-00338
Yifan Peng, Muhammad Shakeel, Yui Sudo, William Chen, Jinchuan Tian, Chyi-Jiunn Lin, Shinji Watanabe:
OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning. CoRR abs/2506.00338 (2025)
[i38]
  - https://dblp.org/rec/journals/corr/abs-2506-00422
Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Chyi-Jiunn Lin, Shinji Watanabe:
DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition. CoRR abs/2506.00422 (2025)
[i37]
  - https://dblp.org/rec/journals/corr/abs-2506-17611
Jinchuan Tian, William Chen, Yifan Peng, Jiatong Shi, Siddhant Arora, Shikhar Bharadwaj, Takashi Maekaku, Yusuke Shinohara, Keita Goto, Xiang Yue, Huck Yang, Shinji Watanabe:
OpusLM: A Family of Open Unified Speech Language Models. CoRR abs/2506.17611 (2025)
[i36]
  - https://dblp.org/rec/journals/corr/abs-2508-20474
Muhammad Shakeel, Yui Sudo, Yifan Peng, Chyi-Jiunn Lin, Shinji Watanabe:
Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder. CoRR abs/2508.20474 (2025)
2024
[c36]
  - https://dblp.org/rec/conf/acl/PengS0024
Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification. ACL (1) 2024: 10192-10209
[c35]
  - https://dblp.org/rec/conf/emnlp/ChenZPLTSCML024
William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe:
Towards Robust Speech Representation Learning for Thousands of Languages. EMNLP 2024: 10205-10224
[c34]
  - https://dblp.org/rec/conf/icassp/ShakeelSPW24
Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation. ICASSP Workshops 2024: 570-574
[c33]
  - https://dblp.org/rec/conf/icassp/Sudo0FP024
Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search. ICASSP 2024: 10896-10900
[c32]
  - https://dblp.org/rec/conf/icassp/HuangLWHKWACSPS24
Chien-Yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee:
Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech. ICASSP 2024: 12136-12140
[c31]
  - https://dblp.org/rec/conf/icassp/MaitiPCJC024
Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-Weon Jung, Xuankai Chang, Shinji Watanabe:
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks. ICASSP 2024: 13326-13330
[c30]
  - https://dblp.org/rec/conf/interspeech/0001SP024
Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss. INTERSPEECH 2024
[c29]
  - https://dblp.org/rec/conf/interspeech/PengTCAYS0CSCJ024
Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe:
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer. INTERSPEECH 2024
[c28]
  - https://dblp.org/rec/conf/interspeech/PrabhuPJ024
Darshan Prabhu, Yifan Peng, Preethi Jyothi, Shinji Watanabe:
MULTI-CONVFORMER: Extending Conformer with Multiple Convolution Kernels. INTERSPEECH 2024
[c27]
  - https://dblp.org/rec/conf/interspeech/TianPCCL024
Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, Shinji Watanabe:
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models. INTERSPEECH 2024
[c26]
  - https://dblp.org/rec/conf/naacl/AroraFJPSKTL024
Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan S. Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe:
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions. NAACL-HLT 2024: 2754-2774
[c25]
  - https://dblp.org/rec/conf/slt/WuPLCSW24
Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe:
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. SLT 2024: 43-48
[c24]
  - https://dblp.org/rec/conf/slt/SudoFSPW24
Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition With Dynamic Vocabulary. SLT 2024: 78-85
[c23]
  - https://dblp.org/rec/conf/slt/SomekiCACCHPSSW24
Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe:
ESPnet-EZ: Python-Only ESPnet For Easy Fine-Tuning And Integration. SLT 2024: 863-870
[i35]
  - https://dblp.org/rec/journals/corr/abs-2401-10449
Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search. CoRR abs/2401.10449 (2024)
[i34]
  - https://dblp.org/rec/journals/corr/abs-2401-16658
Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe:
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer. CoRR abs/2401.16658 (2024)
[i33]
  - https://dblp.org/rec/journals/corr/abs-2401-18045
Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, Ruihua Song:
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition. CoRR abs/2401.18045 (2024)
[i32]
  - https://dblp.org/rec/journals/corr/abs-2402-12654
Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification. CoRR abs/2402.12654 (2024)
[i31]
  - https://dblp.org/rec/journals/corr/abs-2403-12402
Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong:
An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis. CoRR abs/2403.12402 (2024)
[i30]
  - https://dblp.org/rec/journals/corr/abs-2403-12408
Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong:
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation. CoRR abs/2403.12408 (2024)
[i29]
  - https://dblp.org/rec/journals/corr/abs-2405-13344
Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition with Dynamic Vocabulary. CoRR abs/2405.13344 (2024)
[i28]
  - https://dblp.org/rec/journals/corr/abs-2405-13514
Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation. CoRR abs/2405.13514 (2024)
[i27]
  - https://dblp.org/rec/journals/corr/abs-2406-02950
Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe:
4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders. CoRR abs/2406.02950 (2024)
[i26]
  - https://dblp.org/rec/journals/corr/abs-2406-09282
Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, Shinji Watanabe:
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models. CoRR abs/2406.09282 (2024)
[i25]
  - https://dblp.org/rec/journals/corr/abs-2406-16120
Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss. CoRR abs/2406.16120 (2024)
[i24]
  - https://dblp.org/rec/journals/corr/abs-2407-00837
William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe:
Towards Robust Speech Representation Learning for Thousands of Languages. CoRR abs/2407.00837 (2024)
[i23]
  - https://dblp.org/rec/journals/corr/abs-2407-03718
Darshan Prabhu, Yifan Peng, Preethi Jyothi, Shinji Watanabe:
Multi-Convformer: Extending Conformer with Multiple Convolution Kernels. CoRR abs/2407.03718 (2024)
[i22]
  - https://dblp.org/rec/journals/corr/abs-2409-09506
Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe:
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration. CoRR abs/2409.09506 (2024)
[i21]
  - https://dblp.org/rec/journals/corr/abs-2409-12370
Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe:
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. CoRR abs/2409.12370 (2024)
[i20]
  - https://dblp.org/rec/journals/corr/abs-2410-17485
Yifan Peng, Krishna C. Puvvada, Zhehuai Chen, Piotr Zelasko, He Huang, Kunal Dhawan, Ke Hu, Shinji Watanabe, Jagadeesh Balam, Boris Ginsburg:
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning. CoRR abs/2410.17485 (2024)
[i19]
  - https://dblp.org/rec/journals/corr/abs-2412-19005
Yihan Wu, Yichen Lu, Yifan Peng, Xihua Wang, Ruihua Song, Shinji Watanabe:
Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization. CoRR abs/2412.19005 (2024)
2023
[c22]
  - https://dblp.org/rec/conf/acl/YanS0IPDPFBHZNH23
Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polak, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe:
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit. ACL (demo) 2023: 400-411
[c21]
  - https://dblp.org/rec/conf/asru/ChenSYBZPCMW23
William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. ASRU 2023: 1-8
[c20]
  - https://dblp.org/rec/conf/asru/PengTYBCLSACSZSSJMW23
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-Weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data. ASRU 2023: 1-8
[c19]
  - https://dblp.org/rec/conf/icassp/AroraFWHPKTYW23
Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward Stop Quality Challenge. ICASSP 2023: 1-2
[c18]
  - https://dblp.org/rec/conf/icassp/ChenYSPMW23
William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR with Auxiliary CTC Objectives. ICASSP 2023: 1-5
[c17]
  - https://dblp.org/rec/conf/icassp/FutamiHAWKPYTW23
Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
The Pipeline System of ASR and NLU with MLM-based data Augmentation Toward Stop Low-Resource Challenge. ICASSP 2023: 1-2
[c16]
  - https://dblp.org/rec/conf/icassp/KashiwagiAFHWPYTW23
Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
E-Branchformer-Based E2E SLU Toward Stop on-Device Challenge. ICASSP 2023: 1-2
[c15]
  - https://dblp.org/rec/conf/icassp/MaitiPSW23
Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe:
Speechlmscore: Evaluating Speech Generation Using Speech Language Model. ICASSP 2023: 1-5
[c14]
  - https://dblp.org/rec/conf/icassp/PengKWSW23
Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe:
Structured Pruning of Self-Supervised Pre-Trained Models for Speech Recognition and Understanding. ICASSP 2023: 1-5
[c13]
  - https://dblp.org/rec/conf/icassp/PengLW23
Yifan Peng, Jaesong Lee, Shinji Watanabe:
I3D: Transformer Architectures with Input-Dependent Dynamic Depth for Speech Recognition. ICASSP 2023: 1-5
[c12]
  - https://dblp.org/rec/conf/interspeech/PengS0023
Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models. INTERSPEECH 2023: 62-66
[c11]
  - https://dblp.org/rec/conf/interspeech/KashiwagiAFHWPY23
Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
Tensor decomposition for minimization of E2E SLU model toward on-device processing. INTERSPEECH 2023: 710-714
[c10]
  - https://dblp.org/rec/conf/interspeech/PengKWYACTSS023
Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe:
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. INTERSPEECH 2023: 2208-2212
[c9]
  - https://dblp.org/rec/conf/interspeech/ChenCPNM023
William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. INTERSPEECH 2023: 4404-4408
[c8]
  - https://dblp.org/rec/conf/interspeech/Sudo0P023
Yui Sudo, Muhammad Shakeel, Yifan Peng, Shinji Watanabe:
Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training. INTERSPEECH 2023: 4479-4483
[c7]
  - https://dblp.org/rec/conf/iwslt/YanSMCLPA023
Brian Yan, Jiatong Shi, Soumi Maiti, William Chen, Xinjian Li, Yifan Peng, Siddhant Arora, Shinji Watanabe:
CMU's IWSLT 2023 Simultaneous Speech Translation System. IWSLT@ACL 2023: 235-240
[i18]
  - https://dblp.org/rec/journals/corr/abs-2302-12829
William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR With Auxiliary CTC Objectives. CoRR abs/2302.12829 (2023)
[i17]
  - https://dblp.org/rec/journals/corr/abs-2302-14132
Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe:
Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding. CoRR abs/2302.14132 (2023)
[i16]
  - https://dblp.org/rec/journals/corr/abs-2303-07624
Yifan Peng, Jaesong Lee, Shinji Watanabe:
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition. CoRR abs/2303.07624 (2023)
[i15]
  - https://dblp.org/rec/journals/corr/abs-2305-01194
Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge. CoRR abs/2305.01194 (2023)
[i14]
  - https://dblp.org/rec/journals/corr/abs-2305-01620
Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge. CoRR abs/2305.01620 (2023)
[i13]
  - https://dblp.org/rec/journals/corr/abs-2305-11073
Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe:
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. CoRR abs/2305.11073 (2023)
[i12]
  - https://dblp.org/rec/journals/corr/abs-2305-17651
Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models. CoRR abs/2305.17651 (2023)
[i11]
  - https://dblp.org/rec/journals/corr/abs-2306-06672
William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. CoRR abs/2306.06672 (2023)
[i10]
  - https://dblp.org/rec/journals/corr/abs-2309-07937
Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe:
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition/Synthesis and Speech/Text Continuation Tasks. CoRR abs/2309.07937 (2023)
[i9]
  dblp key:
  - journals/corr/abs-2309-09510
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-09510
Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee:
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech. CoRR abs/2309.09510 (2023)
[i8]
  dblp key:
  - journals/corr/abs-2309-13876
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-13876
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data. CoRR abs/2309.13876 (2023)
[i7]
  dblp key:
  - journals/corr/abs-2309-15317
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-15317
William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning. CoRR abs/2309.15317 (2023)
[i6]
  dblp key:
  - journals/corr/abs-2310-02973
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2310-02973
Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan S. Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network. CoRR abs/2310.02973 (2023)
2022
[c6]
  dblp key:
  - conf/icassp/AroraDDCUPZKGYV22
  persistent URL:
  - https://dblp.org/rec/conf/icassp/AroraDDCUPZKGYV22
Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe:
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet. ICASSP 2022: 7167-7171
[c5]
  dblp key:
  - conf/icml/PengDL022
  persistent URL:
  - https://dblp.org/rec/conf/icml/PengDL022
Yifan Peng, Siddharth Dalmia, Ian R. Lane, Shinji Watanabe:
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding. ICML 2022: 17627-17643
[c4]
  dblp key:
  - conf/interspeech/MaekakuFP022
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/MaekakuFP022
Takashi Maekaku, Yuya Fujita, Yifan Peng, Shinji Watanabe:
Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR. INTERSPEECH 2022: 1071-1075
[c3]
  dblp key:
  - conf/iwslt/YanFDSPBWNW22
  persistent URL:
  - https://dblp.org/rec/conf/iwslt/YanFDSPBWNW22
Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, Shinji Watanabe:
CMU's IWSLT 2022 Dialect Speech Translation System. IWSLT@ACL 2022: 298-307
[c2]
  dblp key:
  - conf/slt/KimWPPSHW22
  persistent URL:
  - https://dblp.org/rec/conf/slt/KimWPPSHW22
Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe:
E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition. SLT 2022: 84-91
[c1]
  dblp key:
  - conf/slt/PengAHUKGDCW22
  persistent URL:
  - https://dblp.org/rec/conf/slt/PengAHUKGDCW22
Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe:
A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. SLT 2022: 406-413
[i5]
  dblp key:
  - journals/corr/abs-2207-02971
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2207-02971
Yifan Peng, Siddharth Dalmia, Ian R. Lane, Shinji Watanabe:
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding. CoRR abs/2207.02971 (2022)
[i4]
  dblp key:
  - journals/corr/abs-2210-00077
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2210-00077
Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe:
E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition. CoRR abs/2210.00077 (2022)
[i3]
  dblp key:
  - journals/corr/abs-2211-05869
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2211-05869
Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe:
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. CoRR abs/2211.05869 (2022)
[i2]
  dblp key:
  - journals/corr/abs-2212-04559
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2212-04559
Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe:
SpeechLMScore: Evaluating Speech Generation Using Speech Language Model. CoRR abs/2212.04559 (2022)
2021
[i1]
  dblp key:
  - journals/corr/abs-2111-14706
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2111-14706
Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe:
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet. CoRR abs/2111.14706 (2021)
