


default search action
Yifan Yang 0005
Person information
- affiliation: Xiaomi Corp., Beijing, China
- affiliation: Shanghai Jiao Tong University, X-LANCE Lab, AI Institute, MoE Key Lab of Artificial Intelligence, Shanghai, China
Other persons with the same name
- Yifan Yang (aka: Yi-Fan Yang) — disambiguation page
- Yifan Yang 0001
— Transwarp Technology (Shanghai) Co., Ltd, China (and 1 more) - Yifan Yang 0002
— Victoria University of Wellington, School of Engineering and Computer Science, Centre for Data Science and Artificial Intelligence, Wellington, New Zealand - Yifan Yang 0003
— Beihang University, Institute of Artificial Intelligence, Beijing, China - Yifan Yang 0004
— Microsoft Research Asia, Shanghai, China (and 1 more) - Yifan Yang 0006 — National Institutes of Health, National Library of Medicine, National Center for Biotechnology Information, Bethesda, MD, USA (and 1 more)
- Yifan Yang 0007 — Samsung R&D Institute China-Beijing, China
- Yifan Yang 0008
— Tencent AI Lab, Shenzhen, China - Yifan Yang 0009
— Tsinghua University, Beijing, China
Refine list

refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
Journal Articles
- 2025
[j1]Hui Wang
, Yifan Yang
, Shujie Liu, Jinyu Li
, Lingwei Meng, Tie-Yan Liu, Jiaming Zhou
, Haoqin Sun, Yan Lu, Yong Qin:
StreamMel: Real-Time Zero-Shot Text-to-Speech Via Interleaved Continuous Autoregressive Modeling. IEEE Signal Process. Lett. 32: 3530-3534 (2025)
Conference and Workshop Papers
- 2025
[c22]Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen:
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration. AAAI 2025: 24840-24848
[c21]Wenxi Chen, Ziyang Ma, Ruiqi Yan, Yuzhe Liang, Xiquan Li, Ruiyang Xu, Zhikang Niu, Yanqiao Zhu, Yifan Yang, Zhanxun Liu, Kai Yu, Yuxuan Hu, Jinyu Li, Yan Lu, Shujie Liu, Xie Chen:
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training. ACL (Findings) 2025: 2262-2282
[c20]Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen:
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement. ACL (1) 2025: 2673-2686
[c19]Yexing Du, Youcheng Pan, Ziyang Ma, Bo Yang, Yifan Yang, Keqi Deng, Xie Chen, Yang Xiang, Ming Liu, Bing Qin:
Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning. ACL (1) 2025: 12466-12478
[c18]Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen, Kai Yu:
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech. ICASSP 2025: 1-5
[c17]Yifan Yang, Jianheng Zhuo, Zengrui Jin, Ziyang Ma, Xiaoyu Yang, Zengwei Yao, Liyong Guo, Wei Kang, Fangjun Kuang, Long Lin, Daniel Povey, Xie Chen:
k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning. ICME 2025: 1-6- 2024
[c16]Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen:
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS. ICASSP 2024: 10401-10405
[c15]Yifan Yang, Yice Zhang, Ruifeng Xu:
Enhancing Generative Aspect-Based Sentiment Analysis with Relation-Level Supervision and Prompt. ICASSP 2024: 10526-10530
[c14]Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey:
PromptASR for Contextualized ASR with Controllable Style. ICASSP 2024: 10536-10540
[c13]Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey:
Libriheavy: A 50, 000 Hours ASR Corpus with Punctuation Casing and Context. ICASSP 2024: 10991-10995
[c12]Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey:
Zipformer: A faster and better encoder for automatic speech recognition. ICLR 2024
[c11]Zengrui Jin, Yifan Yang, Mohan Shi, Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Lingwei Meng, Long Lin, Yong Xu, Shi-Xiong Zhang, Daniel Povey:
LibriheavyMix: A 20, 000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization. INTERSPEECH 2024
[c10]Zheshu Song, Jianheng Zhuo, Yifan Yang, Ziyang Ma, Shixiong Zhang, Xie Chen:
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR. INTERSPEECH 2024
[c9]Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen:
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer. INTERSPEECH 2024
[c8]Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Xu Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu:
The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge. ISCSLP 2024: 641-645- 2023
[c7]Yice Zhang, Yifan Yang, Bin Liang, Shiwei Chen, Bing Qin
, Ruifeng Xu:
An Empirical Study of Sentiment-Enhanced Pre-Training for Aspect-Based Sentiment Analysis. ACL (Findings) 2023: 9633-9651
[c6]Yihui Li, Yice Zhang, Yifan Yang, Ruifeng Xu:
A Generative Model for Structured Sentiment Analysis. AIMS 2023: 28-38
[c5]Yice Zhang, Yifan Yang, Meng Li, Bin Liang, Shiwei Chen, Ruifeng Xu:
Target-to-Source Augmentation for Aspect Sentiment Triplet Extraction. EMNLP 2023: 12165-12177
[c4]Zengwei Yao, Wei Kang, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Yifan Yang, Long Lin, Daniel Povey:
Delay-penalized CTC Implemented Based on Finite State Transducer. INTERSPEECH 2023: 1329-1333
[c3]Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey:
Blank-regularized CTC for Frame Skipping in Neural Transducer. INTERSPEECH 2023: 4409-4413- 2022
[c2]Yice Zhang, Yifan Yang, Yihui Li, Bin Liang, Shiwei Chen, Yixue Dang, Min Yang, Ruifeng Xu:
Boundary-Driven Table-Filling for Aspect Sentiment Triplet Extraction. EMNLP 2022: 6485-6498
[c1]Yihui Li, Yifan Yang, Yice Zhang, Ruifeng Xu:
HITSZ-HLT at SemEval-2022 Task 10: A Span-Relation Extraction Framework for Structured Sentiment Analysis. SemEval@NAACL 2022: 1406-1411
Informal and Other Publications
- 2025
[i27]Hui Wang, Shujie Liu, Lingwei Meng, Jinyu Li, Yifan Yang, Shiwan Zhao, Haiyang Sun, Yanqing Liu, Haoqin Sun, Jiaming Zhou, Yan Lu, Yong Qin:
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching. CoRR abs/2502.11128 (2025)
[i26]Abdelrahman Abouelenin, Atabak Ashfaq, Adam Atkinson, Hany Awadalla, Nguyen Bach, Jianmin Bao, Alon Benhaim, Martin Cai, Vishrav Chaudhary, Congcong Chen, Dong Chen, Dongdong Chen, Jun-Kun Chen, Weizhu Chen, Yen-Chun Chen, Yi-ling Chen, Qi Dai, Xiyang Dai, Ruchao Fan, Mei Gao, Min Gao, Amit Garg, Abhishek Goswami, Junheng Hao, Amr Hendy, Yuxuan Hu, Xin Jin, Mahmoud Khademi, Dongwoo Kim, Young Jin Kim, Gina Lee, Jinyu Li, Yunsheng Li, Chen Liang, Xihui Lin, Zeqi Lin, Mengchen Liu, Yang Liu, Gilsinia Lopez, Chong Luo, Piyush Madan, Vadim Mazalov, Arindam Mitra, Ali Mousavi, Anh Nguyen, Jing Pan, Daniel Perez-Becker, Jacob Platin, Thomas Portet, Kai Qiu, Bo Ren, Liliang Ren, Sambuddha Roy, Ning Shang, Yelong Shen, Saksham Singhal, Subhojit Som, Xia Song, Tetyana Sych, Praneetha Vaddamanu, Shuohang Wang, Yiming Wang, Zhenghao Wang, Haibin Wu, Haoran Xu, Weijian Xu, Yifan Yang, Ziyi Yang, Donghan Yu, Ishmam Zabir, Jianwen Zhang, Li Lyna Zhang, Yunan Zhang, Xiren Zhou:
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs. CoRR abs/2503.01743 (2025)
[i25]Yifan Yang, Shujie Liu, Jinyu Li, Yuxuan Hu, Haibin Wu, Hui Wang, Jianwei Yu, Lingwei Meng, Haiyang Sun, Yanqing Liu, Yan Lu, Kai Yu, Xie Chen:
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis. CoRR abs/2504.10352 (2025)
[i24]Guanrou Yang, Chen Yang, Qian Chen, Ziyang Ma, Wenxi Chen, Wen Wang, Tianrui Wang, Yifan Yang, Zhikang Niu, Wenrui Liu, Fan Yu, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen:
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting. CoRR abs/2504.12867 (2025)
[i23]Haiyang Sun, Shujie Hu, Shujie Liu, Lingwei Meng, Hui Wang, Bing Han, Yifan Yang, Yanqing Liu, Sheng Zhao, Yan Lu, Yanmin Qian:
Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling. CoRR abs/2505.19669 (2025)
[i22]Jianheng Zhuo, Yifan Yang, Yiwen Shao, Yong Xu, Dong Yu, Kai Yu, Xie Chen:
VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining. CoRR abs/2505.21527 (2025)
[i21]Yu Zhang, Yunqi Li, Yifan Yang, Rui Wang, Yuqing Yang, Dai Qi, Jianmin Bao, Dongdong Chen, Chong Luo, Lili Qiu:
ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL. CoRR abs/2505.24875 (2025)
[i20]Hui Wang, Yifan Yang, Shujie Liu, Jinyu Li, Lingwei Meng, Yanqing Liu, Jiaming Zhou, Haoqin Sun, Yan Lu, Yong Qin:
StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling. CoRR abs/2506.12570 (2025)
[i19]Miaosen Zhang, Ziqiang Xu, Jialiang Zhu, Qi Dai, Kai Qiu, Yifan Yang, Chong Luo, Tianyi Chen, Justin Wagle, Tim Franklin, Baining Guo:
Phi-Ground Tech Report: Advancing Perception in GUI Grounding. CoRR abs/2507.23779 (2025)
[i18]Hui Wang, Jinghua Zhao, Yifan Yang, Shujie Liu, Junyang Chen, Yanzhe Zhang, Shiwan Zhao, Jinyu Li, Jiaming Zhou, Haoqin Sun, Yan Lu, Yong Qin:
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation. CoRR abs/2510.14664 (2025)- 2024
[i17]Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen, Kai Yu:
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech. CoRR abs/2401.14321 (2024)
[i16]Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen:
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity. CoRR abs/2402.08846 (2024)
[i15]Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu:
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge. CoRR abs/2404.06079 (2024)
[i14]Zheshu Song, Jianheng Zhuo, Yifan Yang, Ziyang Ma, Shixiong Zhang, Xie Chen:
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR. CoRR abs/2406.06619 (2024)
[i13]Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen:
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement. CoRR abs/2406.11546 (2024)
[i12]Zengrui Jin, Yifan Yang, Mohan Shi, Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Lingwei Meng, Long Lin, Yong Xu, Shi-Xiong Zhang, Daniel Povey:
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization. CoRR abs/2409.00819 (2024)
[i11]Mingyu Cui, Yifan Yang, Jiajun Deng, Jiawen Kang, Shujie Hu, Tianzi Wang, Zhaoqing Li, Shiliang Zhang, Xie Chen, Xunying Liu:
Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR. CoRR abs/2409.08797 (2024)
[i10]Mingyu Cui, Daxin Tan, Yifan Yang, Dingdong Wang, Huimeng Wang, Xiao Chen, Xie Chen, Xunying Liu:
Exploring SSL Discrete Tokens for Multilingual ASR. CoRR abs/2409.08805 (2024)
[i9]Yexing Du, Ziyang Ma, Yifan Yang, Keqi Deng, Xie Chen, Bo Yang, Yang Xiang, Ming Liu, Bing Qin
:
CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought. CoRR abs/2409.19510 (2024)
[i8]Rui Tian, Qi Dai, Jianmin Bao, Kai Qiu, Yifan Yang, Chong Luo, Zuxuan Wu, Yu-Gang Jiang:
REDUCIO! Generating 1024⨉1024 Video within 16 Seconds using Extremely Compressed Motion Latents. CoRR abs/2411.13552 (2024)
[i7]Miaosen Zhang, Qi Dai, Yifan Yang, Jianmin Bao, Dongdong Chen, Kai Qiu, Chong Luo, Xin Geng, Baining Guo:
MageBench: Bridging Large Multimodal Models to Agents. CoRR abs/2412.04531 (2024)- 2023
[i6]Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey:
Blank-regularized CTC for Frame Skipping in Neural Transducer. CoRR abs/2305.11558 (2023)
[i5]Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen:
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS. CoRR abs/2309.07377 (2023)
[i4]Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey:
PromptASR for contextualized ASR with controllable style. CoRR abs/2309.07414 (2023)
[i3]Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen:
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer. CoRR abs/2309.07648 (2023)
[i2]Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey:
Libriheavy: a 50, 000 hours ASR corpus with punctuation casing and context. CoRR abs/2309.08105 (2023)
[i1]Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey:
Zipformer: A faster and better encoder for automatic speech recognition. CoRR abs/2310.11230 (2023)
Coauthor Index

manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from
to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the
of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from
,
, and
to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from
and
to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from
.
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2025-11-16 00:39 CET by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint


Google
Google Scholar
Semantic Scholar
Internet Archive Scholar
CiteSeerX
ORCID







