Kai Yu 0004
Person information
- affiliation: Shanghai Jiao Tong University, Computer Science and Engineering Department, China
- affiliation (PhD 2006): Cambridge University, Engineering Department, UK
Other persons with the same name
- Kai Yu — disambiguation page
- Kai Yu 0001 — Baidu Inc., Institute of Deep Learning, Beijing, China (and 3 more)
- Kai Yu 0002 — Royal Institute of Technology, Stockholm, Sweden
- Kai Yu 0003 — University of Minnesota, Department of Biomedical Engineering, Minneapolis, MN, USA
- Kai Yu 0005 — Zhejiang University, State Key Laboratory of Industrial Control Technology, Hangzhou, China
- Kai Yu 0006 — Beijing Normal University, School of Geography, China (and 1 more)
- Kai Yu 0007 — Hohai University, College of Oceanography, Nanjing, China (and 2 more)
- Kai Yu 0008 — Guangdong University of Technology, School of Information Engineering, School of Integrated Circuits, China (and 1 more)
- Kai Yu 0009 — Soochow University, School of Electronics and Information Engineering, Jiangsu, China (and 1 more)
- Kai Yu 0010 — Nanjing University, School of Electronic Science and Engineering, China
- Kai Yu 0011 — Sun Yat-Sen University, Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, China
- Kai Yu 0012 — Chinese Academy of Sciences, Shanghai Institute of Microsystem and Information Technology, China
- Kai Yu 0013 — Beihang University, School of Computer Science and Engineering, State Key Laboratory of Software Development Environment, China
- Kai Yu 0014 — Nanjing University of Aeronautics and Astronautics, College of Electronic and Information Engineering, China
- Kai Yu 0015 — Shandong University of Science and Technology, College of Mining and Safety Engineering, Qingdao, China
- Kai Yu 0016 — Intel Corporation, Hillsboro, OR, USA (and 1 more)
- Kai Yu 0017 — Nankai University, Chern Institute of Mathematics and LPMC, Tianjin, China
- Kai Yu 0018 — Hangzhou Normal University, Department of Information Science and Technology, China (and 2 more)
2020 – today
- 2024
- [j42] Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu:
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning. IEEE ACM Trans. Audio Speech Lang. Process. 32: 95-112 (2024)
- [j41] Wenbin Jiang, Kai Yu, Fei Wen:
Unsupervised Speech Enhancement Using Optimal Transport and Speech Presence Probability. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4445-4455 (2024)
- [j40] Zheng Liang, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen:
E$^{3}$TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4810-4821 (2024)
- [j39] Xuenan Xu, Ziyang Ma, Mengyue Wu, Kai Yu:
Towards Weakly Supervised Text-to-Audio Grounding. IEEE Trans. Multim. 26: 11126-11138 (2024)
- [c224] Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, Kai Yu:
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding. AAAI 2024: 17924-17932
- [c223] Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen, Kai Yu:
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research. AAAI 2024: 19053-19061
- [c222] Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu:
Sparsity-Accelerated Training for Large Language Models. ACL (Findings) 2024: 14696-14707
- [c221] Ruiyang Zhou, Lu Chen, Kai Yu:
Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks. LREC/COLING 2024: 9340-9351
- [c220] Hongchuan Zeng, Hongshen Xu, Lu Chen, Kai Yu:
Multilingual Brain Surgeon: Large Language Models Can Be Compressed Leaving No Language behind. LREC/COLING 2024: 11794-11812
- [c219] Yang Han, Yiming Wang, Rui Wang, Lu Chen, Kai Yu:
AlignSum: Data Pyramid Hierarchical Fine-tuning for Aligning with Human Summarization Preference. EMNLP (Findings) 2024: 8506-8522
- [c218] Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu:
Enhancing Audio Generation Diversity with Visual Information. ICASSP 2024: 866-870
- [c217] Xuenan Xu, Xiaohang Xu, Zeyu Xie, Pingyue Zhang, Mengyue Wu, Kai Yu:
A Detailed Audio-Text Data Simulation Pipeline Using Single-Event Sounds. ICASSP 2024: 1091-1095
- [c216] Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen, Kai Yu:
DiffDub: Person-Generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-Encoder. ICASSP 2024: 3630-3634
- [c215] Pingyue Zhang, Mengyue Wu, Kai Yu:
Semantic-Enhanced Supervised Contrastive Learning. ICASSP 2024: 6030-6034
- [c214] Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen:
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS. ICASSP 2024: 10401-10405
- [c213] Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu:
VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching. ICASSP 2024: 11121-11125
- [c212] Sen Liu, Yiwei Guo, Xie Chen, Kai Yu:
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations. ICASSP 2024: 11521-11525
- [c211] Feiyu Shen, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu:
Acoustic BPE for Speech Generation with Discrete Tokens. ICASSP 2024: 11746-11750
- [c210] Hongshen Xu, Ruisheng Cao, Su Zhu, Sheng Jiang, Hanchong Zhang, Lu Chen, Kai Yu:
A BiRGAT Model for Multi-Intent Spoken Language Understanding with Hierarchical Semantic Frames. ICASSP 2024: 12251-12255
- [c209] Junjie Li, Yiwei Guo, Xie Chen, Kai Yu:
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention. ICASSP 2024: 12296-12300
- [c208] Hanqi Li, Lu Chen, Da Ma, Zijian Wu, Su Zhu, Kai Yu:
Evolving Subnetwork Training for Large Language Models. ICML 2024
- [c207] Tao Liu, Feilong Chen, Shuai Fan, Chenpeng Du, Qi Chen, Xie Chen, Kai Yu:
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding. ACM Multimedia 2024: 6696-6705
- [c206] Hanchong Zhang, Ruisheng Cao, Hongshen Xu, Lu Chen, Kai Yu:
CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions. NAACL-HLT 2024: 6487-6508
- [c205] Hongshen Xu, Lu Chen, Zihan Zhao, Da Ma, Ruisheng Cao, Zichen Zhu, Kai Yu:
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding. WSDM 2024: 864-872
- [i119] Xuenan Xu, Ziyang Ma, Mengyue Wu, Kai Yu:
Towards Weakly Supervised Text-to-Audio Grounding. CoRR abs/2401.02584 (2024)
- [i118] Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen, Kai Yu:
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech. CoRR abs/2401.14321 (2024)
- [i117] Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, Xin Chen, Kai Yu:
ChemDFM: Dialogue Foundation Model for Chemistry. CoRR abs/2401.14818 (2024)
- [i116] Zichen Zhu, Yang Xu, Lu Chen, Jingkai Yang, Yichuan Ma, Yiming Sun, Hailin Wen, Jiaqi Liu, Jinyu Cai, Yingzi Ma, Situo Zhang, Zihan Zhao, Liangtai Sun, Kai Yu:
MULTI: Multimodal Understanding Leaderboard with Text and Images. CoRR abs/2402.03173 (2024)
- [i115] Yiming Ai, Zhiwei He, Ziyin Zhang, Wenhong Zhu, Hongkun Hao, Kai Yu, Lingjun Chen, Rui Wang:
Is Cognition and Action Consistent or Not: Investigating Large Language Model's Personality. CoRR abs/2402.14679 (2024)
- [i114] Hongshen Xu, Ruisheng Cao, Su Zhu, Sheng Jiang, Hanchong Zhang, Lu Chen, Kai Yu:
A BiRGAT Model for Multi-intent Spoken Language Understanding with Hierarchical Semantic Frames. CoRR abs/2402.18258 (2024)
- [i113] Hongshen Xu, Lu Chen, Zihan Zhao, Da Ma, Ruisheng Cao, Zichen Zhu, Kai Yu:
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding. CoRR abs/2402.18262 (2024)
- [i112] Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu:
Enhancing Audio Generation Diversity with Visual Information. CoRR abs/2403.01278 (2024)
- [i111] Yutong Li, Lu Chen, Aiwei Liu, Kai Yu, Lijie Wen:
ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary. CoRR abs/2403.02574 (2024)
- [i110] Xuenan Xu, Xiaohang Xu, Zeyu Xie, Pingyue Zhang, Mengyue Wu, Kai Yu:
A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds. CoRR abs/2403.04594 (2024)
- [i109] Hongshen Xu, Zichen Zhu, Da Ma, Situo Zhang, Shuai Fan, Lu Chen, Kai Yu:
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback. CoRR abs/2403.18349 (2024)
- [i108] Hongchuan Zeng, Hongshen Xu, Lu Chen, Kai Yu:
Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind. CoRR abs/2404.04748 (2024)
- [i107] Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu:
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge. CoRR abs/2404.06079 (2024)
- [i106] Sen Liu, Yiwei Guo, Xie Chen, Kai Yu:
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations. CoRR abs/2404.14946 (2024)
- [i105] Hankun Wang, Chenpeng Du, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu:
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech. CoRR abs/2404.19723 (2024)
- [i104] Hanchong Zhang, Ruisheng Cao, Hongshen Xu, Lu Chen, Kai Yu:
CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions. CoRR abs/2405.02712 (2024)
- [i103] Tao Liu, Feilong Chen, Shuai Fan, Chenpeng Du, Qi Chen, Xie Chen, Kai Yu:
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding. CoRR abs/2405.03121 (2024)
- [i102] Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu:
Sparsity-Accelerated Training for Large Language Models. CoRR abs/2406.01392 (2024)
- [i101] Hanqi Li, Lu Chen, Da Ma, Zijian Wu, Su Zhu, Kai Yu:
Evolving Subnetwork Training for Large Language Models. CoRR abs/2406.06962 (2024)
- [i100] Zeyu Xie, Baihan Li, Xuenan Xu, Zheng Liang, Kai Yu, Mengyue Wu:
FakeSound: Deepfake General Audio Detection. CoRR abs/2406.08052 (2024)
- [i99] Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen:
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement. CoRR abs/2406.11546 (2024)
- [i98] Bohan Li, Feiyu Shen, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu:
On the Effectiveness of Acoustic BPE in Decoder-Only TTS. CoRR abs/2407.03892 (2024)
- [i97] Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu:
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? CoRR abs/2407.10956 (2024)
- [i96] Baihan Li, Zeyu Xie, Xuenan Xu, Yiwei Guo, Ming Yan, Ji Zhang, Kai Yu, Mengyue Wu:
DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation. CoRR abs/2407.13198 (2024)
- [i95] Yiwei Guo, Zhihan Li, Junjie Li, Chenpeng Du, Hankun Wang, Shuai Wang, Xie Chen, Kai Yu:
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders. CoRR abs/2409.01995 (2024)
- [i94] Zihan Zhao, Bo Chen, Jingpiao Li, Lu Chen, Liyang Wen, Pengyu Wang, Zichen Zhu, Danyang Zhang, Ziping Wan, Yansi Li, Zhongyang Dai, Xin Chen, Kai Yu:
ChemDFM-X: Towards Large Multimodal Model for Chemistry. CoRR abs/2409.13194 (2024)
- [i93] Liangtai Sun, Danyu Luo, Da Ma, Zihan Zhao, Baocai Chen, Zhennan Shen, Su Zhu, Lu Chen, Xin Chen, Kai Yu:
SciDFM: A Large Language Model with Mixture-of-Experts for Science. CoRR abs/2409.18412 (2024)
- [i92] Yang Han, Yiming Wang, Rui Wang, Lu Chen, Kai Yu:
AlignSum: Data Pyramid Hierarchical Fine-tuning for Aligning with Human Summarization Preference. CoRR abs/2410.00409 (2024)
- [i91] Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen:
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching. CoRR abs/2410.06885 (2024)
- [i90] Wenxi Chen, Ziyang Ma, Xiquan Li, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Kai Yu, Xie Chen:
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs. CoRR abs/2410.09503 (2024)
- [i89] Zichen Zhu, Hao Tang, Yansi Li, Kunyao Lan, Yixuan Jiang, Hao Zhou, Yixiao Wang, Situo Zhang, Liangtai Sun, Lu Chen, Kai Yu:
MobA: A Two-Level Agent System for Efficient Mobile Task Automation. CoRR abs/2410.13757 (2024)
- [i88] Yiwei Guo, Zhihan Li, Chenpeng Du, Hankun Wang, Xie Chen, Kai Yu:
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec. CoRR abs/2410.15764 (2024)
- 2023
- [j38] Ruisheng Cao, Lu Chen, Jieyu Li, Hanchong Zhang, Hongshen Xu, Wangyou Zhang, Kai Yu:
A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL. IEEE Trans. Pattern Anal. Mach. Intell. 45(11): 13796-13813 (2023)
- [j37] Zhi Chen, Yuncong Liu, Lu Chen, Su Zhu, Mengyue Wu, Kai Yu:
OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue. Trans. Assoc. Comput. Linguistics 11: 68-84 (2023)
- [j36] Wenbin Jiang, Kai Yu:
Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking. IEEE ACM Trans. Audio Speech Lang. Process. 31: 1758-1770 (2023)
- [j35] Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu:
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3446-3456 (2023)
- [c204] Sheng Jiang, Su Zhu, Ruisheng Cao, Qingliang Miao, Kai Yu:
SPM: A Split-Parsing Method for Joint Multi-Intent Detection and Slot Filling. ACL (industry) 2023: 668-675
- [c203] Jieyu Li, Lu Chen, Ruisheng Cao, Su Zhu, Hongshen Xu, Zhi Chen, Hanchong Zhang, Kai Yu:
Exploring Schema Generalizability of Text-to-SQL. ACL (Findings) 2023: 1344-1360
- [c202] Yiming Ai, Zhiwei He, Kai Yu, Rui Wang:
TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation. ACL (2) 2023: 1930-1941
- [c201] Hanchong Zhang, Jieyu Li, Lu Chen, Ruisheng Cao, Yunyan Zhang, Yu Huang, Yefeng Zheng, Kai Yu:
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset. ACL (Findings) 2023: 6970-6983
- [c200] Hanchong Zhang, Ruisheng Cao, Lu Chen, Hongshen Xu, Kai Yu:
ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought. EMNLP (Findings) 2023: 3501-3532
- [c199] Qi Chen, Ziyang Ma, Tao Liu, Xu Tan, Qu Lu, Kai Yu, Xie Chen:
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation. ICASSP 2023: 1-5
- [c198] Chenpeng Du, Yiwei Guo, Feiyu Shen, Kai Yu:
Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge. ICASSP 2023: 1-2
- [c197] Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu:
Emodiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance. ICASSP 2023: 1-5
- [c196] Guangwei Li, Xuenan Xu, Lingfeng Dai, Mengyue Wu, Kai Yu:
Diverse and Vivid Sound Generation from Text Descriptions. ICASSP 2023: 1-5
- [c195] Tao Liu, Zhengyang Chen, Yanmin Qian, Kai Yu:
Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge. ICASSP 2023: 1-2
- [c194] Zhijun Liu, Yiwei Guo, Kai Yu:
DiffVoice: Text-to-Speech with Latent Diffusion. ICASSP 2023: 1-5
- [c193] Xuenan Xu, Mengyue Wu, Kai Yu:
Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning. ICASSP Workshops 2023: 1-5
- [c192] Sen Liu, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu:
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech. INTERSPEECH 2023: 616-620
- [c191] Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen:
Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation. INTERSPEECH 2023: 919-923
- [c190] Pingyue Zhang, Mengyue Wu, Kai Yu:
ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection. INTERSPEECH 2023: 2998-3002
- [c189] Wenbin Jiang, Fei Wen, Yifan Zhang, Kai Yu:
UnSE: Unsupervised Speech Enhancement Using Optimal Transport. INTERSPEECH 2023: 4029-4033
- [c188] Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu:
Enhance Temporal Relations in Audio Captioning with Sound Event Detection. INTERSPEECH 2023: 4179-4183
- [c187] Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian:
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder. ACM Multimedia 2023: 4281-4289
- [c186] Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu:
Large Language Models Are Semi-Parametric Reinforcement Learning Agents. NeurIPS 2023
- [i87] Qi Chen, Ziyang Ma, Tao Liu, Xu Tan, Qu Lu, Xie Chen, Kai Yu:
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation. CoRR abs/2303.05322 (2023)
- [i86] Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian:
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder. CoRR abs/2303.17550 (2023)
- [i85] Zhijun Liu, Yiwei Guo, Kai Yu:
DiffVoice: Text-to-Speech with Latent Diffusion. CoRR abs/2304.11750 (2023)
- [i84] Chenpeng Du, Yiwei Guo, Feiyu Shen, Kai Yu:
Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge. CoRR abs/2304.13121 (2023)
- [i83] Guangwei Li, Xuenan Xu, Lingfeng Dai, Mengyue Wu, Kai Yu:
Diverse and Vivid Sound Generation from Text Descriptions. CoRR abs/2305.01980 (2023)
- [i82] Danyang Zhang, Lu Chen, Kai Yu:
Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction. CoRR abs/2305.08144 (2023)
- [i81] Yiming Ai, Zhiwei He, Kai Yu, Rui Wang:
TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation. CoRR abs/2305.13740 (2023)
- [i80] Hanchong Zhang, Jieyu Li, Lu Chen, Ruisheng Cao, Yunyan Zhang, Yu Huang, Yefeng Zheng, Kai Yu:
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset. CoRR abs/2305.15891 (2023)
- [i79] Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu:
Enhance Temporal Relations in Audio Captioning with Sound Event Detection. CoRR abs/2306.01533 (2023)
- [i78] Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, Kai Yu:
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding. CoRR abs/2306.07547 (2023)
- [i77] Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu:
Large Language Model Is Semi-Parametric Reinforcement Learning Agent. CoRR abs/2306.07929 (2023)
- [i76] Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen:
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation. CoRR abs/2306.08588 (2023)
- [i75] Hanxue Zhang, Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu:
Improving Audio Caption Fluency with Automatic Error Correction. CoRR abs/2306.10090 (2023)
- [i74] Sen Liu, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu:
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech. CoRR abs/2306.14145 (2023)
- [i73] Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen, Kai Yu:
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research. CoRR abs/2308.13149 (2023)
- [i72] Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu:
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching. CoRR abs/2309.05027 (2023)
- [i71] Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen:
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS. CoRR abs/2309.07377 (2023)
- [i70] Feiyu Shen, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu:
Acoustic BPE for Speech Generation with Discrete Tokens. CoRR abs/2310.14580 (2023)
- [i69] Hanchong Zhang, Ruisheng Cao, Lu Chen, Hongshen Xu, Kai Yu:
ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought. CoRR abs/2310.17342 (2023)
- [i68] Ruisheng Cao, Hanchong Zhang, Hongshen Xu, Jieyu Li, Da Ma, Lu Chen, Kai Yu:
ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL. CoRR abs/2310.18662 (2023)
- [i67] Hanglei Zhang, Yiwei Guo, Sen Liu, Xie Chen, Kai Yu:
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations. CoRR abs/2311.01260 (2023)
- [i66] Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen, Kai Yu:
DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder. CoRR abs/2311.01811 (2023)
- [i65] Junjie Li, Yiwei Guo, Xie Chen, Kai Yu:
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention. CoRR abs/2312.08676 (2023)
- 2022
- [j34] Bo Chen, Zhihang Xu, Kai Yu:
Data augmentation based non-parallel voice conversion with frame-level speaker disentangler. Speech Commun. 136: 14-22 (2022)
- [j33] Chenpeng Du, Kai Yu:
Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 30: 190-201 (2022)
- [j32] Bo Chen, Chenpeng Du, Kai Yu:
Neural Fusion for Voice Cloning. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1993-2001 (2022)
- [c185] Binwei Yao, Chao Shi, Likai Zou, Lingfeng Dai, Mengyue Wu, Lu Chen, Zhen Wang, Kai Yu:
D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat. EMNLP 2022: 2438-2459
- [c184] Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu, Kai Yu:
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI. EMNLP 2022: 6699-6712
- [c183] Zhi Chen, Bei Chen, Lu Chen, Kai Yu, Jian-Guang Lou:
AdapterShare: Task Correlation Modeling with Adapter Differentiation. EMNLP 2022: 10645-10651
- [c182] Wenbin Jiang, Zhijun Liu, Kai Yu, Fei Wen:
Speech Enhancement with Neural Homomorphic Synthesis. ICASSP 2022: 376-380
- [c181]