Zhiyong Wu 0001
吴志勇
Person information
- unicode name: 吴志勇
- affiliation (PhD): Tsinghua University, Joint Research Center for Media Sciences, Beijing, China
- affiliation: Chinese University of Hong Kong, Hong Kong
Other persons with the same name
- Zhiyong Wu — disambiguation page
- Zhiyong Wu 0002 — Hohai University, College of Hydrology and Water Resources, Nanjing, China
- Zhiyong Wu 0003 — Shanghai AI Laboratory (and 1 more)
- Zhiyong Wu 0004 — University of Science and Technology of China, School of Computer Science and Technology, Hefei, China
- Zhiyong Wu 0005 — Anhui Polytechnic University, School of Mathematics and Physics, Wuhu, China
- Zhiyong Wu 0006 — Nanjing University of Posts and Telecommunications, College of Automation, China
- Zhiyong Wu 0007 — Army Engineering University, Institute of Command and Control Engineering, Nanjing, China
- Zhiyong Wu 0008 — Chinese Academy of Sciences, Changchun Institute of Optics, Fine Mechanics and Physics, China
- Zhiyong Wu 0009 — Shantou Central Hospital, Departments of Oncology Surgery, Shantou, China
- Zhiyong Wu 0010 — Tsinghua University, KLISS, BNRist, School of Software, Beijing, China
- Zhiyong Wu 0011 — Shanghai Artificial Intelligence Laboratory, China
2020 – today
- 2024
- [j13]Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang:
Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing. IEEE ACM Trans. Audio Speech Lang. Process. 32: 517-528 (2024)
- [c195]Zilin Wang, Haolin Zhuang, Lu Li, Yinmin Zhang, Junjie Zhong, Jun Chen, Yu Yang, Boshi Tang, Zhiyong Wu:
Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations. AAAI 2024: 301-309
- [c194]Boshi Tang, Zhiyong Wu, Xixin Wu, Qiaochu Huang, Jun Chen, Shun Lei, Helen Meng:
SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes. AAAI 2024: 15267-15275
- [c193]Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu, Shi-Xiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu:
SECap: Speech Emotion Captioning with Large Language Model. AAAI 2024: 19323-19331
- [c192]Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu:
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model. CVPR 2024: 2263-2273
- [c191]Yaxin Liu, Xiaomei Nie, Zhiyong Wu:
Collaboration of Digital Human Gestures and Teaching Materials for Enhanced Integration in MOOC Teaching Scenarios. HCI (59) 2024: 169-175
- [c190]Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu:
The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge. ICASSP Workshops 2024: 71-72
- [c189]Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng:
Multi-View Midivae: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation. ICASSP 2024: 941-945
- [c188]Yuanyuan Wang, Hangting Chen, Dongchao Yang, Jianwei Yu, Chao Weng, Zhiyong Wu, Helen Meng:
Consistent and Relevant: Rethink the Query Embedding in General Sound Separation. ICASSP 2024: 961-965
- [c187]Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, Yang Li, Zhiyong Wu, Helen Meng:
SCNet: Sparse Compression Network for Music Source Separation. ICASSP 2024: 1276-1280
- [c186]Xingda Li, Fan Zhuo, Dan Luo, Jun Chen, Shiyin Kang, Zhiyong Wu, Tao Jiang, Yang Li, Han Fang, Yahui Zhou:
Generating Stereophonic Music with Single-Stage Language Models. ICASSP 2024: 1471-1475
- [c185]Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu:
FreeTalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness. ICASSP 2024: 7945-7949
- [c184]Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu, Haozhi Huang, Helen Meng:
Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information. ICASSP 2024: 8185-8189
- [c183]Haiwei Xue, Sicheng Yang, Zhensong Zhang, Zhiyong Wu, Minglei Li, Zonghong Dai, Helen Meng:
Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models. ICASSP 2024: 8296-8300
- [c182]Hui Lu, Xixin Wu, Haohan Guo, Songxiang Liu, Zhiyong Wu, Helen Meng:
Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations. ICASSP 2024: 11141-11145
- [c181]Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng:
Stylespeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis. ICASSP 2024: 12316-12320
- [c180]Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng:
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction. ICASSP 2024: 12341-12345
- [c179]Binzhu Sha, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng:
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion. ICASSP 2024: 12577-12581
- [c178]Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han, Helen Meng:
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts. ICASSP 2024: 12662-12666
- [c177]Yaoxun Xu, Xingchen Song, Zhiyong Wu, Di Wu, Zhendong Peng, Binbin Zhang:
Hydraformer: One Encoder for All Subsampling Rates. ICME 2024: 1-6
- [c176]Ming Cheng, Shun Lei, Dongyang Dai, Zhiyong Wu, Dading Chong:
NRAdapt: Noise-Robust Adaptive Text to Speech Using Untranscribed Data. IJCNN 2024: 1-8
- [c175]Rui Niu, Zhiyong Wu, Changhe Song:
Representation Space Maintenance: Against Forgetting in Continual Learning. IJCNN 2024: 1-7
- [i99]Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu:
Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness. CoRR abs/2401.03476 (2024)
- [i98]Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng:
Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation. CoRR abs/2401.07532 (2024)
- [i97]Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng:
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction. CoRR abs/2401.17796 (2024)
- [i96]Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu, Haozhi Huang, Helen Meng:
Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information. CoRR abs/2403.05834 (2024)
- [i95]Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu:
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model. CoRR abs/2404.01862 (2024)
- [i94]Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu:
The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge. CoRR abs/2404.16619 (2024)
- [i93]Xueyuan Chen, Dongchao Yang, Dingdong Wang, Xixin Wu, Zhiyong Wu, Helen Meng:
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction. CoRR abs/2406.08336 (2024)
- [i92]Weiqin Li, Peiji Yang, Yicheng Zhong, Yixuan Zhou, Zhisheng Wang, Zhiyong Wu, Xixin Wu, Helen Meng:
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models. CoRR abs/2407.13509 (2024)
- [i91]Yaoxun Xu, Xingchen Song, Zhiyong Wu, Di Wu, Zhendong Peng, Binbin Zhang:
HydraFormer: One Encoder For All Subsampling Rates. CoRR abs/2408.04325 (2024)
- [i90]Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu:
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description. CoRR abs/2408.13608 (2024)
- [i89]Xu He, Xiaoyu Li, Di Kang, Jiangnan Ye, Chaopeng Zhang, Liyang Chen, Xiangjun Gao, Han Zhang, Zhiyong Wu, Haolin Zhuang:
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement. CoRR abs/2408.14211 (2024)
- [i88]Yinghao Ma, Anders Øland, Anton Ragni, Bleiz Macsen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, Shangda Wu, Shih-Lun Wu, Shuqi Dai, Shun Lei, Shiyin Kang, Simon Dixon, Wenhu Chen, Wenhao Huang, Xingjian Du, Xingwei Qu, Xu Tan, Yizhi Li, Zeyue Tian, Zhiyong Wu, Zhizheng Wu, Ziyang Ma, Ziyu Wang:
Foundation Models for Music: A Survey. CoRR abs/2408.14340 (2024)
- [i87]Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia:
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling. CoRR abs/2408.15676 (2024)
- [i86]Yaoxun Xu, Shi-Xiong Zhang, Jianwei Yu, Zhiyong Wu, Dong Yu:
Comparing Discrete and Continuous Space LLMs for Speech Recognition. CoRR abs/2409.00800 (2024)
- [i85]Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng:
SongCreator: Lyrics-based Universal Song Generation. CoRR abs/2409.06029 (2024)
- [i84]Wei Chen, Xintao Zhao, Jun Chen, Binzhu Sha, Zhiwei Lin, Zhiyong Wu:
RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion. CoRR abs/2409.06237 (2024)
- [i83]Shuochen Gao, Shun Lei, Fan Zhuo, Hangyu Liu, Feng Liu, Boshi Tang, Qiaochu Huang, Shiyin Kang, Zhiyong Wu:
An End-to-End Approach for Chord-Conditioned Song Generation. CoRR abs/2409.06307 (2024)
- [i82]Zhiqi Huang, Dan Luo, Jun Wang, Huan Liao, Zhiheng Li, Zhiyong Wu:
Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis. CoRR abs/2409.08628 (2024)
- 2023
- [j12]Xingwei Liang, Lu Zhang, Zhiyong Wu, Ruifeng Xu:
Lite-RTSE: Exploring a Cost-Effective Lite DNN Model for Real-Time Speech Enhancement in RTC Scenarios. IEEE Signal Process. Lett. 30: 1697-1701 (2023)
- [j11]Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Xixin Wu, Shiyin Kang, Helen Meng:
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3290-3303 (2023)
- [j10]Xixin Wu, Hui Lu, Kun Li, Zhiyong Wu, Xunying Liu, Helen Meng:
Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3993-4003 (2023)
- [c174]Zhihan Yang, Zhiyong Wu, Ying Shan, Jia Jia:
What Does Your Face Sound Like? 3D Face Shape towards Voice. AAAI 2023: 13905-13913
- [c173]Yunrui Cai, Changhe Song, Boshi Tang, Dongyang Dai, Zhiyong Wu, Helen Meng:
Robust Representation Learning for Speech Emotion Recognition with Moment Exchange. APSIPA ASC 2023: 1002-1007
- [c172]Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang:
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation. CVPR 2023: 2321-2330
- [c171]Weihong Bao, Liyang Chen, Chaoyong Zhou, Sicheng Yang, Zhiyong Wu:
Wavsyncswap: End-To-End Portrait-Customized Audio-Driven Talking Face Generation. ICASSP 2023: 1-5
- [c170]Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Zhiyong Wu, Yannan Wang, Shidong Shang, Helen Meng:
Inter-Subnet: Speech Enhancement with Subband Interaction. ICASSP 2023: 1-5
- [c169]Jun Chen, Yupeng Shi, Wenzhe Liu, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu, Shidong Shang, Chengshi Zheng:
Gesper: A Unified Framework for General Speech Restoration. ICASSP 2023: 1-2
- [c168]Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu:
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech. ICASSP 2023: 1-5
- [c167]Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng:
Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis. ICASSP 2023: 1-5
- [c166]Jiuxin Lin, Xinyu Cai, Heinrich Dinkel, Jun Chen, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Zhiyong Wu, Yujun Wang, Helen Meng:
Av-Sepformer: Cross-Attention Sepformer for Audio-Visual Target Speaker Extraction. ICASSP 2023: 1-5
- [c165]Xingchen Song, Di Wu, Zhiyong Wu, Binbin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu:
TrimTail: Low-Latency Streaming ASR with Simple But Effective Spectrogram-Level Length Penalty. ICASSP 2023: 1-5
- [c164]Weinan Tong, Jiaxu Zhu, Jun Chen, Zhiyong Wu, Shiyin Kang, Helen Meng:
TFCnet: Time-Frequency Domain Corrector for Speech Separation. ICASSP 2023: 1-5
- [c163]Zilin Wang, Peng Liu, Jun Chen, Sipan Li, Jinfeng Bai, Gang He, Zhiyong Wu, Helen Meng:
A Synthetic Corpus Generation Method for Neural Vocoder Training. ICASSP 2023: 1-5
- [c162]Yuanyuan Wang, Yang Zhang, Zhiyong Wu, Zhihan Yang, Tao Wei, Kun Zou, Helen Meng:
DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification. ICASSP 2023: 1-5
- [c161]Yaoxun Xu, Baiji Liu, Qiaochu Huang, Xingchen Song, Zhiyong Wu, Shiyin Kang, Helen Meng:
CB-Conformer: Contextual Biasing Conformer for Biased Word Recognition. ICASSP 2023: 1-5
- [c160]Yujie Yang, Kun Zhang, Zhiyong Wu, Helen Meng:
Keyword-Specific Acoustic Model Pruning for Open-Vocabulary Keyword Spotting. ICASSP 2023: 1-5
- [c159]Shaohuan Zhou, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng:
Enhancing the Vocal Range of Single-Speaker Singing Voice Synthesis with Melody-Unsupervised Pre-Training. ICASSP 2023: 1-5
- [c158]Haolin Zhuang, Shun Lei, Long Xiao, Weiqin Li, Liyang Chen, Sicheng Yang, Zhiyong Wu, Shiyin Kang, Helen Meng:
GTN-Bailando: Genre Consistent long-Term 3D Dance Generation Based on Pre-Trained Genre Token Network. ICASSP 2023: 1-5
- [c157]Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao:
VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer. ICCV (Workshops) 2023: 2969-2979
- [c156]Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng:
Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation-based Voice Conversion. ICME 2023: 1691-1696
- [c155]Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, Chao Weng, Zhiyong Wu, Helen Meng:
SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias. ICME 2023: 1703-1708
- [c154]Sicheng Yang, Haiwei Xue, Zhensong Zhang, Minglei Li, Zhiyong Wu, Xiaofei Wu, Songcen Xu, Zonghong Dai:
The DiffuseStyleGesture+ entry to the GENEA Challenge 2023. ICMI 2023: 779-785
- [c153]Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Ming Cheng, Long Xiao:
DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models. IJCAI 2023: 5860-5868
- [c152]Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng:
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation. INTERSPEECH 2023: 1334-1338
- [c151]Xingchen Song, Di Wu, Binbin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu:
ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs. INTERSPEECH 2023: 1648-1652
- [c150]Jiuxin Lin, Peng Wang, Heinrich Dinkel, Jun Chen, Zhiyong Wu, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang:
Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information. INTERSPEECH 2023: 2488-2492
- [c149]Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng:
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge. INTERSPEECH 2023: 3272-3276
- [c148]Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng:
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis. INTERSPEECH 2023: 3377-3381
- [c147]Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Yukai Ju, Shulin He, Yannan Wang, Zhiyong Wu:
MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation. INTERSPEECH 2023: 4034-4038
- [c146]Wenzhe Liu, Yupeng Shi, Jun Chen, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu:
Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction. INTERSPEECH 2023: 4044-4048
- [c145]Xiang Li, Songxiang Liu, Max W. Y. Lam, Zhiyong Wu, Chao Weng, Helen Meng:
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model. INTERSPEECH 2023: 4858-4862
- [c144]Zhihan Yang, Shansong Liu, Xu Li, Haozhe Wu, Zhiyong Wu, Ying Shan, Jia Jia:
Prosody Modeling with 3D Visual Information for Expressive Video Dubbing. INTERSPEECH 2023: 4863-4867
- [c143]Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai:
UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons. ACM Multimedia 2023: 1033-1044
- [c142]Hui Lu, Xixin Wu, Zhiyong Wu, Helen Meng:
SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody. ACM Multimedia 2023: 2829-2837
- [c141]Yunrui Cai, Jingran Xie, Boshi Tang, Yuanyuan Wang, Jun Chen, Haiwei Xue, Zhiyong Wu:
First-order Multi-label Learning with Cross-modal Interactions for Multimodal Emotion Recognition. MRAC@MM 2023: 13-20
- [i81]Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng:
Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis. CoRR abs/2304.06359 (2023)
- [i80]Yaoxun Xu, Baiji Liu, Qiaochu Huang, Xingchen Song, Zhiyong Wu, Shiyin Kang, Helen Meng:
CB-Conformer: Contextual biasing Conformer for biased word recognition. CoRR abs/2304.09607 (2023)
- [i79]Haolin Zhuang, Shun Lei, Long Xiao, Weiqin Li, Liyang Chen, Sicheng Yang, Zhiyong Wu, Shiyin Kang, Helen Meng:
GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network. CoRR abs/2304.12704 (2023)
- [i78]Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Ming Cheng, Long Xiao:
DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models. CoRR abs/2305.04919 (2023)
- [i77]Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang:
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing. CoRR abs/2305.05203 (2023)
- [i76]Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Zhiyong Wu, Yannan Wang, Shidong Shang, Helen Meng:
Inter-SubNet: Speech Enhancement with Subband Interaction. CoRR abs/2305.05599 (2023)
- [i75]Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng:
Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion. CoRR abs/2305.09167 (2023)
- [i74]Xingchen Song, Di Wu, Binbin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu:
ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs. CoRR abs/2305.10649 (2023)
- [i73]Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang:
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation. CoRR abs/2305.11094 (2023)
- [i72]Xiang Li, Songxiang Liu, Max W. Y. Lam, Zhiyong Wu, Chao Weng, Helen Meng:
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model. CoRR abs/2305.16749 (2023)
- [i71]Wenzhe Liu, Yupeng Shi, Jun Chen, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu:
Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction. CoRR abs/2306.08454 (2023)
- [i70]Jiuxin Lin, Xinyu Cai, Heinrich Dinkel, Jun Chen, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Zhiyong Wu, Yujun Wang, Helen Meng:
AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction. CoRR abs/2306.14170 (2023)
- [i69]Jiuxin Lin, Peng Wang, Heinrich Dinkel, Jun Chen, Zhiyong Wu, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang:
Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information. CoRR abs/2306.16241 (2023)
- [i68]Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Yukai Ju, Shulin He, Yannan Wang, Zhiyong Wu:
MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation. CoRR abs/2306.16250 (2023)
- [i67]Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Xixin Wu, Shiyin Kang, Helen Meng:
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis. CoRR abs/2307.16012 (2023)
- [i66]Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao:
VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer. CoRR abs/2308.04830 (2023)
- [i65]Sicheng Yang, Haiwei Xue, Zhensong Zhang, Minglei Li, Zhiyong Wu, Xiaofei Wu, Songcen Xu, Zonghong Dai:
The DiffuseStyleGesture+ entry to the GENEA Challenge 2023. CoRR abs/2308.13879 (2023)
- [i64]Yi Meng, Xiang Li, Zhiyong Wu, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng:
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis. CoRR abs/2308.16021 (2023)
- [i63]Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu:
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech. CoRR abs/2308.16569 (2023)
- [i62]Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng:
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information. CoRR abs/2308.16577 (2023)
- [i61]Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng:
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis. CoRR abs/2308.16593 (2023)
- [i60]Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng:
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information. CoRR abs/2308.16836 (2023)
- [i59]Shaohuan Zhou, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng:
Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training. CoRR abs/2309.00284 (2023)
- [i58]Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen M. Meng:
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge. CoRR abs/2309.01437 (2023)
- [i57]Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen M. Meng:
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation. CoRR abs/2309.02459 (2023)
- [i56]Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai:
UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons. CoRR abs/2309.07051 (2023)
- [i55]Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, Chao Weng, Zhiyong Wu,