


Haizhou Li 0001
李海洲
Person information
- unicode name: 李海洲
- affiliation: Chinese University of Hong Kong (Shenzhen), China
- affiliation: National University of Singapore, Department of Electrical and Computer Engineering, Singapore
- affiliation (2006 - 2016): Nanyang Technological University, Singapore
- affiliation (2003 - 2016): Institute for Infocomm Research, A*STAR, Singapore
- affiliation (2011): University of New South Wales, Sydney, Australia
- affiliation (2009): University of Eastern Finland, Kuopio, Finland
- affiliation (PhD 1990): South China University of Technology, Guangzhou, China
Other persons with the same name
- Haizhou Li
- Haizhou Li 0002 — Blaise Pascal University, Clermont-Ferrand, France
- Haizhou Li 0003 — City University of Hong Kong, Department of Computer Science, Hong Kong
- Haizhou Li 0004 — Beijing Institute of Technology, China
2020 – today
- 2025
- [j194]Rui Liu, Zhenqi Jia, Feilong Bao, Haizhou Li:
Retrieval-Augmented Dialogue Knowledge Aggregation for expressive conversational speech synthesis. Inf. Fusion 118: 102948 (2025) - [j193]Rui Liu, Hongyu Yuan, Guanglai Gao, Haizhou Li:
Listening and seeing again: Generative error correction for audio-visual speech recognition. Inf. Fusion 120: 103077 (2025) - [j192]Rui Liu, Jinhua Zhang, Haizhou Li:
Hierarchical multi-source cues fusion for mono-to-binaural based Audio Deepfake Detection. Inf. Fusion 120: 103097 (2025) - [j191]Xinyuan Qian, Xianghu Yue, Jiadong Wang, Huiping Zhuang, Haizhou Li:
Analytic Class Incremental Learning for Sound Source Localization With Privacy Protection. IEEE Signal Process. Lett. 32: 726-730 (2025) - [j190]Yi Ma, Shuai Wang, Tianchi Liu, Haizhou Li:
ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification. IEEE Signal Process. Lett. 32: 731-735 (2025) - [j189]Jiqing Zhang, Malu Zhang, Yuanchen Wang, Qianhui Liu, Baocai Yin, Haizhou Li, Xin Yang:
Spiking Neural Networks With Adaptive Membrane Time Constant for Event-Based Tracking. IEEE Trans. Image Process. 34: 1009-1021 (2025) - [j188]Ruihang Ji, Dongyu Li, Shuzhi Sam Ge, Haizhou Li:
Tunnel Prescribed Control of Nonlinear Systems With Unknown Control Directions. IEEE Trans. Neural Networks Learn. Syst. 36(1): 1383-1395 (2025) - [c739]Rui Liu, Shuwei He, Yifan Hu, Haizhou Li:
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech. AAAI 2025: 24632-24640 - [c738]Chen Zhang, Dading Chong, Feng Jiang, Chengguang Tang, Anningzhe Gao, Guohua Tang, Haizhou Li:
Aligning Language Models Using Follow-up Likelihood as Reward Signal. AAAI 2025: 25832-25841 - [e25]Haizhou Li, Tanja Schultz, Yalei Bi, Jian Zhu, Hongsheng He, Jun Ma, Siqi Cai, Wanyue Jiang, Shuzhi Sam Ge:
Social Robotics - 16th International Conference, ICSR + InnoBiz 2024, Shenzhen, China, September 25-28, 2024, Proceedings. Lecture Notes in Computer Science 15170, Springer 2025, ISBN 978-981-96-1150-8 - [i241]Rui Liu, Hongyu Yuan, Haizhou Li:
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition. CoRR abs/2501.04038 (2025) - [i240]Yi Ma, Shuai Wang, Tianchi Liu, Haizhou Li:
ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification. CoRR abs/2501.05729 (2025) - [i239]Honglin Cao, Zijian Zhou, Wenjie Wei, Ammar Belatreche, Yu Liang, Dehao Zhang, Malu Zhang, Yang Yang, Haizhou Li:
Binary Event-Driven Spiking Transformer. CoRR abs/2501.05904 (2025) - [i238]Rui Liu, Zhenqi Jia, Feilong Bao, Haizhou Li:
Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis. CoRR abs/2501.06467 (2025) - [i237]Xianghu Yue, Yiming Chen, Xueyi Zhang, Xiaoxue Gao, Mengling Feng, Mingrui Lao, Huiping Zhuang, Haizhou Li:
PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning. CoRR abs/2501.09352 (2025) - [i236]Xuerui Qiu, Jieyuan Zhang, Wenjie Wei, Honglin Cao, Junsheng Guo, Rui-Jie Zhu, Yimeng Shan, Yang Yang, Malu Zhang, Haizhou Li:
Quantized Spike-driven Transformer. CoRR abs/2501.13492 (2025) - [i235]Qiquan Zhang, Buddhi Wickramasinghe, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Haizhou Li:
Should Audio Front-ends be Adaptive? Comparing Learnable and Adaptive Front-ends. CoRR abs/2502.03260 (2025) - [i234]Li Zhou, Ruijie Zhang, Xunlian Dai, Daniel Hershcovich, Haizhou Li:
Large Language Models Penetration in Scholarly Writing and Peer Review. CoRR abs/2502.11193 (2025) - [i233]Yuhao Zhang, Zhiheng Liu, Fan Bu, Ruiyu Zhang, Benyou Wang, Haizhou Li:
Soundwave: Less is More for Speech-Text Alignment in LLMs. CoRR abs/2502.12900 (2025) - [i232]Simin Chen, Yiming Chen, Zexin Li, Yifan Jiang, Zhongwei Wan, Yixin He, Dezhi Ran, Tianle Gu, Haizhou Li, Tao Xie, Baishakhi Ray:
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation. CoRR abs/2502.17521 (2025) - [i231]Kuang Wang, Xianfei Li, Shenghao Yang, Li Zhou, Feng Jiang, Haizhou Li:
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles. CoRR abs/2502.18968 (2025) - [i230]Yidi Jiang, Qian Chen, Shengpeng Ji, Yu Xi, Wen Wang, Chong Zhang, Xianghu Yue, Shiliang Zhang, Haizhou Li:
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook. CoRR abs/2502.20067 (2025) - [i229]Feng Jiang, Zhiyu Lin, Fan Bu, Yuhao Du, Benyou Wang, Haizhou Li:
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information. CoRR abs/2503.05085 (2025) - [i228]Wupeng Wang, Zexu Pan, Jingru Lin, Shuai Wang, Haizhou Li:
Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation. CoRR abs/2503.12589 (2025)
- 2024
- [j187]Qianhui Liu, Meng Ge, Haizhou Li:
Intelligent event-based lip reading word classification with spiking neural networks using spatio-temporal attention features and triplet loss. Inf. Sci. 675: 120660 (2024) - [j186]Jiaqi Yan, Qianhui Liu, Malu Zhang, Lang Feng, De Ma, Haizhou Li, Gang Pan:
Efficient spiking neural network design via neural architecture search. Neural Networks 173: 106172 (2024) - [j185]Xinyi Chen, Qu Yang, Jibin Wu, Haizhou Li, Kay Chen Tan:
A Hybrid Neural Coding Approach for Pattern Recognition With Spiking Neural Networks. IEEE Trans. Pattern Anal. Mach. Intell. 46(5): 3064-3078 (2024) - [j184]Shuai Wang, Zhengyang Chen, Bing Han, Hongji Wang, Chengdong Liang, Binbin Zhang, Xu Xiang, Wen Ding, Johan Rohdin, Anna Silnova, Yanmin Qian, Haizhou Li:
Advancing speaker embedding learning: Wespeaker toolkit for research and production. Speech Commun. 162: 103104 (2024) - [j183]Jingru Lin, Meng Ge, Wupeng Wang, Haizhou Li, Mengling Feng:
Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech. IEEE Signal Process. Lett. 31: 1014-1018 (2024) - [j182]Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li:
Text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks. IEEE Signal Process. Lett. 31: 2055-2059 (2024) - [j181]Xiaoxue Gao, Zexin Li, Yiming Chen, Cong Liu, Haizhou Li:
Transferable Adversarial Attacks Against ASR. IEEE Signal Process. Lett. 31: 2200-2204 (2024) - [j180]Rui Liu, Haolin Zuo, Zheng Lian, Björn W. Schuller, Haizhou Li:
Contrastive Learning Based Modality-Invariant Feature Acquisition for Robust Multimodal Emotion Recognition With Missing Modalities. IEEE Trans. Affect. Comput. 15(4): 1856-1873 (2024) - [j179]Qu Yang, Malu Zhang, Jibin Wu, Kay Chen Tan, Haizhou Li:
LC-TTFS: Toward Lossless Network Conversion for Spiking Neural Networks With TTFS Coding. IEEE Trans. Cogn. Dev. Syst. 16(5): 1626-1639 (2024) - [j178]Siqi Cai, Ran Zhang, Malu Zhang, Jibin Wu, Haizhou Li:
EEG-Based Auditory Attention Detection With Spiking Graph Convolutional Network. IEEE Trans. Cogn. Dev. Syst. 16(5): 1698-1706 (2024) - [j177]Koichiro Yoshino, Yun-Nung Chen, Paul A. Crook, Satwik Kottur, Jinchao Li, Behnam Hedayatnia, Seungwhan Moon, Zhengcong Fei, Zekang Li, Jinchao Zhang, Yang Feng, Jie Zhou, Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Dilek Hakkani-Tur, Babak Damavandi, Alborz Geramifard, Chiori Hori, Ankit Shah, Chen Zhang, Haizhou Li, João Sedoc, Luis F. D'Haro, Rafael E. Banchs, Alexander Rudnicky:
Overview of the Tenth Dialog System Technology Challenge: DSTC10. IEEE ACM Trans. Audio Speech Lang. Process. 32: 765-778 (2024) - [j176]Lei Liu, Li Liu, Haizhou Li:
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1559-1572 (2024) - [j175]Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li:
Accented Text-to-Speech Synthesis With Limited Data. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1699-1711 (2024) - [j174]Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li:
Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2188-2201 (2024) - [j173]Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li:
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2324-2337 (2024) - [j172]Congcong Sun, Hui Tian, Peng Tian, Haizhou Li, Zhenxing Qian:
Multi-Agent Deep Learning for the Detection of Multiple Speech Steganography Methods. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2957-2972 (2024) - [j171]Mingyang Zhang, Yi Zhou, Yi Ren, Chen Zhang, Xiang Yin, Haizhou Li:
RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4146-4156 (2024) - [j170]Wupeng Wang, Zexu Pan, Xinke Li, Shuai Wang, Haizhou Li:
Speech Separation With Pretrained Frontend to Minimize Domain Mismatch. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4184-4198 (2024) - [j169]Zexu Pan, Marvin Borsdorf, Siqi Cai, Tanja Schultz, Haizhou Li:
NeuroHeed: Neuro-Steered Speaker Extraction Using EEG Signals. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4456-4470 (2024) - [j168]Yicheng Gu, Xueyao Zhang, Liumeng Xue, Haizhou Li, Zhizheng Wu:
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoders. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4569-4579 (2024) - [j167]Shuai Wang, Zhengyang Chen, Kong Aik Lee, Yanmin Qian, Haizhou Li:
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4971-4998 (2024) - [j166]Siqi Cai, Tanja Schultz, Haizhou Li:
Brain Topology Modeling With EEG-Graphs for Auditory Spatial Attention Detection. IEEE Trans. Biomed. Eng. 71(1): 171-182 (2024) - [j165]Miao Liu, Jing Wang, Xinyuan Qian, Haizhou Li:
Audio-Visual Temporal Forgery Detection Using Embedding-Level Fusion and Multi-Dimensional Contrastive Loss. IEEE Trans. Circuits Syst. Video Technol. 34(8): 6937-6948 (2024) - [j164]Zhenyu Weng, Huiping Zhuang, Fulin Luo, Haizhou Li, Zhiping Lin:
Few-Shot Contrastive Transfer Learning With Pretrained Model for Masked Face Verification. IEEE Trans. Multim. 26: 3871-3883 (2024) - [j163]Xinyuan Qian, Wei Xue, Qiquan Zhang, Ruijie Tao, Haizhou Li:
Deep Cross-Modal Retrieval Between Spatial Image and Acoustic Speech. IEEE Trans. Multim. 26: 4480-4489 (2024) - [j162]Siqi Cai, Peiwen Li, Haizhou Li:
A Bio-Inspired Spiking Attentional Neural Network for Attentional Selection in the Listening Brain. IEEE Trans. Neural Networks Learn. Syst. 35(12): 17387-17397 (2024) - [j161]Ruihang Ji, Shuzhi Sam Ge, Kai Zhao, Haizhou Li:
Event-Triggered Tracking Control for Nonlinear Systems With Prescribed Performance. IEEE Trans. Syst. Man Cybern. Syst. 54(6): 3547-3557 (2024) - [c737]Shimin Zhang, Qu Yang, Chenxiang Ma, Jibin Wu, Haizhou Li, Kay Chen Tan:
TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling. AAAI 2024: 16838-16847 - [c736]Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li:
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling. AAAI 2024: 18698-18706 - [c735]Jiadong Wang, Zexu Pan, Malu Zhang, Robby T. Tan, Haizhou Li:
Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition. AAAI 2024: 19144-19152 - [c734]Chen Zhang, Luis Fernando D'Haro, Yiming Chen, Malu Zhang, Haizhou Li:
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators. AAAI 2024: 19515-19524 - [c733]Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li:
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models. ACL (Findings) 2024: 1359-1375 - [c732]Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li:
Fine-Grained Quantitative Emotion Editing for Speech Generation. APSIPA 2024: 1-6 - [c731]Feng Jiang, Weihao Liu, Xiaomin Chu, Peifeng Li, Qiaoming Zhu, Haizhou Li:
Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark. LREC/COLING 2024: 495-506 - [c730]Danqing Luo, Chen Zhang, Yan Zhang, Haizhou Li:
CrossTune: Black-Box Few-Shot Classification with Label Enhancement. LREC/COLING 2024: 4185-4197 - [c729]Yaxin Fan, Feng Jiang, Peifeng Li, Haizhou Li:
Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study. LREC/COLING 2024: 16998-17010 - [c728]Gabriel Ivucic, Saurav Pahuja, Felix Putze, Siqi Cai, Haizhou Li, Tanja Schultz:
The Impact of Cross-Validation Schemes for EEG-Based Auditory Attention Detection with Deep Neural Networks. EMBC 2024: 1-4 - [c727]Chen Zhang, Chengguang Tang, Dading Chong, Ke Shi, Guohua Tang, Feng Jiang, Haizhou Li:
TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models. EMNLP (Findings) 2024: 8926-8946 - [c726]Yiming Chen, Xianghu Yue, Xiaoxue Gao, Chen Zhang, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li:
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models. EMNLP (Findings) 2024: 10917-10930 - [c725]Jiabao Pan, Yan Zhang, Chen Zhang, Zuozhu Liu, Hongwei Wang, Haizhou Li:
DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models. EMNLP 2024: 14686-14695 - [c724]Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li:
SVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks. ICASSP 2024: 221-225 - [c723]Zeyang Song, Jibin Wu, Malu Zhang, Mike Zheng Shou, Haizhou Li:
Spiking-Leaf: A Learnable Auditory Front-End for Spiking Neural Networks. ICASSP 2024: 226-230 - [c722]Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li:
An Empirical Study on the Impact of Positional Encoding in Transformer-Based Monaural Speech Enhancement. ICASSP 2024: 1001-1005 - [c721]Siqi Cai, Ran Zhang, Haizhou Li:
Robust Decoding of the Auditory Attention from EEG Recordings Through Graph Convolutional Networks. ICASSP 2024: 2320-2324 - [c720]Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li:
LOCSELECT: Target Speaker Localization with an Auditory Selective Hearing Mechanism. ICASSP 2024: 8696-8700 - [c719]Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li:
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis. ICASSP 2024: 10601-10605 - [c718]Junjie Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang, Haizhou Li:
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-Talker Speech. ICASSP 2024: 10666-10670 - [c717]Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li:
Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition. ICASSP 2024: 10901-10905 - [c716]Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li:
Prompt-Driven Target Speech Diarization. ICASSP 2024: 11086-11090 - [c715]Yi Ma, Kong Aik Lee, Ville Hautamäki, Meng Ge, Haizhou Li:
Gradient Weighting for Speaker Verification in Extremely Low Signal-to-Noise Ratio. ICASSP 2024: 11311-11315 - [c714]Qianhui Liu, Jiaqi Yan, Malu Zhang, Gang Pan, Haizhou Li:
LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization. IJCAI 2024: 3097-3105 - [c713]Yang Wang, Haiyang Mei, Qirui Bao, Ziqi Wei, Mike Zheng Shou, Haizhou Li, Bo Dong, Xin Yang:
Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition. IJCAI 2024: 3160-3168 - [c712]Wenxuan Wu, Xueyuan Chen, Xixin Wu, Haizhou Li, Helen Meng:
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy. IJCNN 2024: 1-8 - [c711]Qibing Bai, Shuai Wang, Zhijun Liu, Mingyang Zhang, Wei Rao, Yannan Wang, Haizhou Li:
Diffusion-Based Method with TTS Guidance for Foreign Accent Conversion. ISCSLP 2024: 284-288 - [c710]Yifan Hu, Rui Liu, Guanglai Gao, Haizhou Li:
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis. ISCSLP 2024: 299-303 - [c709]Peng Zhao, Ruicong Wang, Zijie Lin, Zexu Pan, Haizhou Li, Xueyi Zhang:
Ensemble Deep Learning Models for EEG-Based Auditory Attention Decoding. ISCSLP 2024: 339-343 - [c708]Xianghu Yue, Xueyi Zhang, Yiming Chen, Chengwei Zhang, Mingrui Lao, Huiping Zhuang, Xinyuan Qian, Haizhou Li:
MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks. ACM Multimedia 2024: 2428-2437 - [c707]Weizhi Liu, Yue Li, Dongdong Lin, Hui Tian, Haizhou Li:
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis. ACM Multimedia 2024: 3294-3302 - [c706]Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li:
Generative Expressive Conversational Speech Synthesis. ACM Multimedia 2024: 4187-4196 - [c705]Miao Liu, Jing Wang, Xinyuan Qian, Haizhou Li:
ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers. ACM Multimedia 2024: 7094-7103 - [c704]Ruijie Tao, Zhan Shi, Yidi Jiang, Duc-Tuan Truong, Eng Siong Chng, Massimo Alioto, Haizhou Li:
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization. ACM Multimedia 2024: 11342-11347 - [c703]Chuang Li, Yan Zhang, Min-Yen Kan, Haizhou Li:
UNO-DST: Leveraging Unlabelled Data in Zero-Shot Dialogue State Tracking. NAACL-HLT (Findings) 2024: 2972-2983 - [c702]Xidong Wang, Guiming Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Junying Chen, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li:
CMB: A Comprehensive Medical Benchmark in Chinese. NAACL-HLT 2024: 6184-6205 - [c701]Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Mosen Alharthi, Bang An, Juncai He, Ziche Liu, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu:
AceGPT, Localizing Large Language Models in Arabic. NAACL-HLT 2024: 8139-8163 - [c700]Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu:
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words. NeurIPS 2024 - [c699]Juhao Liang, Zhenyang Cai, Jianqing Zhu, Huang Huang, Kewei Zong, Bang An, Mosen Alharthi, Juncai He, Lian Zhang, Haizhou Li, Benyou Wang, Jinchao Xu:
Alignment at Pre-training! Towards Native Alignment for Arabic LLMs. NeurIPS 2024 - [c698]Xueyi Zhang, Mingrui Lao, Peng Zhao, Jun Tang, Yanming Guo, Siqi Cai, Xianghu Yue, Haizhou Li:
Language Without Borders: A Dataset and Benchmark for Code-Switching Lip Reading. NeurIPS 2024 - [c697]Kun Zhou, Berrak Sisman, Carlos Busso, Bin Ma, Haizhou Li:
Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion. Odyssey 2024: 180-186 - [c696]Hongli Yang, Xinyi Chen, Junjie Li, Hao Huang, Siqi Cai, Haizhou Li:
Listen to the Speaker in Your Gaze. CIS-RAM 2024: 380-385 - [c695]Junjie Li, Ke Zhang, Shuai Wang, Haizhou Li, Man-Wai Mak, Kong Aik Lee:
On the Effectiveness of Enrollment Speech Augmentation For Target Speaker Extraction. SLT 2024: 325-332 - [c694]Dashanka De Silva, Siqi Cai, Saurav Pahuja, Tanja Schultz, Haizhou Li:
Neurospex: Neuro-Guided Speaker Extraction With Cross-Modal Fusion. SLT 2024: 341-348 - [c693]Jiahe Wang, Shuai Wang, Junjie Li, Ke Zhang, Yanmin Qian, Haizhou Li:
Enhancing Speaker Extraction Through Rectifying Target Confusion. SLT 2024: 349-356 - [c692]Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Jiaqi Li, Haorui He, Chaoren Wang, Songting Liu, Xi Chen, Junan Zhang, Zihao Fang, Haopeng Chen, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu:
Amphion: an Open-Source Audio, Music, and Speech Generation Toolkit. SLT 2024: 879-884 - [c691]Lichuan Jiang, Jiani Zhong, Muqing Jian, Xuanzhuo Liu, Siqi Cai, Haizhou Li:
The Impact of Synchronized Visual and Auditory Attention on Human Perception. ICSR + InnoBiz 2024: 41-50 - [c690]Xinyuan Qian, Chen Lu, Yating Zhang, Kainan Chen, Haizhou Li:
Semi-supervised Speaker Localization with Gaussian-Like Pseudo-labeling. ICSR + InnoBiz 2024: 146-155 - [c689]Shuai Wang, Pengcheng Zhu, Haizhou Li:
M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions. ICSR + InnoBiz 2024: 303-311 - [c688]Ganjun Liu, Xiaohui Hou, Meng Ge, Tao Zhang, Haizhou Li:
A Non-Intrusive Approach to Assessing Dysarthria Severity: Advancing Clinical Diagnosis. WWW (Companion Volume) 2024: 1134-1137 - [i227]Yi Ma, Kong Aik Lee, Ville Hautamäki, Meng Ge, Haizhou Li:
Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio. CoRR abs/2401.02626 (2024) - [i226]Feng Jiang, Kuang Wang, Haizhou Li:
Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System. CoRR abs/2401.09150 (2024) - [i225]Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li:
An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement. CoRR abs/2401.09686 (2024) - [i224]Xianghu Yue, Xiaohai Tian, Malu Zhang, Zhizheng Wu, Haizhou Li:
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing. CoRR abs/2401.12264 (2024) - [i223]Qianhui Liu, Jiaqi Yan, Malu Zhang, Gang Pan, Haizhou Li:
LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization. CoRR abs/2401.14652 (2024) - [i222]Lei Liu, Li Liu, Haizhou Li:
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition. CoRR abs/2401.17604 (2024) - [i221]Wenjie Wei, Malu Zhang, Jilin Zhang, Ammar Belatreche, Jibin Wu, Zijing Xu, Xuerui Qiu, Hong Chen, Yang Yang, Haizhou Li:
Event-Driven Learning for Spiking Neural Networks. CoRR abs/2403.00270 (2024) - [i220]Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li:
Fine-Grained Quantitative Emotion Editing for Speech Generation. CoRR abs/2403.02002 (2024) - [i219]Xidong Wang, Nuo Chen, Junyin Chen, Yan Hu, Yidong Wang, Xiangbo Wu, Anningzhe Gao, Xiang Wan, Haizhou Li, Benyou Wang:
Apollo: An Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People. CoRR abs/2403.03640 (2024) - [i218]Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li:
sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks. CoRR abs/2403.05772 (2024) - [i217]Danqing Luo, Chen Zhang, Yan Zhang, Haizhou Li:
CrossTune: Black-Box Few-Shot Classification with Label Enhancement. CoRR abs/2403.12468 (2024) - [i216]Wenxuan Wu, Xueyuan Chen, Xixin Wu, Haizhou Li, Helen Meng:
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy. CoRR abs/2403.16078 (2024) - [i215]Yicheng Gu, Xueyao Zhang, Liumeng Xue, Haizhou Li, Zhizheng Wu:
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder. CoRR abs/2404.17161 (2024) - [i214]Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li:
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention. CoRR abs/2404.18501 (2024) - [i213]Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li:
Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems. CoRR abs/2405.01868 (2024) - [i212]Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li:
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis. CoRR abs/2405.09171 (2024) - [i211]Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps:
Mamba in Speech: Towards an Alternative to Self-Attention. CoRR abs/2405.12609 (2024) - [i210]Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li:
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models. CoRR abs/2405.14646 (2024) - [i209]Jiahui Xu, Feng Jiang, Anningzhe Gao, Haizhou Li:
Unsupervised Mutual Learning of Dialogue Discourse Parsing and Topic Segmentation. CoRR abs/2405.19799 (2024) - [i208]Chen Zhang, Chengguang Tang, Dading Chong, Ke Shi, Guohua Tang, Feng Jiang, Haizhou Li:
TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models. CoRR abs/2405.20215 (2024) - [i207]Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li:
How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio? CoRR abs/2406.02483 (2024) - [i206]Zhijun Liu, Shuai Wang, Sho Inoue, Qibing Bai, Haizhou Li:
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis. CoRR abs/2406.05551 (2024) - [i205]Yidi Jiang, Ruijie Tao, Zhengyang Chen, Yanmin Qian, Haizhou Li:
Target Speech Diarization with Multimodal Prompts. CoRR abs/2406.07198 (2024) - [i204]Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhiwu Li, Haizhou Li:
Multi-Scale Accent Modeling with Disentangling for Multi-Speaker Multi-Accent TTS Synthesis. CoRR abs/2406.10844 (2024) - [i203]Zeyang Song, Qianhui Liu, Qu Yang, Yizhou Peng, Haizhou Li:
ED-sKWS: Early-Decision Spiking Neural Networks for Rapid, and Energy-Efficient Keyword Spotting. CoRR abs/2406.12726 (2024) - [i202]Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu:
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words. CoRR abs/2406.13340 (2024) - [i201]Ziche Liu, Rui Ke, Feng Jiang, Haizhou Li:
Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models. CoRR abs/2406.14115 (2024) - [i200]Jiabao Pan, Yan Zhang, Chen Zhang, Zuozhu Liu, Hongwei Wang, Haizhou Li:
DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models. CoRR abs/2407.01009 (2024) - [i199]Rui Liu, Haolin Zuo, Zheng Lian, Xiaofen Xing, Björn W. Schuller, Haizhou Li:
Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset. CoRR abs/2407.02751 (2024) - [i198]Yang Wang, Haiyang Mei, Qirui Bao, Ziqi Wei, Mike Zheng Shou, Haizhou Li, Bo Dong, Xin Yang:
Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition. CoRR abs/2407.09521 (2024) - [i197]Weizhi Liu, Yue Li, Dongdong Lin, Hui Tian, Haizhou Li:
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis. CoRR abs/2407.10471 (2024) - [i196]Shuai Wang, Zhengyang Chen, Kong Aik Lee, Yanmin Qian, Haizhou Li:
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning. CoRR abs/2407.15188 (2024) - [i195]Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li:
Generative Expressive Conversational Speech Synthesis. CoRR abs/2407.21491 (2024) - [i194]Qianhui Liu, Jiadong Wang, Yang Wang, Xin Yang, Gang Pan, Haizhou Li:
Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing. CoRR abs/2408.16564 (2024) - [i193]Dashanka De Silva, Siqi Cai, Saurav Pahuja, Tanja Schultz, Haizhou Li:
NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention. CoRR abs/2409.02489 (2024) - [i192]Xinyuan Qian, Xianghu Yue, Jiadong Wang, Huiping Zhuang, Haizhou Li:
Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection. CoRR abs/2409.07224 (2024) - [i191]Zhijun Liu, Shuai Wang, Pengcheng Zhu, Mengxiao Bi, Haizhou Li:
E1 TTS: Simple and Fast Non-Autoregressive TTS. CoRR abs/2409.09351 (2024) - [i190]Sho Inoue, Shuai Wang, Wanxing Wang, Pengcheng Zhu, Mengxiao Bi, Haizhou Li:
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion. CoRR abs/2409.09352 (2024) - [i189]Junjie Li, Ke Zhang, Shuai Wang, Haizhou Li, Man-Wai Mak, Kong Aik Lee:
On the effectiveness of enrollment speech augmentation for Target Speaker Extraction. CoRR abs/2409.09589 (2024) - [i188]Chen Zhang, Dading Chong, Feng Jiang, Chengguang Tang, Anningzhe Gao, Guohua Tang, Haizhou Li:
Aligning Language Models Using Follow-up Likelihood as Reward Signal. CoRR abs/2409.13948 (2024) - [i187]Shuai Wang, Pengcheng Zhu, Haizhou Li:
M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions. CoRR abs/2409.15782 (2024) - [i186]Shuai Wang, Ke Zhang, Shaoxiong Lin, Junjie Li, Xuefei Wang, Meng Ge, Jianwei Yu, Yanmin Qian, Haizhou Li:
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction. CoRR abs/2409.15799 (2024) - [i185]Yiming Chen, Xianghu Yue, Xiaoxue Gao, Chen Zhang, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li:
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models. CoRR abs/2409.18680 (2024) - [i184]Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li:
FluentEditor+: Text-based Speech Editing by Modeling Local Hierarchical Acoustic Smoothness and Global Prosody Consistency. CoRR abs/2410.03719 (2024) - [i183]Rui Liu, Zhenqi Jia, Jie Yang, Yifan Hu, Haizhou Li:
Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling. CoRR abs/2410.09524 (2024) - [i182]Fan Bu, Yuhao Zhang, Xidong Wang, Benyou Wang, Qun Liu, Haizhou Li:
Roadmap towards Superhuman Speech Understanding using Large Language Models. CoRR abs/2410.13268 (2024) - [i181]Shuwei He, Rui Liu, Haizhou Li:
Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech. CoRR abs/2410.14101 (2024) - [i180]Zihao Cheng, Li Zhou, Feng Jiang, Benyou Wang, Haizhou Li:
Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement. CoRR abs/2410.14259 (2024) - [i179]Ke Zhang, Junjie Li, Shuai Wang, Yangjie Wei, Yi Wang, Yannan Wang, Haizhou Li:
Multi-Level Speaker Representation for Target Speaker Extraction. CoRR abs/2410.16059 (2024) - [i178]Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li:
VoiceBench: Benchmarking LLM-Based Voice Assistants. CoRR abs/2410.17196 (2024) - [i177]Wupeng Wang, Zexu Pan, Xinke Li, Shuai Wang, Haizhou Li:
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch. CoRR abs/2411.03085 (2024) - [i176]Xinyuan Qian, Jiaran Gao, Yaodan Zhang, Qiquan Zhang, Hexin Liu, Leibny Paola García, Haizhou Li:
SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model. CoRR abs/2411.07751 (2024) - [i175]Xiaoxue Gao, Zexin Li, Yiming Chen, Cong Liu, Haizhou Li:
Transferable Adversarial Attacks against ASR. CoRR abs/2411.09220 (2024) - [i174]Juhao Liang, Zhenyang Cai, Jianqing Zhu, Huang Huang, Kewei Zong, Bang An, Mosen Alharthi, Juncai He, Lian Zhang, Haizhou Li, Benyou Wang, Jinchao Xu:
Alignment at Pre-training! Towards Native Alignment for Arabic LLMs. CoRR abs/2412.03253 (2024) - [i173]Junjie Li, Ke Zhang, Shuai Wang, Kong Aik Lee, Haizhou Li:
MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues. CoRR abs/2412.08247 (2024) - [i172]Rui Liu, Shuwei He, Yifan Hu, Haizhou Li:
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech. CoRR abs/2412.11409 (2024) - [i171]Jianqing Zhu, Huang Huang, Zhihang Lin, Juhao Liang, Zhengyang Tang, Khalid Almubarak, Abdulmohsen Alharthik, Bang An, Juncai He, Xiangbo Wu, Fei Yu, Junying Chen, Zhuoheng Ma, Yuhao Du, He Zhang, Emad A. Alghamdi, Lian Zhang, Ruoyu Sun, Haizhou Li, Benyou Wang, Jinchao Xu:
Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion. CoRR abs/2412.12310 (2024) - [i170]Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li:
Hierarchical Control of Emotion Rendering in Speech Synthesis. CoRR abs/2412.12498 (2024) - [i169]Chenyu Yang, Shuai Wang, Hangting Chen, Jianwei Yu, Wei Tan, Rongzhi Gu, Yaoxun Xu, Yizhi Zhou, Haina Zhu, Haizhou Li:
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor. CoRR abs/2412.13786 (2024)
- 2023
- [j160]Tao Luo, Weng-Fai Wong, Rick Siow Mong Goh, Anh Tuan Do, Zhixian Chen, Haizhou Li, Wenyu Jiang, Weiyun Yau:
Achieving Green AI with Energy-Efficient Deep Learning Using Neuromorphic Computing. Commun. ACM 66(7): 52-57 (2023) - [j159]Buddhi Wickramasinghe, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Julien Epps, Haizhou Li, Ting Dang:
DNN controlled adaptive front-end for replay attack detection systems. Speech Commun. 154: 102973 (2023) - [j158]Tingting Wang, Zexu Pan, Meng Ge, Zhen Yang, Haizhou Li:
Time-Domain Speech Separation Networks With Graph Encoding Auxiliary. IEEE Signal Process. Lett. 30: 110-114 (2023) - [j157]Yi Zhou, Zhizheng Wu, Mingyang Zhang, Xiaohai Tian, Haizhou Li:
TTS-Guided Training for Accent Conversion Without Parallel Data. IEEE Signal Process. Lett. 30: 533-537 (2023) - [j156]Mingyang Zhang, Xuehao Zhou, Zhizheng Wu, Haizhou Li:
Towards Zero-Shot Multi-Speaker Multi-Accent Text-to-Speech Synthesis. IEEE Signal Process. Lett. 30: 947-951 (2023) - [j155]Kun Zhou, Berrak Sisman, Rajib Rana, Björn W. Schuller, Haizhou Li:
Emotion Intensity and its Control for Emotional Voice Conversion. IEEE Trans. Affect. Comput. 14(1): 31-48 (2023) - [j154]Kun Zhou, Berrak Sisman, Rajib Rana, Björn W. Schuller, Haizhou Li:
Speech Synthesis With Mixed Emotions. IEEE Trans. Affect. Comput. 14(4): 3120-3134 (2023) - [j153]Hui Tian, Yiqin Qiu, Wojciech Mazurczyk, Haizhou Li, Zhenxing Qian:
STFF-SM: Steganalysis Model Based on Spatial and Temporal Feature Fusion for Speech Streams. IEEE ACM Trans. Audio Speech Lang. Process. 31: 277-289 (2023) - [j152]Qiquan Zhang, Xinyuan Qian, Zhaoheng Ni, Aaron Nicolson, Eliathamby Ambikairajah, Haizhou Li:
A Time-Frequency Attention Module for Neural Speech Enhancement. IEEE ACM Trans. Audio Speech Lang. Process. 31: 462-475 (2023) - [j151]Xinyuan Qian, Zhengdong Wang, Jiadong Wang, Guohui Guan, Haizhou Li:
Audio-Visual Cross-Attention Network for Robotic Speaker Tracking. IEEE ACM Trans. Audio Speech Lang. Process. 31: 550-562 (2023) - [j150]Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li:
PoE: A Panel of Experts for Generalized Automatic Dialogue Assessment. IEEE ACM Trans. Audio Speech Lang. Process. 31: 1234-1250 (2023) - [j149]Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li:
Self-Supervised Training of Speaker Encoder With Multi-Modal Diverse Positive Pairs. IEEE ACM Trans. Audio Speech Lang. Process. 31: 1706-1719 (2023) - [j148]Yi Zhou, Zhizheng Wu, Xiaohai Tian, Haizhou Li:
Optimization of Cross-Lingual Voice Conversion With Linguistics Losses to Reduce Foreign Accents. IEEE ACM Trans. Audio Speech Lang. Process. 31: 1916-1926 (2023) - [j147]Xiaoxue Gao, Chitralekha Gupta, Haizhou Li:
PoLyScriber: Integrated Fine-Tuning of Extractor and Lyrics Transcriber for Polyphonic Music. IEEE ACM Trans. Audio Speech Lang. Process. 31: 1968-1981 (2023) - [j146]Zhenyu Weng, Huiping Zhuang, Haizhou Li, Balakrishnan Ramalingam, Rajesh Elara Mohan, Zhiping Lin:
Online Multi-Face Tracking With Multi-Modality Cascaded Matching. IEEE Trans. Circuits Syst. Video Technol. 33(6): 2738-2752 (2023) - [j145]Yiqin Qiu, Hui Tian, Haizhou Li, Chin-Chen Chang, Athanasios V. Vasilakos:
Separable Convolution Network With Dual-Stream Pyramid Enhanced Strategy for Speech Steganalysis. IEEE Trans. Inf. Forensics Secur. 18: 2737-2750 (2023) - [j144]Jibin Wu, Yansong Chua, Malu Zhang, Guoqi Li, Haizhou Li, Kay Chen Tan:
A Tandem Learning Rule for Effective Training and Rapid Inference of Deep Spiking Neural Networks. IEEE Trans. Neural Networks Learn. Syst. 34(1): 446-460 (2023) - [c687]Yiming Chen, Simin Chen, Zexin Li, Wei Yang, Cong Liu, Robby T. Tan, Haizhou Li:
Dynamic Transformers Provide a False Sense of Efficiency. ACL (1) 2023: 7164-7180 - [c686]Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li:
Zero-shot multi-speaker accent TTS with limited accent data. APSIPA ASC 2023: 1931-1936 - [c685]Jiawei Du, Yidi Jiang, Vincent Y. F. Tan, Joey Tianyi Zhou, Haizhou Li:
Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation. CVPR 2023: 3749-3758 - [c684]Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li:
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert. CVPR 2023: 14653-14662 - [c683]Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li:
ADD 2023: the Second Audio Deepfake Detection Challenge. DADA@IJCAI 2023: 125-130 - [c682]Siqi Cai, Jia Li, Hongmeng Yang, Haizhou Li:
RGCnet: An Efficient Recursive Gated Convolutional Network for EEG-based Auditory Attention Detection. EMBC 2023: 1-4 - [c681]Chen Zhang, Luis F. D'Haro, Chengguang Tang, Ke Shi, Guohua Tang, Haizhou Li:
xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark. EMNLP (Findings) 2023: 5579-5601 - [c680]Yan Zhang, Zhaopeng Feng, Zhiyang Teng, Zuozhu Liu, Haizhou Li:
How Well Do Text Embedding Models Understand Syntax? EMNLP (Findings) 2023: 9717-9728 - [c679]Hongbo Zhang, Junying Chen, Feng Jiang, Fei Yu, Zhihong Chen, Guiming Chen, Jianquan Li, Xiangbo Wu, Zhiyi Zhang, Qingying Xiao, Xiang Wan, Benyou Wang, Haizhou Li:
HuatuoGPT, Towards Taming Language Model to Be a Doctor. EMNLP (Findings) 2023: 10859-10885 - [c678]Marvin Borsdorf, Saurav Pahuja, Gabriel Ivucic, Siqi Cai, Haizhou Li, Tanja Schultz:
Multi-Head Attention and GRU for Improved Match-Mismatch Classification of Speech Stimulus and EEG Response. ICASSP 2023: 1-2 - [c677]Xiaoxue Gao, Xianghu Yue, Haizhou Li:
Self-Transcriber: Few-Shot Lyrics Transcription With Self-Training. ICASSP 2023: 1-5 - [c676]Zexu Pan, Wupeng Wang, Marvin Borsdorf, Haizhou Li:
ImagineNet: Target Speaker Extraction with Intermittent Visual Cue Through Embedding Inpainting. ICASSP 2023: 1-5 - [c675]Ruijie Tao, Kong Aik Lee, Zhan Shi, Haizhou Li:
Speaker Recognition with Two-Step Multi-Modal Deep Cleansing. ICASSP 2023: 1-5 - [c674]Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li:
Token2vec: A Joint Self-Supervised Pre-Training Framework Using Unpaired Speech and Text. ICASSP 2023: 1-5 - [c673]Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li:
Ripple Sparse Self-Attention for Monaural Speech Enhancement. ICASSP 2023: 1-5 - [c672]Haolin Zuo, Rui Liu, Jinming Zhao, Guanglai Gao, Haizhou Li:
Exploiting Modality-Invariant Feature for Robust Multimodal Emotion Recognition with Missing Modalities. ICASSP 2023: 1-5 - [c671]Yuke Si, Yan Zhang, Yuhang Li, Xiaobao Wang, Longbiao Wang, Jianwu Dang, Eng Siong Chng, Haizhou Li:
Local and Global Context Modeling with Relation Matching Task for Dialog Act Recognition. IJCNN 2023: 1-8 - [c670]Rui Liu, Haolin Zuo, De Hu, Guanglai Gao, Haizhou Li:
Explicit Intensity Control for Accented Text-to-speech. INTERSPEECH 2023: 22-26 - [c669]Ruicong Wang, Siqi Cai, Haizhou Li:
EEG-based Auditory Attention Detection with Spatiotemporal Graph and Graph Convolutional Network. INTERSPEECH 2023: 1144-1148 - [c668]Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li:
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning. INTERSPEECH 2023: 2978-2982 - [c667]Jingru Lin, Xianghu Yue, Junyi Ao, Haizhou Li:
Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder. INTERSPEECH 2023: 2988-2992 - [c666]Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li:
Target Active Speaker Detection with Audio-visual Cues. INTERSPEECH 2023: 3152-3156 - [c665]Ke Zhang, Marvin Borsdorf, Zexu Pan, Haizhou Li, Yangjie Wei, Yi Wang:
Speaker Extraction with Detection of Presence and Absence of Target Speakers. INTERSPEECH 2023: 3714-3718 - [c664]Qinghua Liu, Meng Ge, Zhizheng Wu, Haizhou Li:
PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network. INTERSPEECH 2023: 3719-3723 - [c663]Rui Liu, Jinhua Zhang, Guanglai Gao, Haizhou Li:
Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion. INTERSPEECH 2023: 3999-4003 - [c662]Junchen Lu, Berrak Sisman, Mingyang Zhang, Haizhou Li:
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units. INTERSPEECH 2023: 5536-5540 - [c661]Chuang Li, Hengchang Hu, Yan Zhang, Min-Yen Kan, Haizhou Li:
A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommendation Systems. KaRS@RecSys 2023: 7-20 - [c660]Xueyi Zhang, Chengwei Zhang, Tao Wang, Jun Tang, Songyang Lao, Haizhou Li:
Slow-Fast Time Parameter Aggregation Network for Class-Incremental Lip Reading. ACM Multimedia 2023: 747-756 - [c659]Saurav Pahuja, Siqi Cai, Tanja Schultz, Haizhou Li:
XAnet: Cross-Attention Between EEG of Left and Right Brain for Auditory Attention Decoding. NER 2023: 1-4 - [c658]Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li:
Disentangling Voice and Content with Self-Supervision for Speaker Recognition. NeurIPS 2023 - [c657]Yaxin Fan, Feng Jiang, Peifeng Li, Haizhou Li:
GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning. NLPCC (3) 2023: 69-80 - [c656]Bin Wang, Haizhou Li:
Relational Sentence Embedding for Flexible Semantic Matching. RepL4NLP@ACL 2023: 238-252 - [c655]Saurav Pahuja, Gabriel Ivucic, Felix Putze, Siqi Cai, Haizhou Li, Tanja Schultz:
Enhancing Subject-Independent EEG-Based Auditory Attention Decoding with WGAN and Pearson Correlation Coefficient. SMC 2023: 3715-3720 - [e24]Jianhua Tao, Haizhou Li, Jiangyan Yi, Cunhang Fan:
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), Macao, China, August 19, 2023. CEUR Workshop Proceedings 3597, CEUR-WS.org 2023 [contents] - [i168]Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li:
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert. CoRR abs/2303.17480 (2023) - [i167]Zhihong Chen, Feng Jiang, Junying Chen, Tiannan Wang, Fei Yu, Guiming Chen, Hongbo Zhang, Juhao Liang, Chen Zhang, Zhiyi Zhang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li:
Phoenix: Democratizing ChatGPT across Languages. CoRR abs/2304.10453 (2023) - [i166]Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li:
Accented Text-to-Speech Synthesis with Limited Data. CoRR abs/2305.04816 (2023) - [i165]Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li:
Ripple sparse self-attention for monaural speech enhancement. CoRR abs/2305.08541 (2023) - [i164]Yiming Chen, Simin Chen, Zexin Li, Wei Yang, Cong Liu, Robby T. Tan, Haizhou Li:
Dynamic Transformers Provide a False Sense of Efficiency. CoRR abs/2305.12228 (2023) - [i163]Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li:
Target Active Speaker Detection with Audio-visual Cues. CoRR abs/2305.12831 (2023) - [i162]Feng Jiang, Longwang He, Peifeng Li, Qiaoming Zhu, Haizhou Li:
Topic-driven Distant Supervision Framework for Macro-level Discourse Parsing. CoRR abs/2305.13755 (2023) - [i161]Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li:
ADD 2023: the Second Audio Deepfake Detection Challenge. CoRR abs/2305.13774 (2023) - [i160]Danqing Luo, Chen Zhang, Jiahui Xu, Bin Wang, Yiming Chen, Yan Zhang, Haizhou Li:
Enhancing Black-Box Few-Shot Text Classification with Prompt-Based Data Augmentation. CoRR abs/2305.13785 (2023) - [i159]Feng Jiang, Weihao Liu, Xiaomin Chu, Peifeng Li, Qiaoming Zhu, Haizhou Li:
Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark. CoRR abs/2305.14790 (2023) - [i158]Hongbo Zhang, Junying Chen, Feng Jiang, Fei Yu, Zhihong Chen, Jianquan Li, Guiming Chen, Xiangbo Wu, Zhiyi Zhang, Qingying Xiao, Xiang Wan, Benyou Wang, Haizhou Li:
HuatuoGPT, towards Taming Language Model to Be a Doctor. CoRR abs/2305.15075 (2023) - [i157]Rui Liu, Jinhua Zhang, Guanglai Gao, Haizhou Li:
Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion. CoRR abs/2305.16353 (2023) - [i156]Xinyi Chen, Qu Yang, Jibin Wu, Haizhou Li, Kay Chen Tan:
A Hybrid Neural Coding Approach for Pattern Recognition with Spiking Neural Networks. CoRR abs/2305.16594 (2023) - [i155]Zhenyu Weng, Huiping Zhuang, Haizhou Li, Zhiping Lin:
Constant Sequence Extension for Fast Search Using Weighted Hamming Distance. CoRR abs/2306.03612 (2023) - [i154]Junchen Lu, Berrak Sisman, Mingyang Zhang, Haizhou Li:
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units. CoRR abs/2306.17005 (2023) - [i153]Shimin Zhang, Qu Yang, Chenxiang Ma, Jibin Wu, Haizhou Li, Kay Chen Tan:
Long Short-term Memory with Two-Compartment Spiking Neuron. CoRR abs/2307.07231 (2023) - [i152]Lingyi Yang, Feng Jiang, Haizhou Li:
Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text. CoRR abs/2307.11380 (2023) - [i151]Yaxin Fan, Feng Jiang, Peifeng Li, Haizhou Li:
GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning. CoRR abs/2307.13923 (2023) - [i150]Xidong Wang, Guiming Hardy Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li:
CMB: A Comprehensive Medical Benchmark in Chinese. CoRR abs/2308.08833 (2023) - [i149]Shimin Zhang, Qu Yang, Chenxiang Ma, Jibin Wu, Haizhou Li, Kay Chen Tan:
TC-LIF: A Two-Compartment Spiking Neuron Model for Long-term Sequential Modelling. CoRR abs/2308.13250 (2023) - [i148]Hongxu Zhu, Siqi Cai, Yidi Jiang, Qiquan Zhang, Haizhou Li:
EEG-Derived Voice Signature for Attended Speaker Detection. CoRR abs/2308.14774 (2023) - [i147]Qinghua Liu, Meng Ge, Zhizheng Wu, Haizhou Li:
PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network. CoRR abs/2309.06723 (2023) - [i146]Chuang Li, Hengchang Hu, Yan Zhang, Min-Yen Kan, Haizhou Li:
A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems. CoRR abs/2309.07682 (2023) - [i145]Junjie Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang, Haizhou Li:
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech. CoRR abs/2309.08408 (2023) - [i144]Zeyang Song, Jibin Wu, Malu Zhang, Mike Zheng Shou, Haizhou Li:
Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks. CoRR abs/2309.09469 (2023) - [i143]Junyi Ao, Mehmet Sinan Yildirim, Meng Ge, Shuai Wang, Ruijie Tao, Yanmin Qian, Liqun Deng, Longshuai Xiao, Haizhou Li:
USED: Universal Speaker Extraction and Diarization. CoRR abs/2309.10674 (2023) - [i142]Rui Liu, Bin Liu, Haizhou Li:
Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech. CoRR abs/2309.11724 (2023) - [i141]Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li:
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency. CoRR abs/2309.11725 (2023) - [i140]Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li:
Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition. CoRR abs/2309.11730 (2023) - [i139]Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Abdulmohsen Alharthi, Bang An, Ziche Liu, Zhiyi Zhang, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu:
AceGPT, Localizing Large Language Models in Arabic. CoRR abs/2309.12053 (2023) - [i138]Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li:
Disentangling Voice and Content with Self-Supervision for Speaker Recognition. CoRR abs/2310.01128 (2023) - [i137]Chen Zhang, Luis Fernando D'Haro, Chengguang Tang, Ke Shi, Guohua Tang, Haizhou Li:
xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark. CoRR abs/2310.08958 (2023) - [i136]Chuang Li, Yan Zhang, Min-Yen Kan, Haizhou Li:
UNO-DST: Leveraging Unlabelled Data in Zero-Shot Dialogue State Tracking. CoRR abs/2310.10492 (2023) - [i135]Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li:
LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism. CoRR abs/2310.10497 (2023) - [i134]Yaxin Fan, Feng Jiang, Peifeng Li, Haizhou Li:
Quantify Health-Related Atomic Knowledge in Chinese Medical Large Language Models: A Computational Analysis. CoRR abs/2310.11722 (2023) - [i133]Qu Yang, Malu Zhang, Jibin Wu, Kay Chen Tan, Haizhou Li:
LC-TTFS: Towards Lossless Network Conversion for Spiking Neural Networks with TTFS Coding. CoRR abs/2310.14978 (2023) - [i132]Yan Zhang, Zhaopeng Feng, Zhiyang Teng, Zuozhu Liu, Haizhou Li:
How Well Do Text Embedding Models Understand Syntax? CoRR abs/2311.07996 (2023) - [i131]Junying Chen, Xidong Wang, Anningzhe Gao, Feng Jiang, Shunian Chen, Hongbo Zhang, Dingjie Song, Wenya Xie, Chuyi Kong, Jianquan Li, Xiang Wan, Haizhou Li, Benyou Wang:
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs. CoRR abs/2311.09774 (2023) - [i130]Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li:
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification. CoRR abs/2312.03620 (2023) - [i129]Xueyao Zhang, Liumeng Xue, Yuancheng Wang, Yicheng Gu, Xi Chen, Zihao Fang, Haopeng Chen, Lexiao Zou, Chaoren Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu:
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit. CoRR abs/2312.09911 (2023) - [i128]Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li:
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling. CoRR abs/2312.11947 (2023) - [i127]Chen Zhang, Luis Fernando D'Haro, Yiming Chen, Malu Zhang, Haizhou Li:
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators. CoRR abs/2312.15407 (2023) - [i126]Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng:
The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge. CoRR abs/2312.16002 (2023)
- 2022
- [j143]Xianghu Yue, Jingru Lin, Fabian Ritter Gutierrez, Haizhou Li:
Self-Supervised Learning With Segmental Masking for Speech Representation. IEEE J. Sel. Top. Signal Process. 16(6): 1367-1379 (2022) - [j142]Hongqiang Du, Lei Xie, Haizhou Li:
Noise-robust voice conversion with domain adversarial training. Neural Networks 148: 74-84 (2022) - [j141]Jibin Wu, Chenglin Xu, Xiao Han, Daquan Zhou, Malu Zhang, Haizhou Li, Kay Chen Tan:
Progressive Tandem Learning for Pattern Recognition With Deep Spiking Neural Networks. IEEE Trans. Pattern Anal. Mach. Intell. 44(11): 7824-7840 (2022) - [j140]Kun Zhou, Berrak Sisman, Rui Liu, Haizhou Li:
Emotional voice conversion: Theory, databases and ESD. Speech Commun. 137: 1-18 (2022) - [j139]Hongning Zhu, Kong Aik Lee, Haizhou Li:
Discriminative speaker embedding with serialized multi-layer multi-head attention. Speech Commun. 144: 89-100 (2022) - [j138]Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li:
Neural Acoustic-Phonetic Approach for Speaker Verification With Phonetic Attention Mask. IEEE Signal Process. Lett. 29: 782-786 (2022) - [j137]Zexu Pan, Xinyuan Qian, Haizhou Li:
Speaker Extraction With Co-Speech Gestures Cue. IEEE Signal Process. Lett. 29: 1467-1471 (2022) - [j136]Haizhou Li:
A Unique ICASSP 2022: During an Unusual Time [Conference Highlights]. IEEE Signal Process. Mag. 39(2): 159-160 (2022) - [j135]Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li:
Selective Listening by Synchronizing Speech With Lips. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1650-1664 (2022) - [j134]Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li:
Decoding Knowledge Transfer for Neural Text-to-Speech Training. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1789-1802 (2022) - [j133]Xiaoxue Gao, Chitralekha Gupta, Haizhou Li:
Automatic Lyrics Transcription of Polyphonic Music With Lyrics-Chord Multi-Task Learning. IEEE ACM Trans. Audio Speech Lang. Process. 30: 2280-2294 (2022) - [j132]Chitralekha Gupta, Haizhou Li, Masataka Goto:
Deep Learning Approaches in Topics of Singing Information Processing. IEEE ACM Trans. Audio Speech Lang. Process. 30: 2422-2451 (2022) - [j131]Zexu Pan, Meng Ge, Haizhou Li:
USEV: Universal Speaker Extraction With Visual Cue. IEEE ACM Trans. Audio Speech Lang. Process. 30: 3032-3045 (2022) - [j130]Enze Su, Siqi Cai, Longhan Xie, Haizhou Li, Tanja Schultz:
STAnet: A Spatiotemporal Attention Network for Decoding Auditory Spatial Attention From EEG. IEEE Trans. Biomed. Eng. 69(7): 2233-2242 (2022) - [j129]Siqi Cai, Enze Su, Longhan Xie, Haizhou Li:
EEG-Based Auditory Attention Detection via Frequency and Channel Neural Attention. IEEE Trans. Hum. Mach. Syst. 52(2): 256-266 (2022) - [j128]Malu Zhang, Jiadong Wang, Jibin Wu, Ammar Belatreche, Burin Amornpaisannon, Zhixuan Zhang, Venkata Pavan Kumar Miriyala, Hong Qu, Yansong Chua, Trevor E. Carlson, Haizhou Li:
Rectified Linear Postsynaptic Potential Function for Backpropagation in Deep Spiking Neural Networks. IEEE Trans. Neural Networks Learn. Syst. 33(5): 1947-1958 (2022) - [c654]Chen Zhang, Luis Fernando D'Haro, Thomas Friedrichs, Haizhou Li:
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation. AAAI 2022: 11657-11666 - [c653]Jinming Zhao, Tenggan Zhang, Jingwen Hu, Yuchen Liu, Qin Jin, Xinchao Wang, Haizhou Li:
M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database. ACL (1) 2022: 5699-5710 - [c652]Bin Wang, C.-C. Jay Kuo, Haizhou Li:
Just Rank: Rethinking Evaluation with Word and Sentence Similarities. ACL (1) 2022: 6060-6077 - [c651]Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina González, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jáchym Kolár, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbeláez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard A. Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik:
Ego4D: Around the World in 3,000 Hours of Egocentric Video. CVPR 2022: 18973-18990 - [c650]Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li:
FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation. EMNLP 2022: 3336-3355 - [c649]Bin Wang, Chen Zhang, Yan Zhang, Yiming Chen, Haizhou Li:
Analyzing and Evaluating Faithfulness in Dialogue Summarization. EMNLP 2022: 4897-4908 - [c648]Yiming Chen, Yan Zhang, Bin Wang, Zuozhu Liu, Haizhou Li:
Generate, Discriminate and Contrast: A Semi-Supervised Sentence Representation Learning Framework. EMNLP 2022: 8150-8161 - [c647]Xiaoxue Gao, Chitralekha Gupta, Haizhou Li:
Genre-Conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music. ICASSP 2022: 791-795 - [c646]Marvin Borsdorf, Kevin Scheck, Haizhou Li, Tanja Schultz:
Experts Versus All-Rounders: Target Language Extraction for Multiple Target Languages. ICASSP 2022: 846-850 - [c645]Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li:
Memobert: Pre-Training Model with Prompt-Based Learning for Multimodal Emotion Recognition. ICASSP 2022: 4703-4707 - [c644]Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li:
Self-Supervised Speaker Recognition with Loss-Gated Learning. ICASSP 2022: 6142-6146 - [c643]Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li:
L-SpEx: Localized Target Speaker Extraction. ICASSP 2022: 7287-7291 - [c642]Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li:
MFA: TDNN with Multi-Scale Frequency-Channel Attention for Text-Independent Speaker Verification with Short Utterances. ICASSP 2022: 7517-7521 - [c641]Qiquan Zhang, Qi Song, Zhaoheng Ni, Aaron Nicolson, Haizhou Li:
Time-Frequency Attention for Monaural Speech Enhancement. ICASSP 2022: 7852-7856 - [c640]Junchen Lu, Berrak Sisman, Rui Liu, Mingyang Zhang, Haizhou Li:
Visualtts: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over. ICASSP 2022: 8032-8036 - [c639]Jiadong Wang, Jibin Wu, Malu Zhang, Qi Liu, Haizhou Li:
A Hybrid Learning Framework for Deep Spiking Neural Networks with One-Spike Temporal Coding. ICASSP 2022: 8942-8946 - [c638]Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li:
ADD 2022: the first Audio Deep Synthesis Detection Challenge. ICASSP 2022: 9216-9220 - [c637]Marvin Borsdorf, Kevin Scheck, Haizhou Li, Tanja Schultz:
Blind Language Separation: Disentangling Multilingual Cocktail Party Voices by Language. INTERSPEECH 2022: 256-260 - [c636]Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li:
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT. INTERSPEECH 2022: 1686-1690 - [c635]Zexu Pan, Meng Ge, Haizhou Li:
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction. INTERSPEECH 2022: 1786-1790 - [c634]Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li:
Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion. INTERSPEECH 2022: 2603-2607 - [c633]Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei:
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data. INTERSPEECH 2022: 2658-2662 - [c632]Qu Yang, Qi Liu, Haizhou Li:
Deep residual spiking neural network for keyword spotting in low-resource settings. INTERSPEECH 2022: 3023-3027 - [c631]Zeyang Song, Qi Liu, Qu Yang, Haizhou Li:
Knowledge distillation for In-memory keyword spotting model. INTERSPEECH 2022: 4128-4132 - [c630]Rui Liu, Berrak Sisman, Björn W. Schuller, Guanglai Gao, Haizhou Li:
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning. INTERSPEECH 2022: 5493-5497 - [c629]Jianhua Tao, Jiangyan Yi, Cunhang Fan, Ruibo Fu, Shan Liang, Pengyuan Zhang, Haizhou Li, Helen Meng, Dong Yu, Masato Akagi:
DDAM '22: 1st International Workshop on Deepfake Detection for Audio Multimedia. ACM Multimedia 2022: 7405-7406 - [c628]Qu Yang, Jibin Wu, Malu Zhang, Yansong Chua, Xinchao Wang, Haizhou Li:
Training Spiking Neural Networks with Local Tandem Learning. NeurIPS 2022 - [c627]Peiwen Li, Enze Su, Jia Li, Siqi Cai, Longhan Xie, Haizhou Li:
Esaa: An Eeg-Speech Auditory Attention Detection Database. O-COCOSDA 2022: 1-6 - [e23]Rong Tong, Yanfeng Lu, Minghui Dong, Wengao Gong, Haizhou Li:
International Conference on Asian Language Processing, IALP 2022, Singapore, October 27-28, 2022. IEEE 2022, ISBN 978-1-6654-7674-4 [contents] - [e22]Svetlana Stoyanchev, Stefan Ultes, Haizhou Li:
Conversational AI for Natural Human-Centric Interaction - 12th International Workshop on Spoken Dialogue System Technology, IWSDS 2021, Singapore. Lecture Notes in Electrical Engineering 943, Springer 2022, ISBN 978-981-19-5537-2 [contents] - [e21]Jianhua Tao, Haizhou Li, Helen Meng, Dong Yu, Masato Akagi, Jiangyan Yi, Cunhang Fan, Ruibo Fu, Shan Lian, Pengyuan Zhang:
DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, Lisboa, Portugal, 14 October 2022. ACM 2022, ISBN 978-1-4503-9496-3 [contents] - [i125]Kun Zhou, Berrak Sisman, Rajib Rana, Björn W. Schuller, Haizhou Li:
Emotion Intensity and its Control for Emotional Voice Conversion. CoRR abs/2201.03967 (2022) - [i124]Hongqiang Du, Lei Xie, Haizhou Li:
Noise-robust voice conversion with domain adversarial training. CoRR abs/2201.10693 (2022) - [i123]Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li:
MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances. CoRR abs/2202.01624 (2022) - [i122]Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu:
ADD 2022: the First Audio Deep Synthesis Detection Challenge. CoRR abs/2202.08433 (2022) - [i121]Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li:
L-SpEx: Localized Target Speaker Extraction. CoRR abs/2202.09995 (2022) - [i120]Bin Wang, C.-C. Jay Kuo, Haizhou Li:
Just Rank: Rethinking Evaluation with Word and Sentence Similarities. CoRR abs/2203.02679 (2022) - [i119]Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li:
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT. CoRR abs/2203.15610 (2022) - [i118]Zexu Pan, Xinyuan Qian, Haizhou Li:
Speaker Extraction with Co-Speech Gestures Cue. CoRR abs/2203.16840 (2022) - [i117]Zexu Pan, Meng Ge, Haizhou Li:
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction. CoRR abs/2203.16843 (2022) - [i116]Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei:
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data. CoRR abs/2203.17113 (2022) - [i115]Xiaoxue Gao, Chitralekha Gupta, Haizhou Li:
Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music. CoRR abs/2204.03307 (2022) - [i114]Jinming Zhao, Tenggan Zhang, Jingwen Hu, Yuchen Liu, Qin Jin, Xinchao Wang, Haizhou Li:
M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database. CoRR abs/2205.10237 (2022) - [i113]Rui Liu, Berrak Sisman, Björn W. Schuller, Guanglai Gao, Haizhou Li:
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning. CoRR abs/2206.07229 (2022) - [i112]Xiaoxue Gao, Chitralekha Gupta, Haizhou Li:
PoLyScribers: Joint Training of Vocal Extractor and Lyrics Transcriber for Polyphonic Music. CoRR abs/2207.07336 (2022) - [i111]Kun Zhou, Berrak Sisman, Rajib Rana, Björn W. Schuller, Haizhou Li:
Speech Synthesis with Mixed Emotions. CoRR abs/2208.05890 (2022) - [i110]Jiadong Wang, Xinyuan Qian, Haizhou Li:
Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception. CoRR abs/2209.01768 (2022) - [i109]Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li:
Controllable Accented Text-to-Speech Synthesis. CoRR abs/2209.10804 (2022) - [i108]Qutang Cai, Guoqiang Hong, Zhijian Ye, Ximin Li, Haizhou Li:
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022. CoRR abs/2209.11433 (2022) - [i107]Bin Wang, Chen Zhang, Chengwei Wei, Haizhou Li:
A Focused Study on Sequence Length for Dialogue Summarization. CoRR abs/2209.11910 (2022) - [i106]Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li:
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning. CoRR abs/2210.04062 (2022) - [i105]Qu Yang, Jibin Wu, Malu Zhang, Yansong Chua, Xinchao Wang, Haizhou Li:
Training Spiking Neural Networks with Local Tandem Learning. CoRR abs/2210.04532 (2022) - [i104]Bin Wang, Chen Zhang, Yan Zhang, Yiming Chen, Haizhou Li:
Analyzing and Evaluating Faithfulness in Dialogue Summarization. CoRR abs/2210.11777 (2022) - [i103]Kun Zhou, Berrak Sisman, Carlos Busso, Haizhou Li:
Mixed Emotion Modelling for Emotional Voice Conversion. CoRR abs/2210.13756 (2022) - [i102]Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li:
FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation. CoRR abs/2210.13832 (2022) - [i101]Haolin Zuo, Rui Liu, Jinming Zhao, Guanglai Gao, Haizhou Li:
Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities. CoRR abs/2210.15359 (2022) - [i100]Yifan Hu, Rui Liu, Guanglai Gao, Haizhou Li:
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis. CoRR abs/2210.15360 (2022) - [i99]Rui Liu, Haolin Zuo, De Hu, Guanglai Gao, Haizhou Li:
Explicit Intensity Control for Accented Text-to-speech. CoRR abs/2210.15364 (2022) - [i98]Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li:
Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs. CoRR abs/2210.15385 (2022) - [i97]Ruijie Tao, Kong Aik Lee, Zhan Shi, Haizhou Li:
Speaker recognition with two-step multi-modal deep cleansing. CoRR abs/2210.15903 (2022) - [i96]Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li:
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text. CoRR abs/2210.16755 (2022) - [i95]Yiming Chen, Yan Zhang, Bin Wang, Zuozhu Liu, Haizhou Li:
Generate, Discriminate and Contrast: A Semi-Supervised Sentence Representation Learning Framework. CoRR abs/2210.16798 (2022) - [i94]Zexu Pan, Wupeng Wang, Marvin Borsdorf, Haizhou Li:
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting. CoRR abs/2211.00109 (2022) - [i93]Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, Hanwu Sun, Liang He, Tianyu Liang, Qiongqiong Wang, Mickael Rouvier, Pierre-Michel Bousquet, Rohan Kumar Das, Ignacio Viñals Bailo, Meng Liu, Héctor Delgado, Xuechen Liu, Md. Sahidullah, Sandro Cumani, Boning Zhang, Koji Okabe, Hitoshi Yamamoto, Ruijie Tao, Haizhou Li, Alfonso Ortega Giménez, Longbiao Wang, Luis Buera:
I4U System Description for NIST SRE'20 CTS Challenge. CoRR abs/2211.01091 (2022) - [i92]Xiaoxue Gao, Xianghu Yue, Haizhou Li:
Self-Transriber: Few-shot Lyrics Transcription with Self-training. CoRR abs/2211.10152 (2022) - [i91]Jiawei Du, Yidi Jiang, Vincent Y. F. Tan, Joey Tianyi Zhou, Haizhou Li:
Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation. CoRR abs/2211.11004 (2022) - [i90]Bin Wang, Haizhou Li:
Relational Sentence Embedding for Flexible Semantic Matching. CoRR abs/2212.08802 (2022) - [i89]Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li:
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment. CoRR abs/2212.08992 (2022) - 2021
- [j127]Jibin Wu, Qi Liu, Malu Zhang, Zihan Pan, Haizhou Li, Kay Chen Tan:
HuRAI: A brain-inspired computational model for human-robot auditory interface. Neurocomputing 465: 103-113 (2021) - [j126]Rui Liu, Berrak Sisman, Yixing Lin, Haizhou Li:
FastTalker: A neural text-to-speech architecture with shallow and group autoregression. Neural Networks 141: 306-314 (2021) - [j125]Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li:
Factorized WaveNet for voice conversion with limited data. Speech Commun. 130: 45-54 (2021) - [j124]Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li:
An adaptive transmission line cochlear model based front-end for replay attack detection. Speech Commun. 132: 114-122 (2021) - [j123]Bidisha Sharma, Xiaoxue Gao, Karthika Vijayan, Xiaohai Tian, Haizhou Li:
NHSS: A speech and singing parallel database. Speech Commun. 133: 9-22 (2021) - [j122]Xinyuan Qian, Qi Liu, Jiadong Wang, Haizhou Li:
Three-Dimensional Speaker Localization: Audio-Refined Visual Scaling Factor Estimation. IEEE Signal Process. Lett. 28: 1405-1409 (2021) - [j121]Berrak Sisman, Junichi Yamagishi, Simon King, Haizhou Li:
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning. IEEE ACM Trans. Audio Speech Lang. Process. 29: 132-157 (2021) - [j120]Rui Liu, Berrak Sisman, Feilong Bao, Jichen Yang, Guanglai Gao, Haizhou Li:
Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 29: 274-285 (2021) - [j119]Mingyang Zhang, Yi Zhou, Li Zhao, Haizhou Li:
Transfer Learning From Speech Synthesis to Voice Conversion With Non-Parallel Training Data. IEEE ACM Trans. Audio Speech Lang. Process. 29: 1290-1302 (2021) - [j118]Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li:
Expressive TTS Training With Frame and Style Reconstruction Loss. IEEE ACM Trans. Audio Speech Lang. Process. 29: 1806-1818 (2021) - [j117]Chen Zhang, Grandee Lee, Luis Fernando D'Haro, Haizhou Li:
D-Score: Holistic Dialogue Evaluation Without Reference. IEEE ACM Trans. Audio Speech Lang. Process. 29: 2502-2516 (2021) - [j116]Zihan Pan, Malu Zhang, Jibin Wu, Jiadong Wang, Haizhou Li:
Multi-Tone Phase Coding of Interaural Time Difference for Sound Source Localization With Spiking Neural Networks. IEEE ACM Trans. Audio Speech Lang. Process. 29: 2656-2670 (2021) - [j115]Chenglin Xu, Wei Rao, Jibin Wu, Haizhou Li:
Target Speaker Verification With Selective Auditory Attention for Single and Multi-Talker Speech. IEEE ACM Trans. Audio Speech Lang. Process. 29: 2696-2709 (2021) - [j114]Yi Zhou, Xiaohai Tian, Haizhou Li:
Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation. IEEE ACM Trans. Audio Speech Lang. Process. 29: 3427-3439 (2021) - [c626]Yan Zhang, Ruidan He, Zuozhu Liu, Lidong Bing, Haizhou Li:
Bootstrapped Unsupervised Sentence Representation Learning. ACL/IJCNLP (1) 2021: 5168-5180 - [c625]Chen Zhang, Yiming Chen, Luis Fernando D'Haro, Yan Zhang, Thomas Friedrichs, Grandee Lee, Haizhou Li:
DynaEval: Unifying Turn and Dialogue Level Evaluation. ACL/IJCNLP (1) 2021: 5676-5689 - [c624]Jinhu Li, Chitralekha Gupta, Haizhou Li:
Training Explainable Singing Quality Assessment Network with Augmented Data. APSIPA ASC 2021: 904-911 - [c623]Chitralekha Gupta, Jinhu Li, Haizhou Li:
Towards Reference-Independent Rhythm Assessment of Solo Singing. APSIPA ASC 2021: 912-919 - [c622]Yi Ma, Kong Aik Lee, Ville Hautamäki, Haizhou Li:
PL-EESR: Perceptual Loss Based End-to-End Robust Speaker Representation Extraction. ASRU 2021: 106-113 - [c621]Bidisha Sharma, Maulik C. Madhavi, Xuehao Zhou, Haizhou Li:
Exploring Teacher-Student Learning Approach for Multi-Lingual Speech-to-Intent Classification. ASRU 2021: 419-426 - [c620]Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li:
Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer. ASRU 2021: 594-601 - [c619]Sergey Nikonorov, Berrak Sisman, Mingyang Zhang, Haizhou Li:
DEEPA: A Deep Neural Analyzer for Speech and Singing Vocoding. ASRU 2021: 618-625 - [c618]Marvin Borsdorf, Haizhou Li, Tanja Schultz:
Target Language Extraction at Multilingual Cocktail Parties. ASRU 2021: 717-724 - [c617]Mingyang Zhang, Xuehao Zhou, Kun Zhou, Rui Liu, Perry Lam, Berrak Sisman, Haizhou Li:
SUTD-NUS System for Blizzard Challenge 2021. Blizzard Challenge 2021 - [c616]Enze Su, Siqi Cai, Peiwen Li, Longhan Xie, Haizhou Li:
Auditory Attention Detection with EEG Channel Attention. EMBC 2021: 5804-5807 - [c615]Siqi Cai, Pengcheng Sun, Tanja Schultz, Haizhou Li:
Low-Latency Auditory Spatial Attention Detection Based on Spectro-Spatial Features from EEG. EMBC 2021: 5812-5815 - [c614]Yiming Chen, Yan Zhang, Chen Zhang, Grandee Lee, Ran Cheng, Haizhou Li:
Revisiting Self-training for Few-shot Learning of Language Model. EMNLP (1) 2021: 9125-9135 - [c613]Nana Hou, Chenglin Xu, Eng Siong Chng, Haizhou Li:
Learning Disentangled Feature Representations for Speech Enhancement Via Adversarial Training. ICASSP 2021: 666-670 - [c612]Kun Zhou, Berrak Sisman, Rui Liu, Haizhou Li:
Seen and Unseen Emotional Style Transfer for Voice Conversion with A New Emotional Speech Dataset. ICASSP 2021: 920-924 - [c611]Xinyuan Qian, Maulik C. Madhavi, Zexu Pan, Jiadong Wang, Haizhou Li:
Multi-Target DoA Estimation with an Audio-Visual Fusion Mechanism. ICASSP 2021: 4280-4284 - [c610]Rui Liu, Berrak Sisman, Haizhou Li:
Graphspeech: Syntax-Aware Graph Attention Network for Neural Speech Synthesis. ICASSP 2021: 6059-6063 - [c609]Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li:
Multi-Stage Speaker Extraction with Utterance and Frame-Level Reference Signals. ICASSP 2021: 6109-6113 - [c608]Lili Guo, Longbiao Wang, Chenglin Xu, Jianwu Dang, Eng Siong Chng, Haizhou Li:
Representation Learning with Spectro-Temporal-Channel Attention for Speech Emotion Recognition. ICASSP 2021: 6304-6308 - [c607]Rohan Kumar Das, Jichen Yang, Haizhou Li:
Data Augmentation with Signal Companding for Detection of Logical Access Attacks. ICASSP 2021: 6349-6353 - [c606]Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li:
Muse: Multi-Modal Target Speaker Extraction with Visual Cues. ICASSP 2021: 6678-6682 - [c605]Bidisha Sharma, Maulik C. Madhavi, Haizhou Li:
Leveraging Acoustic and Linguistic Embeddings from Pretrained Speech and Language Models for Intent Classification. ICASSP 2021: 7498-7502 - [c604]Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu:
The Multi-Speaker Multi-Style Voice Cloning Challenge 2021. ICASSP 2021: 8613-8617 - [c603]Huiping Zhuang, Zhenyu Weng, Fulin Luo, Kar-Ann Toh, Haizhou Li, Zhiping Lin:
Accumulated Decoupled Learning with Gradient Staleness Mitigation for Convolutional Neural Networks. ICML 2021: 12935-12944 - [c602]Jiadong Wang, Xinyuan Qian, Zihan Pan, Malu Zhang, Haizhou Li:
GCC-PHAT with Speech-oriented Attention for Robotic Sound Source Localization. ICRA 2021: 5876-5883 - [c601]Qu Yang, Jibin Wu, Haizhou Li:
Rethinking Benchmarks for Neuromorphic Learning Algorithms. IJCNN 2021: 1-8 - [c600]Hongning Zhu, Kong Aik Lee, Haizhou Li:
Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding. Interspeech 2021: 106-110 - [c599]Qiquan Zhang, Qi Song, Aaron Nicolson, Tian Lan, Haizhou Li:
Temporal Convolutional Network with Frequency Dimension Adaptive Attention for Speech Enhancement. Interspeech 2021: 166-170 - [c598]Xianghu Yue, Haizhou Li:
Phonetically Motivated Self-Supervised Speech Representation Learning. Interspeech 2021: 746-750 - [c597]Kun Zhou, Berrak Sisman, Haizhou Li:
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-Stage Sequence-to-Sequence Training. Interspeech 2021: 811-815 - [c596]Rohan Kumar Das, Maulik C. Madhavi, Haizhou Li:
Diagnosis of COVID-19 Using Auditory Acoustic Cues. Interspeech 2021: 921-925 - [c595]Li Zhang, Qing Wang, Kong Aik Lee, Lei Xie, Haizhou Li:
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification. Interspeech 2021: 1094-1098 - [c594]Yi Zhou, Xiaohai Tian, Zhizheng Wu, Haizhou Li:
Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation. Interspeech 2021: 1374-1378 - [c593]Marvin Borsdorf, Chenglin Xu, Haizhou Li, Tanja Schultz:
Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers. Interspeech 2021: 1469-1473 - [c592]Wupeng Wang, Chenglin Xu, Meng Ge, Haizhou Li:
Neural Speaker Extraction with Speaker-Speech Cross-Attention Network. Interspeech 2021: 3535-3539 - [c591]Marvin Borsdorf, Chenglin Xu, Haizhou Li, Tanja Schultz:
GlobalPhone Mix-To-Separate Out of 2: A Multilingual 2000 Speakers Mixtures Database for Speech Separation. Interspeech 2021: 3905-3909 - [c590]Rui Liu, Berrak Sisman, Haizhou Li:
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability. Interspeech 2021: 4648-4652 - [c589]Yidi Jiang, Bidisha Sharma, Maulik C. Madhavi, Haizhou Li:
Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification. Interspeech 2021: 4713-4717 - [c588]Meidan Ouyang, Rohan Kumar Das, Jichen Yang, Haizhou Li:
Capsule Network based End-to-end System for Detection of Replay Attacks. ISCSLP 2021: 1-5 - [c587]Chen Zhang, Luis Fernando D'Haro, Yiming Chen, Thomas Friedrichs, Haizhou Li:
Investigating the Impact of Pre-trained Language Models on Dialog Evaluation. IWSDS 2021: 291-306 - [c586]Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li:
Is Someone Speaking?: Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection. ACM Multimedia 2021: 3927-3935 - [c585]Xinyuan Qian, Bidisha Sharma, Amine El Abridi, Haizhou Li:
SLoClas: A Database for Joint Sound Localization and Classification. O-COCOSDA 2021: 128-133 - [c584]Haizhou Li, Gina-Anne Levow, Zhou Yu, Chitralekha Gupta, Berrak Sisman, Siqi Cai, David Vandyke, Nina Dethlefs, Yan Wu, Junyi Jessy Li:
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue. SIGDIAL 2021 - [c583]Kun Zhou, Berrak Sisman, Haizhou Li:
Vaw-Gan For Disentanglement And Recomposition Of Emotional Elements In Speech. SLT 2021: 415-422 - [c582]Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li:
Optimizing Voice Conversion Network with Cycle Consistency Loss of Speaker Identity. SLT 2021: 507-513 - [e20]Deyi Xiong, Ridong Jiang, Yanfeng Lu, Minghui Dong, Haizhou Li:
International Conference on Asian Language Processing, IALP 2021, Singapore, December 11-13, 2021. IEEE 2021, ISBN 978-1-6654-8311-7 [contents] - [e19]Erik Marchi, Sabato Marco Siniscalchi, Sandro Cumani, Valerio Mario Salerno, Haizhou Li:
Increasing Naturalness and Flexibility in Spoken Dialogue Interaction - 10th International Workshop on Spoken Dialogue Systems, IWSDS 2019, Syracuse, Sicily, Italy, 24-26 April 2019. Lecture Notes in Electrical Engineering 714, Springer 2021, ISBN 978-981-15-9322-2 [contents] - [e18]Haizhou Li, Gina-Anne Levow, Zhou Yu, Chitralekha Gupta, Berrak Sisman, Siqi Cai, David Vandyke, Nina Dethlefs, Yan Wu, Junyi Jessy Li:
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGdial 2021, Singapore and Online, July 29-31, 2021. Association for Computational Linguistics 2021, ISBN 978-1-954085-81-7 [contents] - [e17]Haizhou Li, Shuzhi Sam Ge, Yan Wu, Agnieszka Wykowska, Hongsheng He, Xiaorui Liu, Dongyu Li, Jairo Pérez-Osorio:
Social Robotics - 13th International Conference, ICSR 2021, Singapore, November 10-13, 2021, Proceedings. Lecture Notes in Computer Science 13086, Springer 2021, ISBN 978-3-030-90524-8 [contents] - [i88]Bidisha Sharma, Maulik C. Madhavi, Haizhou Li:
Leveraging Acoustic and Linguistic Embeddings from Pretrained speech and language Models for Intent Classification. CoRR abs/2102.07370 (2021) - [i87]Siqi Cai, Pengcheng Sun, Tanja Schultz, Haizhou Li:
Low-latency auditory spatial attention detection based on spectro-spatial features from EEG. CoRR abs/2103.03621 (2021) - [i86]Chenglin Xu, Wei Rao, Jibin Wu, Haizhou Li:
Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech. CoRR abs/2103.16269 (2021) - [i85]Kun Zhou, Berrak Sisman, Haizhou Li:
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training. CoRR abs/2103.16809 (2021) - [i84]Rui Liu, Berrak Sisman, Haizhou Li:
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability. CoRR abs/2104.01408 (2021) - [i83]Xinyuan Qian, Maulik C. Madhavi, Zexu Pan, Jiadong Wang, Haizhou Li:
Multi-target DoA Estimation with an Audio-visual Fusion Mechanism. CoRR abs/2105.06107 (2021) - [i82]Kun Zhou, Berrak Sisman, Rui Liu, Haizhou Li:
Emotional Voice Conversion: Theory, Databases and ESD. CoRR abs/2105.14762 (2021) - [i81]Chen Zhang, Yiming Chen, Luis Fernando D'Haro, Yan Zhang, Thomas Friedrichs, Grandee Lee, Haizhou Li:
DynaEval: Unifying Turn and Dialogue Level Evaluation. CoRR abs/2106.01112 (2021) - [i80]Li Zhang, Qing Wang, Kong Aik Lee, Lei Xie, Haizhou Li:
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification. CoRR abs/2106.09320 (2021) - [i79]Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li:
Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer. CoRR abs/2107.03748 (2021) - [i78]Hongning Zhu, Kong Aik Lee, Haizhou Li:
Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding. CoRR abs/2107.06493 (2021) - [i77]Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li:
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection. CoRR abs/2107.06592 (2021) - [i76]Xinyuan Qian, Bidisha Sharma, Amine El Abridi, Haizhou Li:
SLoClas: A Database for Joint Sound Localization and Classification. CoRR abs/2108.02539 (2021) - [i75]Yidi Jiang, Bidisha Sharma, Maulik C. Madhavi, Haizhou Li:
Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification. CoRR abs/2108.02598 (2021) - [i74]Bidisha Sharma, Maulik C. Madhavi, Xuehao Zhou, Haizhou Li:
Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification. CoRR abs/2109.13486 (2021) - [i73]Zexu Pan, Meng Ge, Haizhou Li:
USEV: Universal Speaker Extraction with Visual Cue. CoRR abs/2109.14831 (2021) - [i72]Yi Ma, Kong Aik Lee, Ville Hautamäki, Haizhou Li:
PL-EESR: Perceptual Loss Based END-TO-END Robust Speaker Representation Extraction. CoRR abs/2110.00940 (2021) - [i71]Yiming Chen, Yan Zhang, Chen Zhang, Grandee Lee, Ran Cheng, Haizhou Li:
Revisiting Self-Training for Few-Shot Learning of Language Model. CoRR abs/2110.01256 (2021) - [i70]Chen Zhang, Luis Fernando D'Haro, Yiming Chen, Thomas Friedrichs, Haizhou Li:
Investigating the Impact of Pre-trained Language Models on Dialog Evaluation. CoRR abs/2110.01895 (2021) - [i69]Rui Liu, Berrak Sisman, Haizhou Li:
StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis. CoRR abs/2110.03156 (2021) - [i68]Junchen Lu, Berrak Sisman, Rui Liu, Mingyang Zhang, Haizhou Li:
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over. CoRR abs/2110.03342 (2021) - [i67]Sergey Nikonorov, Berrak Sisman, Mingyang Zhang, Haizhou Li:
DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding. CoRR abs/2110.06434 (2021) - [i66]Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Christian Fuegen, Abrham Gebreselasie, Cristina González, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jáchym Kolár, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Yunyi Zhu, Pablo Arbeláez, David Crandall, Dima Damen, Giovanni Maria Farinella, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard A. Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik:
Ego4D: Around the World in 3,000 Hours of Egocentric Video. CoRR abs/2110.07058 (2021) - [i65]Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li:
Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity. CoRR abs/2110.10326 (2021) - [i64]Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li:
MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition. CoRR abs/2111.00865 (2021) - [i63]Qiquan Zhang, Qi Song, Zhaoheng Ni, Aaron Nicolson, Haizhou Li:
Time-Frequency Attention for Monaural Speech Enhancement. CoRR abs/2111.07518 (2021) - [i62]Chen Zhang, Luis Fernando D'Haro, Thomas Friedrichs, Haizhou Li:
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation. CoRR abs/2112.07194 (2021) - 2020
- [j113]Kong Aik Lee, Seyed Omid Sadjadi, Haizhou Li, Douglas A. Reynolds:
Two decades into Speaker Recognition Evaluation - are we there yet? Comput. Speech Lang. 61: 101058 (2020) - [j112]Malu Zhang, Jibin Wu, Ammar Belatreche, Zihan Pan, Xiurui Xie, Yansong Chua, Guoqi Li, Hong Qu, Haizhou Li:
Supervised learning in spiking neural networks with synaptic delay-weight plasticity. Neurocomputing 409: 103-118 (2020) - [j111]Malu Zhang, Xiaoling Luo, Yi Chen, Jibin Wu, Ammar Belatreche, Zihan Pan, Hong Qu, Haizhou Li:
An Efficient Threshold-Driven Aggregate-Label Learning Algorithm for Multimodal Information Processing. IEEE J. Sel. Top. Signal Process. 14(3): 592-602 (2020) - [j110]Mingyang Zhang, Berrak Sisman, Li Zhao, Haizhou Li:
DeepConversion: Voice conversion with limited parallel training data. Speech Commun. 122: 31-43 (2020) - [j109]Yi Zhou, Xiaohai Tian, Haizhou Li:
Multi-Task WaveRNN With an Integrated Architecture for Cross-Lingual Voice Conversion. IEEE Signal Process. Lett. 27: 1310-1314 (2020) - [j108]Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li:
Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS. IEEE Signal Process. Lett. 27: 1470-1474 (2020) - [j107]Chitralekha Gupta, Haizhou Li, Ye Wang:
Automatic Leaderboard: Evaluation of Singing Quality Without a Standard Reference. IEEE ACM Trans. Audio Speech Lang. Process. 28: 13-26 (2020) - [j106]Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li:
SpEx: Multi-Scale Time Domain Speaker Extraction Network. IEEE ACM Trans. Audio Speech Lang. Process. 28: 1370-1384 (2020) - [j105]Jichen Yang, Rohan Kumar Das, Haizhou Li:
Significance of Subband Features for Synthetic Speech Detection. IEEE Trans. Inf. Forensics Secur. 15: 2160-2170 (2020) - [c581]Grandee Lee, Haizhou Li:
Modeling Code-Switch Languages Using Bilingual Parallel Corpus. ACL 2020: 860-870 - [c580]Lin Huang, Chitralekha Gupta, Haizhou Li:
Spectral Features and Pitch Histogram for Automatic Singing Quality Evaluation with CRNN. APSIPA 2020: 492-499 - [c579]Zongyang Du, Kun Zhou, Berrak Sisman, Haizhou Li:
Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN. APSIPA 2020: 507-513 - [c578]Junchen Lu, Kun Zhou, Berrak Sisman, Haizhou Li:
VAW-GAN for Singing Voice Conversion with Non-parallel Training Data. APSIPA 2020: 514-519 - [c577]Rohan Kumar Das, Ruijie Tao, Jichen Yang, Wei Rao, Cheng Yu, Haizhou Li:
HLT-NUS Submission for 2019 NIST Multimedia Speaker Recognition Evaluation. APSIPA 2020: 605-609 - [c576]Rohan Kumar Das, Haizhou Li:
Classification of Speech with and without Face Mask using Acoustic Features. APSIPA 2020: 747-752 - [c575]Yi Zhou, Xiaohai Tian, Xuehao Zhou, Mingyang Zhang, Grandee Lee, Rui Liu, Berrak Sisman, Haizhou Li:
NUS-HLT System for Blizzard Challenge 2020. Blizzard Challenge / Voice Conversion Challenge 2020 - [c574]Xiaohai Tian, Zhichao Wang, Shan Yang, Xinyong Zhou, Hongqiang Du, Yi Zhou, Mingyang Zhang, Kun Zhou, Berrak Sisman, Lei Xie, Haizhou Li:
The NUS & NWPU system for Voice Conversion Challenge 2020. Blizzard Challenge / Voice Conversion Challenge 2020 - [c573]Wanqiu Lin, Maulik C. Madhavi, Rohan Kumar Das, Haizhou Li:
Transformer-based Arabic Dialect Identification. IALP 2020: 192-196 - [c572]Zhenyu Weng, Yuesheng Zhu, Zhiping Lin, Haizhou Li:
Real-Time Multiple Object Tracking with Discriminative Features. ICARCV 2020: 309-314 - [c571]Xinggan Peng, Huiping Zhuang, Guang-Bin Huang, Haizhou Li, Zhiping Lin:
Robust Real-time Face Tracking for People Wearing Face Masks. ICARCV 2020: 779-783 - [c570]Chitralekha Gupta, Emre Yilmaz, Haizhou Li:
Automatic Lyrics Alignment and Transcription in Polyphonic Music: Does Background Music Help? ICASSP 2020: 496-500 - [c569]Xiang Hao, Chenglin Xu, Nana Hou, Lei Xie, Eng Siong Chng, Haizhou Li:
Time-Domain Neural Network Approach for Speech Bandwidth Extension. ICASSP 2020: 866-870 - [c568]Rui Liu, Berrak Sisman, Jingdong Li, Feilong Bao, Guanglai Gao, Haizhou Li:
Teacher-Student Training For Robust Tacotron-Based TTS. ICASSP 2020: 6274-6278 - [c567]Rohan Kumar Das, Jichen Yang, Haizhou Li:
Assessing the Scope of Generalized Countermeasures for Anti-Spoofing. ICASSP 2020: 6589-6593 - [c566]Van Tung Pham, Haihua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma, Haizhou Li:
Independent Language Modeling Architecture for End-To-End ASR. ICASSP 2020: 7059-7063 - [c565]Rohan Kumar Das, Haizhou Li:
On the Importance of Vocal Tract Constriction for Speaker Characterization: The Whispered Speech Study. ICASSP 2020: 7119-7123 - [c564]Xuehao Zhou, Xiaohai Tian, Grandee Lee, Rohan Kumar Das, Haizhou Li:
End-to-End Code-Switching TTS with Cross-Lingual Language Model. ICASSP 2020: 7614-7618 - [c563]Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li:
Effective Wavenet Adaptation for Voice Conversion with Limited Data. ICASSP 2020: 7779-7783 - [c562]Zexu Pan, Zhaojie Luo, Jichen Yang, Haizhou Li:
Multi-Modal Attention for Speech Emotion Recognition. INTERSPEECH 2020: 364-368 - [c561]Xinyuan Zhou, Emre Yilmaz, Yanhua Long, Yijie Li, Haizhou Li:
Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition. INTERSPEECH 2020: 1042-1046 - [c560]Zhenzong Wu, Rohan Kumar Das, Jichen Yang, Haizhou Li:
Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks. INTERSPEECH 2020: 1101-1105 - [c559]Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li:
SpEx+: A Complete Time Domain Speaker Extraction Network. INTERSPEECH 2020: 1406-1410 - [c558]Ruijie Tao, Rohan Kumar Das, Haizhou Li:
Audio-Visual Speaker Recognition with a Cross-Modal Discriminative Network. INTERSPEECH 2020: 2242-2246 - [c557]Emre Yilmaz, Özgür Bora Gevrek, Jibin Wu, Yuxiang Chen, Xuanbo Meng, Haizhou Li:
Deep Convolutional Spiking Neural Networks for Keyword Spotting. INTERSPEECH 2020: 2557-2561 - [c556]Siqi Cai, Enze Su, Yonghao Song, Longhan Xie, Haizhou Li:
Low Latency Auditory Attention Detection with Common Spatial Pattern Analysis of EEG Signals. INTERSPEECH 2020: 2772-2776 - [c555]Kun Zhou, Berrak Sisman, Mingyang Zhang, Haizhou Li:
Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion. INTERSPEECH 2020: 3416-3420 - [c554]Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li:
The INTERSPEECH 2020 Far-Field Speaker Verification Challenge. INTERSPEECH 2020: 3456-3460 - [c553]Nana Hou, Chenglin Xu, Van Tung Pham, Joey Tianyi Zhou, Eng Siong Chng, Haizhou Li:
Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network. INTERSPEECH 2020: 4064-4068 - [c552]Nana Hou, Chenglin Xu, Joey Tianyi Zhou, Eng Siong Chng, Haizhou Li:
Multi-Task Learning for End-to-End Noise-Robust Bandwidth Extension. INTERSPEECH 2020: 4069-4073 - [c551]Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li:
The Attacker's Perspective on Automatic Speaker Verification: An Overview. INTERSPEECH 2020: 4213-4217 - [c550]Tianchi Liu, Rohan Kumar Das, Maulik C. Madhavi, Shengmei Shen, Haizhou Li:
Speaker-Utterance Dual Attention for Speaker and Utterance Verification. INTERSPEECH 2020: 4293-4297 - [c549]Xinyuan Zhou, Grandee Lee, Emre Yilmaz, Yanhua Long, Jiaen Liang, Haizhou Li:
Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-Based LVCSR. INTERSPEECH 2020: 5016-5020 - [c548]Chitralekha Gupta, Lin Huang, Haizhou Li:
Automatic Rank-Ordering of Singing Vocals with Twin-Neural Network. ISMIR 2020: 416-423 - [c547]Chen Zhang, Luis Fernando D'Haro, Rafael E. Banchs, Thomas Friedrichs, Haizhou Li:
Deep AM-FM: Toolkit for Automatic Dialogue Evaluation. IWSDS 2020: 53-69 - [c546]Xiaohai Tian, Rohan Kumar Das, Haizhou Li:
Black-box Attacks on Automatic Speaker Verification using Feedback-controlled Voice Conversion. Odyssey 2020: 159-164 - [c545]Kun Zhou, Berrak Sisman, Haizhou Li:
Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data. Odyssey 2020: 230-237 - [c544]Berrak Sisman, Haizhou Li:
Generative Adversarial Networks for Singing Voice Conversion with and without Parallel Data. Odyssey 2020: 238-244 - [c543]Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li:
WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss. Odyssey 2020: 245-251 - [c542]Xiaoxue Gao, Xiaohai Tian, Yi Zhou, Rohan Kumar Das, Haizhou Li:
Personalized Singing Voice Generation Using WaveRNN. Odyssey 2020: 252-258 - [i61]Kun Zhou, Berrak Sisman, Haizhou Li:
Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data. CoRR abs/2002.00198 (2020) - [i60]Xiaoyi Qin, Ming Li, Hui Bu, Rohan Kumar Das, Wei Rao, Shrikanth Narayanan, Haizhou Li:
The FFSVC 2020 Evaluation Plan. CoRR abs/2002.00387 (2020) - [i59]Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li:
WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss. CoRR abs/2002.00417 (2020) - [i58]Malu Zhang, Jiadong Wang, Zhixuan Zhang, Ammar Belatreche, Jibin Wu, Yansong Chua, Hong Qu, Haizhou Li:
Spike-Timing-Dependent Back Propagation in Deep Spiking Neural Networks. CoRR abs/2003.11837 (2020) - [i57]Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li:
SpEx: Multi-Scale Time Domain Speaker Extraction Network. CoRR abs/2004.08326 (2020) - [i56]Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li:
The Attacker's Perspective on Automatic Speaker Verification: An Overview. CoRR abs/2004.08849 (2020) - [i55]Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li:
Time-domain speaker extraction network. CoRR abs/2004.14762 (2020) - [i54]Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li:
SpEx+: A Complete Time Domain Speaker Extraction Network. CoRR abs/2005.04686 (2020) - [i53]Kun Zhou, Berrak Sisman, Mingyang Zhang, Haizhou Li:
Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion. CoRR abs/2005.07025 (2020) - [i52]Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li:
The INTERSPEECH 2020 Far-Field Speaker Verification Challenge. CoRR abs/2005.08046 (2020) - [i51]Srivatsa P, Kyle Timothy Ng Chu, Yaswanth Tavva, Jibin Wu, Malu Zhang, Haizhou Li, Trevor E. Carlson:
You Only Spike Once: Improving Energy-Efficient Neuromorphic Inference to ANN-Level Accuracy. CoRR abs/2006.09982 (2020) - [i50]Xinyuan Zhou, Grandee Lee, Emre Yilmaz, Yanhua Long, Jiaen Liang, Haizhou Li:
Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR. CoRR abs/2006.10407 (2020) - [i49]Xinyuan Zhou, Emre Yilmaz, Yanhua Long, Yijie Li, Haizhou Li:
Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition. CoRR abs/2006.10414 (2020) - [i48]Jibin Wu, Chenglin Xu, Daquan Zhou, Haizhou Li, Kay Chen Tan:
Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks. CoRR abs/2007.01204 (2020) - [i47]Zihan Pan, Malu Zhang, Jibin Wu, Haizhou Li:
Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network. CoRR abs/2007.03274 (2020) - [i46]Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li:
Expressive TTS Training with Frame and Style Reconstruction Loss. CoRR abs/2008.01490 (2020) - [i45]Berrak Sisman, Junichi Yamagishi, Simon King, Haizhou Li:
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning. CoRR abs/2008.03648 (2020) - [i44]Junchen Lu, Kun Zhou, Berrak Sisman, Haizhou Li:
VAW-GAN for Singing Voice Conversion with Non-parallel Training Data. CoRR abs/2008.03992 (2020) - [i43]Zongyang Du, Kun Zhou, Berrak Sisman, Haizhou Li:
Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN. CoRR abs/2008.04562 (2020) - [i42]Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li:
Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS. CoRR abs/2008.05284 (2020) - [i41]Tianchi Liu, Rohan Kumar Das, Maulik C. Madhavi, Shengmei Shen, Haizhou Li:
Speaker-Utterance Dual Attention for Speaker and Utterance Verification. CoRR abs/2008.08901 (2020) - [i40]Zexu Pan, Zhaojie Luo, Jichen Yang, Haizhou Li:
Multi-modal Attention for Speech Emotion Recognition. CoRR abs/2009.04107 (2020) - [i39]Mingyang Zhang, Yi Zhou, Li Zhao, Haizhou Li:
Transfer Learning from Speech Synthesis to Voice Conversion with Non-Parallel Training Data. CoRR abs/2009.14399 (2020) - [i38]Rohan Kumar Das, Ruijie Tao, Jichen Yang, Wei Rao, Cheng Yu, Haizhou Li:
HLT-NUS Submission for NIST 2019 Multimedia Speaker Recognition Evaluation. CoRR abs/2010.03905 (2020) - [i37]Rohan Kumar Das, Haizhou Li:
Classification of Speech with and without Face Mask using Acoustic Features. CoRR abs/2010.03907 (2020) - [i36]Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li:
Muse: Multi-modal target speaker extraction with visual cues. CoRR abs/2010.07775 (2020) - [i35]Rui Liu, Berrak Sisman, Haizhou Li:
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis. CoRR abs/2010.12423 (2020) - [i34]Kun Zhou, Berrak Sisman, Rui Liu, Haizhou Li:
Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset. CoRR abs/2010.14794 (2020) - [i33]Kun Zhou, Berrak Sisman, Haizhou Li:
VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech. CoRR abs/2011.02314 (2020) - [i32]Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li:
Optimizing voice conversion network with cycle consistency loss of speaker identity. CoRR abs/2011.08548 (2020) - [i31]Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li:
Multi-stage Speaker Extraction with Utterance and Frame-Level Reference Signals. CoRR abs/2011.09624 (2020) - [i30]Bidisha Sharma, Xiaoxue Gao, Karthika Vijayan, Xiaohai Tian, Haizhou Li:
NHSS: A Speech and Singing Parallel Database. CoRR abs/2012.00337 (2020)
2010 – 2019
- 2019
- [j104]Luis Fernando D'Haro, Rafael E. Banchs, Chiori Hori, Haizhou Li:
Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics. Comput. Speech Lang. 55: 200-215 (2019) - [j103]Karthika Vijayan, Haizhou Li, Tomoki Toda:
Speech-to-Singing Voice Conversion: The Challenges and Strategies for Improving Vocal Conversion Processes. IEEE Signal Process. Mag. 36(1): 95-102 (2019) - [j102]Berrak Sisman, Mingyang Zhang, Haizhou Li:
Group Sparse Representation With WaveNet Vocoder Adaptation for Spectrum and Prosody Conversion. IEEE ACM Trans. Audio Speech Lang. Process. 27(6): 1085-1097 (2019) - [j101]Qiang Yu, Haizhou Li, Kay Chen Tan:
Spike Timing or Rate? Neurons Learn to Make Decisions for Both Through Threshold-Driven Plasticity. IEEE Trans. Cybern. 49(6): 2178-2189 (2019) - [j100]Chong Zhang, Kay Chen Tan, Haizhou Li, Geok Soon Hong:
A Cost-Sensitive Deep Belief Network for Imbalanced Classification. IEEE Trans. Neural Networks Learn. Syst. 30(1): 109-122 (2019) - [c541]Malu Zhang, Jibin Wu, Yansong Chua, Xiaoling Luo, Zihan Pan, Dan Liu, Haizhou Li:
MPD-AL: An Efficient Membrane Potential Driven Aggregate-Label Learning Algorithm for Spiking Neurons. AAAI 2019: 1327-1334 - [c540]Berrak Sisman, Karthika Vijayan, Minghui Dong, Haizhou Li:
SINGAN: Singing Voice Conversion with Generative Adversarial Networks. APSIPA 2019: 112-118 - [c539]Xiaoxue Gao, Xiaohai Tian, Rohan Kumar Das, Yi Zhou, Haizhou Li:
Speaker-independent Spectral Mapping for Speech-to-Singing Conversion. APSIPA 2019: 159-164 - [c538]Nana Hou, Chenglin Xu, Eng Siong Chng, Haizhou Li:
Domain Adversarial Training for Speech Enhancement. APSIPA 2019: 667-672 - [c537]Yitong Liu, Rohan Kumar Das, Haizhou Li:
Multi-band Spectral Entropy Information for Detection of Replay Attacks. APSIPA 2019: 838-843 - [c536]Karthika Vijayan, K. Sri Rama Murty, Haizhou Li:
Allpass Modeling of Phase Spectrum of Speech Signals for Formant Tracking. APSIPA 2019: 1190-1196 - [c535]Yi Zhou, Xiaohai Tian, Rohan Kumar Das, Haizhou Li:
Many-to-many Cross-lingual Voice Conversion with a Jointly Trained Speaker Embedding Network. APSIPA 2019: 1282-1287 - [c534]Rohan Kumar Das, Jichen Yang, Haizhou Li:
Speaker Clustering with Penalty Distance for Speaker Verification with Multi-Speaker Speech. APSIPA 2019: 1630-1635 - [c533]Berrak Sisman, Mingyang Zhang, Minghui Dong, Haizhou Li:
On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion. ASRU 2019: 144-151 - [c532]Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li:
WaveNet Factorization with Singular Value Decomposition for Voice Conversion. ASRU 2019: 152-159 - [c531]Yi Zhou, Xiaohai Tian, Emre Yilmaz, Rohan Kumar Das, Haizhou Li:
A Modularized Neural Network with Language-Specific Output Layers for Cross-Lingual Voice Conversion. ASRU 2019: 160-167 - [c530]Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li:
Time-Domain Speaker Extraction Network. ASRU 2019: 327-334 - [c529]Xianghu Yue, Grandee Lee, Emre Yilmaz, Fang Deng, Haizhou Li:
End-to-End Code-Switching ASR for Low-Resourced Language Pairs. ASRU 2019: 972-979 - [c528]Rohan Kumar Das, Jichen Yang, Haizhou Li:
Long Range Acoustic and Deep Features Perspective on ASVspoof 2019. ASRU 2019: 1018-1025 - [c527]Bidisha Sharma, Chitralekha Gupta, Haizhou Li, Ye Wang:
Automatic Lyrics-to-audio Alignment on Polyphonic Music Using Singing-adapted Acoustic Models. ICASSP 2019: 396-400 - [c526]Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li:
Auditory Inspired Spatial Differentiation for Replay Spoofing Attack Detection. ICASSP 2019: 6011-6015 - [c525]Grandee Lee, Haizhou Li:
Word and Class Common Space Embedding for Code-switch Language Modelling. ICASSP 2019: 6086-6090 - [c524]Yi Zhou, Xiaohai Tian, Haihua Xu, Rohan Kumar Das, Haizhou Li:
Cross-lingual Voice Conversion with Bilingual Phonetic Posteriorgram and Average Modeling. ICASSP 2019: 6790-6794 - [c523]Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li:
Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss. ICASSP 2019: 6990-6994 - [c522]Zihan Pan, Jibin Wu, Malu Zhang, Haizhou Li, Yansong Chua:
Neural Population Coding for Effective Temporal Classification. IJCNN 2019: 1-8 - [c521]Jibin Wu, Yansong Chua, Malu Zhang, Qu Yang, Guoqi Li, Haizhou Li:
Deep Spiking Neural Network with Spike Count based Learning Rule. IJCNN 2019: 1-6 - [c520]Jibin Wu, Malu Zhang, Haizhou Li, Yansong Chua:
Competitive STDP-based Feature Representation Learning for Sound Event Classification. IJCNN 2019: 1-8 - [c519]Xiaohai Tian, Eng Siong Chng, Haizhou Li:
A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data. INTERSPEECH 2019: 201-205 - [c518]Emre Yilmaz, Adem Derinel, Kun Zhou, Henk van den Heuvel, Niko Brummer, Haizhou Li, David A. van Leeuwen:
Large-Scale Speaker Diarization of Radio Broadcast Archives. INTERSPEECH 2019: 411-415 - [c517]Bidisha Sharma, Haizhou Li:
A Combination of Model-Based and Feature-Based Strategy for Speech-to-Singing Alignment. INTERSPEECH 2019: 624-628 - [c516]Rohan Kumar Das, Jichen Yang, Haizhou Li:
Long Range Acoustic Features for Spoofed Speech Detection. INTERSPEECH 2019: 1058-1062 - [c515]Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura:
VQVAE Unsupervised Unit Discovery and Multi-Scale Code2Spec Inverter for Zerospeech Challenge 2019. INTERSPEECH 2019: 1118-1122 - [c514]Wei Rao, Chenglin Xu, Eng Siong Chng, Haizhou Li:
Target Speaker Extraction for Multi-Talker Speaker Verification. INTERSPEECH 2019: 1273-1277 - [c513]Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi:
Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet. INTERSPEECH 2019: 1298-1302 - [c512]Kong Aik Lee, Ville Hautamäki, Tomi H. Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado, Massimiliano Todisco:
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences. INTERSPEECH 2019: 1497-1501 - [c511]Bidisha Sharma, Rohan Kumar Das, Haizhou Li:
Multi-Level Adaptive Speech Activity Detector for Speech in Naturalistic Environments. INTERSPEECH 2019: 2015-2019 - [c510]Bidisha Sharma, Rohan Kumar Das, Haizhou Li:
On the Importance of Audio-Source Separation for Singer Identification in Polyphonic Music. INTERSPEECH 2019: 2020-2024 - [c509]Chitralekha Gupta, Emre Yilmaz, Haizhou Li:
Acoustic Modeling for Automatic Lyrics-to-Audio Alignment. INTERSPEECH 2019: 2040-2044 - [c508]Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Haihua Xu, Eng Siong Chng, Haizhou Li:
On the End-to-End Solution to Mandarin-English Code-Switching Speech Recognition. INTERSPEECH 2019: 2165-2169 - [c507]Chitralekha Gupta, Karthika Vijayan, Bidisha Sharma, Xiaoxue Gao, Haizhou Li:
NUS Speak-to-Sing: A Web Platform for Personalized Speech-to-Singing Conversion. INTERSPEECH 2019: 2376-2377 - [c506]Rohan Kumar Das, Haizhou Li:
Instantaneous Phase and Long-Term Acoustic Cues for Orca Activity Detection. INTERSPEECH 2019: 2418-2422 - [c505]Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Haizhou Li:
An Adaptive-Q Cochlear Model for Replay Spoofing Detection. INTERSPEECH 2019: 2918-2922 - [c504]Jibin Wu, Zihan Pan, Malu Zhang, Rohan Kumar Das, Yansong Chua, Haizhou Li:
Robust Sound Recognition: A Neuromorphic Approach. INTERSPEECH 2019: 3667-3668 - [c503]Grandee Lee, Xianghu Yue, Haizhou Li:
Linguistically Motivated Parallel Data Augmentation for Code-Switch Language Modeling. INTERSPEECH 2019: 3730-3734 - [c502]Qinyi Wang, Emre Yilmaz, Adem Derinel, Haizhou Li:
Code-Switching Detection Using ASR-Generated Language Posteriors. INTERSPEECH 2019: 3740-3744 - [c501]Emre Yilmaz, Samuel Cohen, Xianghu Yue, David A. van Leeuwen, Haizhou Li:
Multi-Graph Decoding for Code-Switching ASR. INTERSPEECH 2019: 3750-3754 - [c500]Tianchi Liu, Maulik C. Madhavi, Rohan Kumar Das, Haizhou Li:
A Unified Framework for Speaker and Utterance Verification. INTERSPEECH 2019: 4320-4324 - [c499]Maulik C. Madhavi, Tong Zhan, Haizhou Li, Min Yuan:
First Leap Towards Development of Dialogue System for Autonomous Bus. IWSDS 2019: 393-400 - [c498]Haizhou Li:
Country Report - Singapore. O-COCOSDA 2019: 1-6 - [c497]Rohan Sheelvant, Bidisha Sharma, Maulik C. Madhavi, Rohan Kumar Das, S. R. M. Prasanna, Haizhou Li:
RSL2019: A Realistic Speech Localization Corpus. O-COCOSDA 2019: 1-6 - [e16]Luis Fernando D'Haro, Rafael E. Banchs, Haizhou Li:
9th International Workshop on Spoken Dialogue System Technology, IWSDS 2018, Singapore, April 18-20, 2018. Lecture Notes in Electrical Engineering 579, Springer 2019, ISBN 978-981-13-9442-3 - [i29]Wei Rao, Chenglin Xu, Eng Siong Chng, Haizhou Li:
Target Speaker Extraction for Overlapped Multi-Talker Speaker Verification. CoRR abs/1902.02546 (2019) - [i28]Xiaohai Tian, Eng Siong Chng, Haizhou Li:
A Vocoder-free WaveNet Voice Conversion with Non-Parallel Data. CoRR abs/1902.03705 (2019) - [i27]Jibin Wu, Yansong Chua, Malu Zhang, Qu Yang, Guoqi Li, Haizhou Li:
Deep Spiking Neural Network with Spike Count based Learning Rule. CoRR abs/1902.05705 (2019) - [i26]Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li:
Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss. CoRR abs/1903.09952 (2019) - [i25]Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi:
Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet. CoRR abs/1903.12389 (2019) - [i24]Kong Aik Lee, Ville Hautamäki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md. Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Tran Huy Dat, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-François Bonastre, Chenglin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas W. D. Evans:
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences. CoRR abs/1904.07386 (2019) - [i23]Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura:
VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019. CoRR abs/1905.11449 (2019) - [i22]Emre Yilmaz, Samuel Cohen, Xianghu Yue, David A. van Leeuwen, Haizhou Li:
Multi-Graph Decoding for Code-Switching ASR. CoRR abs/1906.07523 (2019) - [i21]Emre Yilmaz, Adem Derinel, Kun Zhou, Henk van den Heuvel, Niko Brummer, Haizhou Li, David A. van Leeuwen:
Large-Scale Speaker Diarization of Radio Broadcast Archives. CoRR abs/1906.07955 (2019) - [i20]Qinyi Wang, Emre Yilmaz, Adem Derinel, Haizhou Li:
Code-Switching Detection Using ASR-Generated Language Posteriors. CoRR abs/1906.08003 (2019) - [i19]Chitralekha Gupta, Emre Yilmaz, Haizhou Li:
Acoustic Modeling for Automatic Lyrics-to-Audio Alignment. CoRR abs/1906.10369 (2019) - [i18]Jibin Wu, Yansong Chua, Malu Zhang, Guoqi Li, Haizhou Li, Kay Chen Tan:
A Hybrid Learning Rule for Efficient and Rapid Inference with Spiking Neural Networks. CoRR abs/1907.01167 (2019) - [i17]Zihan Pan, Yansong Chua, Jibin Wu, Malu Zhang, Haizhou Li, Eliathamby Ambikairajah:
An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks. CoRR abs/1909.01302 (2019) - [i16]Zihan Pan, Jibin Wu, Yansong Chua, Malu Zhang, Haizhou Li:
Neural Population Coding for Effective Temporal Classification. CoRR abs/1909.08018 (2019) - [i15]Chitralekha Gupta, Emre Yilmaz, Haizhou Li:
Automatic Lyrics Transcription in Polyphonic Music: Does Background Music Help? CoRR abs/1909.10200 (2019) - [i14]Xianghu Yue, Grandee Lee, Emre Yilmaz, Fang Deng, Haizhou Li:
End-to-End Code-Switching ASR for Low-Resourced Language Pairs. CoRR abs/1909.12681 (2019) - [i13]Rui Liu, Berrak Sisman, Jingdong Li, Feilong Bao, Guanglai Gao, Haizhou Li:
Teacher-Student Training for Robust Tacotron-based TTS. CoRR abs/1911.02839 (2019) - [i12]Jibin Wu, Emre Yilmaz, Malu Zhang, Haizhou Li, Kay Chen Tan:
Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition. CoRR abs/1911.08373 (2019) - [i11]Van Tung Pham, Haihua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma, Haizhou Li:
Independent language modeling architecture for end-to-end ASR. CoRR abs/1912.00863 (2019) - 2018
- [j99]Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li:
Using language cluster models in hierarchical language identification. Speech Commun. 100: 30-40 (2018) - [j98]Van Tung Pham, Haihua Xu, Xiong Xiao, Nancy F. Chen, Eng Siong Chng, Haizhou Li:
Re-ranking spoken term detection with acoustic exemplars of keywords. Speech Commun. 104: 12-23 (2018) - [j97]Longting Xu, Kong-Aik Lee, Haizhou Li, Zhen Yang:
Generalizing I-Vector Estimation for Rapid Speaker Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 26(4): 749-759 (2018) - [j96]Haizhou Li:
Farewell Editorial. IEEE ACM Trans. Audio Speech Lang. Process. 26(12): 2489 (2018) - [c496]Zhongwei Li, Xuancong Wang, AiTi Aw, Eng Siong Chng, Haizhou Li:
Named-Entity Tagging and Domain adaptation for Better Customized Translation. NEWS@ACL 2018: 41-46 - [c495]Nancy F. Chen, Xiangyu Duan, Min Zhang, Rafael E. Banchs, Haizhou Li:
NEWS 2018 Whitepaper. NEWS@ACL 2018: 47-54 - [c494]Nancy F. Chen, Rafael E. Banchs, Min Zhang, Xiangyu Duan, Haizhou Li:
Report of NEWS 2018 Named Entity Transliteration Shared Task. NEWS@ACL 2018: 55-73 - [c493]Mingyang Zhang, Berrak Sisman, Sai Sirisha Rallabandi, Haizhou Li, Li Zhao:
Error Reduction Network for DBLSTM-based Voice Conversion. APSIPA 2018: 823-828 - [c492]