IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 32
Volume 32, 2024
- Jin Chu Wu, Raghu N. Kacker:
Statistical Analysis for Speaker Recognition Evaluation With Data Dependence and Three Score Distributions. 1-14
- Yongwei Zhou, Junwei Bao, Youzheng Wu, Xiaodong He, Tiejun Zhao:
Operation-Augmented Numerical Reasoning for Question Answering. 15-28
- Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, Sriram Ganapathy:
Speech Dereverberation With Frequency Domain Autoregressive Modeling. 29-38
- Leyuan Qu, Taihao Li, Cornelius Weber, Theresa Pekarek-Rosin, Fuji Ren, Stefan Wermter:
Disentangling Prosody Representations With Unsupervised Speech Reconstruction. 39-54
- Mathias Bach Pedersen, Søren Holdt Jensen, Zheng-Hua Tan, Jesper Jensen:
Data-Driven Non-Intrusive Speech Intelligibility Prediction Using Speech Presence Probability. 55-67
- Yuanbo Hou, Bo Kang, Andrew Mitchell, Wenwu Wang, Jian Kang, Dick Botteldooren:
Cooperative Scene-Event Modelling for Acoustic Scene Classification. 68-82
- Xiaotong Jiang, Peiwen You, Chen Chen, Zhongqing Wang, Guodong Zhou:
Exploring Scope Detection for Aspect-Based Sentiment Analysis. 83-94
- Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu:
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning. 95-112
- Federico Miotello, Mirco Pezzoli, Luca Comanducci, Fabio Antonacci, Augusto Sarti:
Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks. 113-123
- Cristian Lucian Stanciu, Jacob Benesty, Constantin Paleologu, Ruxandra-Liana Costea, Laura-Maria Dogariu, Silviu Ciochina:
Decomposition-Based Wiener Filter Using the Kronecker Product and Conjugate Gradient Method. 124-138
- Huiyao Chen, Yueheng Sun, Meishan Zhang, Min Zhang:
Automatic Noise Generation and Reduction for Text Classification. 139-150
- Jiaming Xu, Jian Cui, Yunzhe Hao, Bo Xu:
Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments. 151-163
- Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen:
A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training. 164-177
- Xiao Li, Ruirui Liu, Huichou Huang, Qingyao Wu:
Contrastive Learning for Target Speaker Extraction With Attention-Based Fusion. 178-188
- Xiaobo Liang, Runze Mao, Lijun Wu, Juntao Li, Min Zhang, Qing Li:
Enhancing Low-Resource NLP by Consistency Training With Data and Model Perturbations. 189-199
- Haisheng Lu, Jiangnan Liang, Chuang Shi:
Comments on "Primary-Ambient Extraction Using Ambient Spectrum Estimation for Immersive Spatial Audio Reproduction". 200-202
- Szymon Drgas, Lars Bramsløw, Archontis Politis, Gaurav Naithani, Tuomas Virtanen:
Dynamic Processing Neural Network Architecture for Hearing Loss Compensation. 203-214
- Femke B. Gelderblom, Tron V. Tronstad, Torbjørn Svendsen, Tor André Myrvoll:
On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks. 215-226
- Thomas Haubner, Andreas Brendel, Walter Kellermann:
End-to-End Deep Learning-Based Adaptation Control for Linear Acoustic Echo Cancellation. 227-238
- Congcong Jiang, Tieyun Qian, Bing Liu:
One General Teacher for Multi-Data Multi-Task: A New Knowledge Distillation Framework for Discourse Relation Analysis. 239-249
- Khandokar Md. Nayem, Donald S. Williamson:
Attention-Based Speech Enhancement Using Human Quality Perception Modeling. 250-260
- Ying Zhang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou:
Complex Question Enhanced Transfer Learning for Zero-Shot Joint Information Extraction. 261-275
- Jingsong Yan, Piji Li, Haibin Chen, Junhao Zheng, Qianli Ma:
Does the Order Matter? A Random Generative Way to Learn Label Hierarchy for Hierarchical Text Classification. 276-285
- Georgios Paraskevopoulos, Theodoros Kouzelis, Georgios Rouvalis, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos:
Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: A Case Study for Modern Greek. 286-299
- Ernesto Accolti, Javier Gimenez, Michael Vorländer:
Uncertainties of Room Acoustics Simulation Due to Directivity Data of Musical Instruments. 300-309
- Yoshiki Masuyama, Kouei Yamaoka, Yuma Kinoshita, Taishi Nakashima, Nobutaka Ono:
Causal and Relaxed-Distortionless Response Beamforming for Online Target Source Extraction. 310-324
- Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. 325-351
- Yun Zhao, Dexi Liu, Changxuan Wan, Xiping Liu, Jian-Yun Nie, Jiaming Liu:
JMS-QA: A Joint Hierarchical Architecture for Mental Health Question Answering. 352-363
- Shiwen Ni, Jiawen Li, Min Yang, Hung-Yu Kao:
DropAttack: A Random Dropped Weight Attack Adversarial Training for Natural Language Understanding. 364-373
- Tiantian Zhu, Yang Qin, Ming Feng, Qingcai Chen, Baotian Hu, Yang Xiang:
BioPRO: Context-Infused Prompt Learning for Biomedical Entity Linking. 374-385
- Jiapu Wang, Boyue Wang, Junbin Gao, Simin Hu, Yongli Hu, Baocai Yin:
Multi-Level Interaction Based Knowledge Graph Completion. 386-396
- Qiangqiang Zhang, Dongyuan Lin, Yingying Xiao, Yunfei Zheng, Shiyuan Wang:
Error Reused Filtered-X Least Mean Square Algorithm for Active Noise Control. 397-412
- Zengrui Jin, Mengzhe Geng, Jiajun Deng, Tianzi Wang, Shujie Hu, Guinan Li, Xunying Liu:
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition. 413-429
- Jun Kong, Jin Wang, Xuejie Zhang:
Adaptive Ensemble Self-Distillation With Consistent Gradients for Fast Inference of Pretrained Language Models. 430-442
- Srdan Kitic, Jérôme Daniel:
Blind Identification of Ambisonic Reduced Room Impulse Response. 443-458
- Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu, Lei Xie:
Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition. 459-470
- Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan:
Boosting Cross-Domain Speech Recognition With Self-Supervision. 471-485
- Yile Wang, Yue Zhang, Peng Li, Yang Liu:
Gradual Syntactic Label Replacement for Language Model Pre-Training. 486-496
- Penghui Ma, Jianfeng Li, Jingjing Pan, Xiaofei Zhang, Roberto Gil-Pita:
Coherent Signal DOA Estimation With Coprime Array: Exploiting Signal Subspace Reconstructing Strategy. 497-508
- Emma Hamel, Nickvash Kani:
Factors That Influence Automatic Recognition of African-American Vernacular English in Machine-Learning Models. 509-516
- Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang:
Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing. 517-528
- Bing Han, Zhengyang Chen, Yanmin Qian:
Self-Supervised Learning With Cluster-Aware-DINO for High-Performance Robust Speaker Verification. 529-541
- Kristina Tesch, Timo Gerkmann:
Multi-Channel Speech Separation Using Spatially Selective Deep Non-Linear Filters. 542-553
- Hao-Chen Pei, Hao Fang, Xin Luo, Xin-Shun Xu:
Gradformer: A Framework for Multi-Aspect Multi-Granularity Pronunciation Assessment. 554-563
- Garima Sharma, Karthikeyan Umapathy, Sridhar Krishnan:
Time-Frequency Scattergrams for Biomedical Audio Signal Representation and Classification. 564-576
- Zhibo Man, Zengcheng Huang, Yujie Zhang, Yu Li, Yuanmeng Chen, Yufeng Chen, Jinan Xu:
WDSRL: Multi-Domain Neural Machine Translation With Word-Level Domain-Sensitive Representation Learning. 577-590
- Chin-Po Chen, Ho-Hsien Pan, Susan Shur-Fen Gau, Chi-Chun Lee:
Using Measures of Vowel Space for Autistic Traits Characterization. 591-607
- Kevin Wilkinghoff, Frank Kurth:
Why Do Angular Margin Losses Work Well for Semi-Supervised Anomalous Sound Detection? 608-622
- Aku Rouhe, Tamás Grósz, Mikko Kurimo:
Principled Comparisons for End-to-End Speech Recognition: Attention vs Hybrid at the 1000-Hour Scale. 623-638
- Yile Wang, Yue Zhang:
Lost in Context? On the Sense-Wise Variance of Contextualized Word Embeddings. 639-650
- Christoph Hold, Ville Pulkki, Archontis Politis, Leo McCormack:
Compression of Higher-Order Ambisonic Signals Using Directional Audio Coding. 651-665
- Shouhui Wang, Biao Qin:
A Novel Joint Training Model for Knowledge Base Question Answering. 666-679
- Songbin Li, Jingang Wang, Peng Liu, Ke Shi:
SANet: A Compressed Speech Encoder and Steganography Algorithm Independent Steganalysis Deep Neural Network. 680-690
- Tarek Kanan, Amani AbedAlghafer, Shadi AlZu'bi, Bilal Hawashin, Ala Mughaid, Ghassan Kanaan, M. M. Kamruzzaman:
An Intelligent Health Care System for Detecting Drug Abuse in Social Media Platforms Based on Low Resource Language. 691-703
- Alejandro Santorum Varela, Svetlana Stoyanchev, Simon Keizer, Rama Doddipatla, Kate M. Knill:
Entity Resolution in Situated Dialog With Unimodal and Multimodal Transformers. 704-713
- Huang He, Hua Lu, Siqi Bao, Fan Wang, Hua Wu, Zheng-Yu Niu, Haifeng Wang:
Learning to Select External Knowledge With Multi-Scale Negative Sampling. 714-720
- Hua Lu, Zhen Guo, Chanjuan Li, Yunyi Yang, Huang He, Siqi Bao:
Towards Building an Open-Domain Dialogue System Incorporated With Internet Memes. 721-726
- Jungwoo Lim, Taesun Whang, Dongyub Lee, Heuiseok Lim:
Adaptive Multi-Domain Dialogue State Tracking on Spoken Conversations. 727-732
- David Thulke, Nico Daheim, Christian Dugast, Hermann Ney:
Task-Oriented Document-Grounded Dialog Systems by HLTPR@RWTH for DSTC9 and DSTC10. 733-741
- Han Wu, Kun Xu, Linqi Song:
Structure-Aware Dialogue Modeling Methods for Conversational Semantic Role Labeling. 742-752
- Zhe Chen, Hongcheng Liu, Yu Wang:
DialogMCF: Multimodal Context Flow for Audio Visual Scene-Aware Dialog. 753-764
- Koichiro Yoshino, Yun-Nung Chen, Paul A. Crook, Satwik Kottur, Jinchao Li, Behnam Hedayatnia, Seungwhan Moon, Zhengcong Fei, Zekang Li, Jinchao Zhang, Yang Feng, Jie Zhou, Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Dilek Hakkani-Tur, Babak Damavandi, Alborz Geramifard, Chiori Hori, Ankit Shah, Chen Zhang, Haizhou Li, João Sedoc, Luis F. D'Haro, Rafael E. Banchs, Alexander Rudnicky:
Overview of the Tenth Dialog System Technology Challenge: DSTC10. 765-778
- Shekhar Kumar Yadav, Nithin V. George:
Joint Dereverberation and Beamforming With Blind Estimation of the Shape Parameter of the Desired Source Prior. 779-793
- Yanxiong Li, Zhongjie Jiang, Qisheng Huang, Wenchang Cao, Jialong Li:
Lightweight Speaker Verification Using Transformation Module With Feature Partition and Fusion. 794-806
- Yuhan Dai, Zhirui Zhang, Yichao Du, Shengcai Liu, Lemao Liu, Tong Xu:
Datastore Distillation for Nearest Neighbor Machine Translation. 807-817
- Changtao Li, Feiran Yang, Jun Yang:
A Two-Stage Approach to Quality Restoration of Bone-Conducted Speech. 818-829
- Jie Zhou, Yuanbiao Lin, Qin Chen, Qi Zhang, Xuanjing Huang, Liang He:
CausalABSC: Causal Inference for Aspect Debiasing in Aspect-Based Sentiment Classification. 830-840
- Ruiying Lu, Bo Chen, Dandan Guo, Dongsheng Wang, Mingyuan Zhou:
Hierarchical Topic-Aware Contextualized Transformers. 841-852
- Yaru Zhao, Bo Cheng, Yakun Huang, Zhiguo Wan:
FluGCF: A Fluent Dialogue Generation Model With Coherent Concept Entity Flow. 853-867
- Changhao Ding, Zhangjie Fu, Zhongliang Yang, Qi Yu, Daqiu Li, Yongfeng Huang:
Context-Aware Linguistic Steganography Model Based on Neural Machine Translation. 868-878
- Zainab Alhakeem, Se-In Jang, Hong-Goo Kang:
Disentangled Representations in Local-Global Contexts for Arabic Dialect Identification. 879-890
- Jae-Hong Lee, Joon-Hyuk Chang:
Partitioning Attention Weight: Mitigating Adverse Effect of Incorrect Pseudo-Labels for Self-Supervised ASR. 891-905
- Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura:
Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation. 906-916
- Seong-Gyun Leem, Daniel Fulford, Jukka-Pekka Onnela, David Gard, Carlos Busso:
Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech. 917-929
- Alexander Bohlender, Ann Spriet, Wouter Tirry, Nilesh Madhu:
Spatially Selective Speaker Separation Using a DNN With a Location Dependent Feature Extraction. 930-945
- Matan Karo, Arie Yeredor, Itshak Lapidot:
Compact Time-Domain Representation for Logical Access Spoofed Audio. 946-958
- Or Berebi, Zamir Ben-Hur, David Lou Alon, Boaz Rafaely:
Analysis and Design of Head-Tracked Compensation for Bilateral Ambisonics. 959-972
- Wei Wang, Yanmin Qian:
Universal Cross-Lingual Data Generation for Low Resource ASR. 973-983
- Davide Berghi, Philip J. B. Jackson:
Leveraging Visual Supervision for Array-Based Active Speaker Detection and Localization. 984-995
- Daniel Aleksander Krause, Guillermo García-Barrios, Archontis Politis, Annamaria Mesaros:
Binaural Sound Source Distance Estimation and Localization for a Moving Listener. 996-1011
- Seung-Bin Kim, Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee:
Audio Super-Resolution With Robust Speech Representation Learning of Masked Autoencoder. 1012-1022
- Omer Musa Battal, Aykut Koç:
Automatic Construction of Sememe Knowledge Bases From Machine Readable Dictionaries. 1023-1035
- Varun Krishna, Tarun Sai, Sriram Ganapathy:
Representation Learning With Hidden Unit Clustering for Low Resource Speech Applications. 1036-1047
- Zhengding Luo, Dongyuan Shi, Woon-Seng Gan, Qirui Huang:
Delayless Generative Fixed-Filter Active Noise Control Based on Deep Learning and Bayesian Filter. 1048-1060
- Zewen Chi, Heyan Huang, Luyang Liu, Yu Bai, Xiaoyan Gao, Xian-Ling Mao:
Can Pretrained English Language Models Benefit Non-English NLP Systems in Low-Resource Scenarios? 1061-1074
- Rui Liu, Yifan Hu, Haolin Zuo, Zhaojie Luo, Longbiao Wang, Guanglai Gao:
Text-to-Speech for Low-Resource Agglutinative Language With Morphology-Aware Language Model Pre-Training. 1075-1087
- Shu Jiang, Zuchao Li, Hai Zhao, Weiping Ding:
Entity-Relation Extraction as Full Shallow Semantic Dependency Parsing. 1088-1099
- Yoav Vered, Stephen J. Elliott:
A Parallel Analog and Digital Adaptive Feedforward Controller for Active Noise Control. 1100-1108
- Puning Zhang, Rongjian Zhao, Boran Yang, Yuexian Li, Zhigang Yang:
Integrated Syntactic and Semantic Tree for Targeted Sentiment Classification Using Dual-Channel Graph Convolutional Network. 1109-1124
- Xu Wang, Hainan Zhang, Shuai Zhao, Hongshen Chen, Zhuoye Ding, Zhiguo Wan, Bo Cheng, Yanyan Lan:
Debiasing Counterfactual Context With Causal Inference for Multi-Turn Dialogue Reasoning. 1125-1132
- Hoang Ngoc Chau, Tien Dat Bui, Huu Binh Nguyen, Thanh Thi Hien Duong, Quoc-Cuong Nguyen:
A Novel Approach to Multi-Channel Speech Enhancement Based on Graph Neural Networks. 1133-1144
- Yuchen Hu, Chen Chen, Qiushi Zhu, Eng Siong Chng:
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR. 1145-1156
- Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino:
Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction. 1157-1172
- Vibhav Agarwal, Sourav Ghosh, Harichandana B. S. S, Himanshu Arora, Barath Raj Kandur Raja:
TrICy: Trigger-Guided Data-to-Text Generation With Intent Aware Attention-Copy. 1173-1184
- Christoph Böddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux:
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings. 1185-1197
- Reza Varzandeh, Simon Doclo, Volker Hohmann:
Speech-Aware Binaural DOA Estimation Utilizing Periodicity and Spatial Features in Convolutional Neural Networks. 1198-1213
- Yigitcan Özer, Meinard Müller:
Source Separation of Piano Concertos Using Musically Motivated Augmentation Techniques. 1214-1225
- Lior Frenkel, Shlomo E. Chazan, Jacob Goldberger:
Domain Adaptation Using Suitable Pseudo Labels for Speech Enhancement and Dereverberation. 1226-1236
- Jiahao Zhao, Wenji Mao, Daniel Dajun Zeng:
Disentangled Text Representation Learning With Information-Theoretic Perspective for Adversarial Robustness. 1237-1247
- Dong Zhou, Fang Lei, Lin Li, Yongmei Zhou, Aimin Yang:
Cross-Modal Interaction via Reinforcement Feedback for Audio-Lyrics Retrieval. 1248-1260
- Xuechen Liu, Md. Sahidullah, Kong Aik Lee, Tomi Kinnunen:
Generalizing Speaker Verification for Spoof Awareness in the Embedding Space. 1261-1273
- Shiyao Cui, Jiangxia Cao, Xin Cong, Jiawei Sheng, Quangang Li, Tingwen Liu, Jinqiao Shi:
Enhancing Multimodal Entity and Relation Extraction With Variational Information Bottleneck. 1274-1285
- Yizhou Tan, Haojun Ai, Shengchen Li, Mark D. Plumbley:
Acoustic Scene Classification Across Cities and Devices via Feature Disentanglement. 1286-1297
- Orel Ben Zaken, Anurag Kumar, Vladimir Tourbabin, Boaz Rafaely:
Neural-Network-Based Direction-of-Arrival Estimation for Reverberant Speech - The Importance of Energetic, Temporal, and Spatial Information. 1298-1309
- Changsheng Quan, Xiaofei Li:
SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation. 1310-1323
- Matthew Baas, Herman Kamper:
Disentanglement in a GAN for Unconditional Speech Synthesis. 1324-1335
- Xian Li, Nian Shao, Xiaofei Li:
Self-Supervised Audio Teacher-Student Transformer for Both Clip-Level and Frame-Level Tasks. 1336-1351
- Yifan Chen, Gaofeng Cheng, Runyan Yang, Pengyuan Zhang, Yonghong Yan:
Interrelate Training and Clustering for Online Speaker Diarization. 1352-1364
- Sheng Feng, Xiaoqian Zhu, Shuqing Ma:
Masking Hierarchical Tokens for Underwater Acoustic Target Recognition With Self-Supervised Learning. 1365-1379