IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 31
Volume 31, 2023
- Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha:
Clean vs. Overlapped Speech-Music Detection Using Harmonic-Percussive Features and Multi-Task Learning. 1-10
- Zhaojie Luo, Shoufeng Lin, Rui Liu, Jun Baba, Yuichiro Yoshikawa, Hiroshi Ishiguro:
Decoupling Speaker-Independent Emotions for Voice Conversion via Source-Filter Networks. 11-24
- Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu:
Integrating Lattice-Free MMI Into End-to-End Speech Recognition. 25-38
- Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman:
A Diffeomorphic Flow-Based Variational Framework for Multi-Speaker Emotion Conversion. 39-53
- Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao:
Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features. 54-70
- Xiaoyi Qin, Danwei Cai, Ming Li:
Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios. 71-85
- Vikram C. Mathad, Julie M. Liss, Kathy Chapman, Nancy Scherer, Visar Berisha:
Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation. 86-95
- Li Li, Hirokazu Kameoka, Shoji Makino:
FastMVAE2: On Improving and Accelerating the Fast Variational Autoencoder-Based Source Separation Algorithm for Determined Mixtures. 96-110
- Jie Wang, Yan Yang, Keyu Liu, Zhiping Zhu, Xiaorong Liu:
M3S: Scene Graph Driven Multi-Granularity Multi-Task Learning for Multi-Modal NER. 111-120
- Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Yasunori Ohishi, Shoko Araki:
SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning. 121-136
- Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino:
BYOL for Audio: Exploring Pre-Trained General-Purpose Audio Representations. 137-151
- Yingrui Xu, Hao Liu, Jingguo Ge, Xiaodan Zhang, Jingyuan Hu, Yulei Wu, Honglei Lv, Hongbin Shi, Wei Zhou:
Mining Weak Relations Between Reviews for Opinion Spam Detection. 152-162
- Yoshiki Masuyama, Kohei Yatabe, Kento Nagatomo, Yasuhiro Oikawa:
Online Phase Reconstruction via DNN-Based Phase Differences Estimation. 163-176
- Jiang Liu, Donghong Ji, Jingye Li, Dongdong Xie, Chong Teng, Liang Zhao, Fei Li:
TOE: A Grid-Tagging Discontinuous NER Model Enhanced by Embedding Tag/Word Relations and More Fine-Grained Tags. 177-187
- Zhe Hu, Zhiwei Cao, Hou Pong Chan, Jiachen Liu, Xinyan Xiao, Jinsong Su, Hua Wu:
Controllable Dialogue Generation With Disentangled Multi-Grained Style Specification and Attribute Consistency Reward. 188-199
- Sondes Abderrazek, Corinne Fredouille, Alain Ghio, Muriel Lalain, Christine Meunier, Virginie Woisard:
Interpreting Deep Representations of Phonetic Features via Neuro-Based Concept Detector: Application to Speech Disorders Due to Head and Neck Cancer. 200-214
- Jie Zhang, Rui Tao, Jun Du, Li-Rong Dai:
Energy-Efficient Sparsity-Driven Speech Enhancement in Wireless Acoustic Sensor Networks. 215-228
- Xianke Wang, Bowen Tian, Weiming Yang, Wei Xu, Wenqing Cheng:
MusicYOLO: A Vision-Based Framework for Automatic Singing Transcription. 229-241
- Yuanyuan Liu, Mittapalle Kiran Reddy, Nelly Penttilä, Tiina Ihalainen, Paavo Alku, Okko Räsänen:
Automatic Assessment of Parkinson's Disease Using Speech Representations of Phonation and Articulation. 242-255
- David Südholt, Alec Wright, Cumhur Erkut, Vesa Välimäki:
Pruning Deep Neural Network Models of Guitar Distortion Effects. 256-264
- Fangkai Jiao, Yangyang Guo, Minlie Huang, Liqiang Nie:
Enhanced Multi-Domain Dialogue State Tracker With Second-Order Slot Interactions. 265-276
- Hui Tian, Yiqin Qiu, Wojciech Mazurczyk, Haizhou Li, Zhenxing Qian:
STFF-SM: Steganalysis Model Based on Spatial and Temporal Feature Fusion for Speech Streams. 277-289
- Gopendra Vikram Singh, Mauajama Firdaus, Asif Ekbal, Pushpak Bhattacharyya:
EmoInt-Trans: A Multimodal Transformer for Identifying Emotions and Intents in Social Conversations. 290-300
- De Hu, Huaiwen Zhang, Feilong Bao, Rui Wang:
Distributed Sampling Rate Offset Estimation Over Acoustic Sensor Networks Based on Asynchronous Network Newton Optimization. 301-312
- David Diaz-Guerra, Antonio Miguel, José Ramón Beltrán:
Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs. 313-321
- Peiming Guo, Shen Huang, Peijie Jiang, Yueheng Sun, Meishan Zhang, Min Zhang:
Curriculum-Style Fine-Grained Adaption for Unsupervised Cross-Lingual Dependency Transfer. 322-332
- Naveen Kumar Desiraju, Simon Doclo, Markus Buck, Tobias Wolff:
Joint Online Estimation of Early and Late Residual Echo PSD for Residual Echo Suppression. 333-344
- Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator. 345-354
- Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis:
Meta-AF: Meta-Learning for Adaptive Filters. 355-370
- Yingwen Fu, Nankai Lin, Boyu Chen, Ziyu Yang, Shengyi Jiang:
Cross-Lingual Named Entity Recognition for Heterogenous Languages. 371-382
- Jun-You Wang, Jyh-Shing Roger Jang:
Training a Singing Transcription Model Using Connectionist Temporal Classification Loss and Cross-Entropy Loss. 383-396
- Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, Jonathan Le Roux:
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency. 397-410
- Yu Li, Bojie Hu, Jian Liu, Yufeng Chen, Jinan Xu:
A Neighborhood Re-Ranking Model With Relation Constraint for Knowledge Graph Completion. 411-425
- Alessio Miaschi, Dominique Brunato, Felice Dell'Orletta, Giulia Venturi:
On Robustness and Sensitivity of a Neural Language Model: A Case Study on Italian L1 Learner Errors. 426-438
- Rong Xiao, Yu Wan, Baosong Yang, Haibo Zhang, Huajin Tang, Derek F. Wong, Boxing Chen:
Towards Energy-Preserving Natural Language Understanding With Spiking Neural Networks. 439-447
- Juan Zhao, Tianrui Zong, Yong Xiang, Longxiang Gao, Guang Hua, Keshav Sood, Yushu Zhang:
SSVS-SSVD Based Desynchronization Attacks Resilient Watermarking Method for Stereo Signals. 448-461
- Qiquan Zhang, Xinyuan Qian, Zhaoheng Ni, Aaron Nicolson, Eliathamby Ambikairajah, Haizhou Li:
A Time-Frequency Attention Module for Neural Speech Enhancement. 462-475
- Binhong Xie, Yu Li, Hongyan Zhao, Lihu Pan, Enhui Wang:
A Cross-Attention Fusion Based Graph Convolution Auto-Encoder for Open Relation Extraction. 476-485
- Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang:
Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification. 486-499
- Xinglin Lyu, Junhui Li, Min Zhang, Chenchen Ding, Hideki Tanaka, Masao Utiyama:
Refining History for Future-Aware Neural Machine Translation. 500-512
- Mou Wang, Junqi Chen, Xiao-Lei Zhang, Susanto Rahardja:
End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus. 513-524
- Asier López-Zorrilla, María Inés Torres, Heriberto Cuayáhuitl:
Audio Embedding-Aware Dialogue Policy Learning. 525-538
- Xichen Shang, Chuxin Chen, Zipeng Chen, Qianli Ma:
Modularized Mutuality Network for Emotion-Cause Pair Extraction. 539-549
- Xinyuan Qian, Zhengdong Wang, Jiadong Wang, Guohui Guan, Haizhou Li:
Audio-Visual Cross-Attention Network for Robotic Speaker Tracking. 550-562
- Kristina Tesch, Timo Gerkmann:
Insights Into Deep Non-Linear Filters for Improved Multi-Channel Speech Enhancement. 563-575
- Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach:
Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria. 576-589
- Davide Albertini, Alberto Bernardini, Federico Borra, Fabio Antonacci, Augusto Sarti:
Two-Stage Beamforming With Arbitrary Planar Arrays of Differential Microphone Array Units. 590-602
- Yi-Syuan Chen, Yun-Zhu Song, Hong-Han Shuai:
SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization. 603-618
- Yingying Xiao, Shanmou Chen, Qiangqiang Zhang, Dongyuan Lin, Minglin Shen, Junhui Qian, Shiyuan Wang:
Generalized Hyperbolic Tangent Based Random Fourier Conjugate Gradient Filter for Nonlinear Active Noise Control. 619-632
- Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Javier Tejedor:
Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing. 633-642
- Bin Gu, Wu Guo, Jie Zhang:
Memory Storable Network Based Feature Aggregation for Speaker Representation Learning. 643-655
- Takumi Abe, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari:
Amplitude Matching for Multizone Sound Field Control. 656-669
- Mahdi Barhoush, Ahmed Hallawa, Arne Peine, Lukas Martin, Anke Schmeink:
Localization-Driven Speech Enhancement in Noisy Multi-Speaker Hospital Environments Using Deep Learning and Meta Learning. 670-683
- Herman Kamper:
Word Segmentation on Discovered Phone Units With Dynamic Programming and Self-Supervised Scoring. 684-694
- Changheng Li, Jorge Martínez, Richard Christian Hendriks:
Joint Maximum Likelihood Estimation of Microphone Array Parameters for a Reverberant Single Source Scenario. 695-705
- Shota Horiguchi, Shinji Watanabe, Paola García, Yuki Takashima, Yohei Kawaguchi:
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors. 706-720
- Ling He, Jia Fu, Yuanyuan Li, Xi Xiong, Jing Zhang:
WNSA-Net: An Axial-Attention-Based Network for Schizophrenia Detection Using Wideband and Narrowband Spectrograms. 721-733
- Anusha Prakash, Hema A. Murthy:
Exploring the Role of Language Families for Building Indic Speech Synthesisers. 734-747
- Mahdin Rohmatillah, Jen-Tzung Chien:
Hierarchical Reinforcement Learning With Guidance for Multi-Domain Dialogue Policy. 748-761
- Shahram Ghorbani, John H. L. Hansen:
Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech. 762-774
- Weidong Chen, Xiaofen Xing, Xiangmin Xu, Jianxin Pang, Lan Du:
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing. 775-788
- Nicki Holighaus, Günther Koliander, Clara Hollomey, Friedrich Pillichshammer:
Grid-Based Decimation for Wavelet Transforms With Stably Invertible Implementation. 789-801
- Weiwei Lin, Man-Wai Mak:
Robust Speaker Verification Using Deep Weight Space Ensemble. 802-812
- Lin Zhang, Xin Wang, Erica Cooper, Nicholas W. D. Evans, Junichi Yamagishi:
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance. 813-825
- Jie Mei, Yufan Wang, Xinhui Tu, Ming Dong, Tingting He:
Incorporating BERT With Probability-Aware Gate for Spoken Language Understanding. 826-834
- Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki:
Mask-Based Neural Beamforming for Moving Speakers With Self-Attention-Based Tracking. 835-848
- Rongzhi Gu, Shi-Xiong Zhang, Yuexian Zou, Dong Yu:
Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation. 849-862
- Naotake Masuda, Daisuke Saito:
Improving Semi-Supervised Differentiable Synthesizer Sound Matching for Practical Applications. 863-875
- Erfan Loweimi, Zhengjun Yue, Peter Bell, Steve Renals, Zoran Cvetkovic:
Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform. 876-890
- Bengt J. Borgström:
A Generative Approach to Condition-Aware Score Calibration for Speaker Verification. 891-901
- Irene Martín-Morató, Annamaria Mesaros:
Strong Labeling of Sound Events Using Crowdsourced Weak Labels and Annotator Competence Estimation. 902-914
- Wenzhao Zhu, Lei Luo, Jinwei Sun, Mads Græsbøll Christensen:
A New Virtual Tracking Sub-Algorithm Based Hybrid Active Control System for Narrowband Noise With Impulsive Interference. 915-926
- Thomas Deppisch, Sebastià V. Amengual Garí, Paul Calamia, Jens Ahrens:
Direct and Residual Subspace Decomposition of Spatial Room Impulse Responses. 927-942
- Eloi Moliner, Vesa Välimäki:
BEHM-GAN: Bandwidth Extension of Historical Music Using Generative Adversarial Networks. 943-956
- Martin Jälmby, Filip Elvander, Toon van Waterschoot:
Low-Rank Room Impulse Response Estimation. 957-969
- Hong Liu, Yucheng Cai, Zhenru Lin, Zhijian Ou, Yi Huang, Junlan Feng:
Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems. 970-984
- De Hu, Qintuya Si, Rui Liu, Feilong Bao:
Distributed Sensor Selection for Speech Enhancement With Acoustic Sensor Networks. 985-999
- Yingke Zhu, Brian Mak:
Bayesian Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification. 1000-1012
- Yuying Li, Yuchen Liu, Donald S. Williamson:
A Composite T60 Regression and Classification Approach for Speech Dereverberation. 1013-1023
- Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang, Helen Meng:
Meta-Generalization for Domain-Invariant Speaker Verification. 1024-1036
- Shutong Niu, Jun Du, Lei Sun, Yu Hu, Chin-Hui Lee:
QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization. 1037-1049
- Boyang Lyu, Chunxiao Fan, Yue Ming, Panzi Zhao, Nannan Hu:
En-HACN: Enhancing Hybrid Architecture With Fast Attention and Capsule Network for End-to-end Speech Recognition. 1050-1062
- Yang Liu, Haoqin Sun, Wenbo Guan, Yuqi Xia, Yongwei Li, Masashi Unoki, Zhen Zhao:
A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition. 1063-1074
- Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Weiqiang Zhang:
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning. 1075-1086
- Wei-Cheng Lin, Carlos Busso:
Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion. 1087-1099
- Achyut Mani Tripathi, Om Jee Pandey:
Divide and Distill: New Outlooks on Knowledge Distillation for Environmental Sound Classification. 1100-1113
- Hao Zhang, Ashutosh Pandey, DeLiang Wang:
Low-Latency Active Noise Control Using Attentive Recurrent Network. 1114-1123
- Avital Bross, Sharon Gannot:
Training-Based Multiple Source Tracking Using Manifold-Learning and Recursive Expectation-Maximization. 1124-1140
- Guimin Hu, Yi Zhao, Guangming Lu:
Emotion Prediction Oriented Method With Multiple Supervisions for Emotion-Cause Pair Extraction. 1141-1152
- Reza Mohsenipour, Daniel Massicotte, Wei-Ping Zhu:
PI Control of Loudspeakers Based on Linear Fractional Order Model. 1153-1162
- Tim Lübeck, Johannes M. Arend, Christoph Pörschmann:
Spatial Upsampling of Sparse Spherical Microphone Array Signals. 1163-1174
- Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Guinan Li, Shujie Hu, Xunying Liu:
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems. 1175-1190
- Hongsheng Zhang, Jizhang Gan, Ting Liu, Kui Huang, Hong Yang:
Coefficients-Switched Normalized Least-Mean-Squares Adaption in Echo Canceler of Sparse-Echo-Path. 1191-1199
- Eric Guizzo, Tillman Weyde, Simone Scardapane, Danilo Comminiello:
Learning Speech Emotion Representations in the Quaternion Domain. 1200-1212
- Jiaqi Bai, Ze Yang, Jian Yang, Hongcheng Guo, Zhoujun Li:
KINet: Incorporating Relevant Facts Into Knowledge-Grounded Dialog Generation. 1213-1222
- Haiquan Zhao, Yuan Gao, Yingying Zhu:
Robust Subband Adaptive Filter Algorithms-Based Mixture Correntropy and Application to Acoustic Echo Cancellation. 1223-1233
- Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li:
PoE: A Panel of Experts for Generalized Automatic Dialogue Assessment. 1234-1250
- Qing Wang, Jun Du, Huaxin Wu, Jia Pan, Feng Ma, Chin-Hui Lee:
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection. 1251-1264
- Yingwen Fu, Nankai Lin, Xiaohui Yu, Shengyi Jiang:
Self-Training With Double Selectors for Low-Resource Named Entity Recognition. 1265-1275
- Kilian Schulze-Forster, Gaël Richard, Liam Kelley, Clement S. J. Doire, Roland Badeau:
Unsupervised Music Source Separation Using Differentiable Parametric Source Models. 1276-1289
- Yinggang Liu, Hong Fu, Ying Wei, Hanbing Zhang:
Sound Event Classification Based on Frequency-Energy Feature Representation and Two-Stage Data Dimension Reduction. 1290-1304
- Ege Erdem, Zoran Cvetkovic, Hüseyin Hacihabiboglu:
3D Perceptual Soundfield Reconstruction via Virtual Microphone Synthesis. 1305-1317
- Dong-Yuan Shi, Woon-Seng Gan, Bhan Lam, Xiaoyi Shen:
A Frequency-Domain Output-Constrained Active Noise Control Algorithm Based on an Intuitive Circulant Convolutional Penalty Factor. 1318-1332
- Muhammed Zahid Ozturk, Chenshu Wu, Beibei Wang, Min Wu, K. J. Ray Liu:
RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation System. 1333-1347
- Jianwei Zhang, Julie Liss, Suren Jayasuriya, Visar Berisha:
Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection. 1348-1359
- Ashutosh Pandey, DeLiang Wang:
Attentive Training: A New Training Framework for Speech Enhancement. 1360-1370
- Hirofumi Inaguma, Tatsuya Kawahara:
Alignment Knowledge Distillation for Online Streaming Attention-Based Speech Recognition. 1371-1385
- Mittapalle Kiran Reddy, Paavo Alku:
Exemplar-Based Sparse Representations for Detection of Parkinson's Disease From Speech. 1386-1396
- Shunsuke Kita, Yoshinobu Kajikawa:
Sound Source Localization Inside a Structure Under Semi-Supervised Conditions. 1397-1408
- Guowei Wu, Shipei Liu, Xiaoya Fan:
The Power of Fragmentation: A Hierarchical Transformer Model for Structural Segmentation in Symbolic Music Generation. 1409-1420
- Xueqin Luo, Gongping Huang, Jilu Jin, Jingdong Chen, Jacob Benesty, Wen Zhang, Mengyao Zhu, Chunjian Li:
Design of Maximum Directivity Beamformers With Linear Acoustic Vector Sensor Arrays. 1421-1435