19th Interspeech 2018: Hyderabad, India
- B. Yegnanarayana:
19th Annual Conference of the International Speech Communication Association, Interspeech 2018, Hyderabad, India, September 2-6, 2018. ISCA 2018
ISCA Medal Talk
- Bishnu S. Atal:
From Vocoders to Code-Excited Linear Prediction: Learning How We Hear What We Hear. 1
End-to-End Speech Recognition
- Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Atsunori Ogawa, Marc Delcroix:
Semi-Supervised End-to-End Speech Recognition. 2-6
- Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney:
Improved Training of End-to-end Attention Models for Speech Recognition. 7-11
- Hossein Hadian, Hossein Sameti, Daniel Povey, Sanjeev Khudanpur:
End-to-end Speech Recognition Using Lattice-free MMI. 12-16
- Stefan Braun, Daniel Neil, Jithendar Anumula, Enea Ceolini, Shih-Chii Liu:
Multi-channel Attention for End-to-End Speech Recognition. 17-21
- Titouan Parcollet, Ying Zhang, Mohamed Morchid, Chiheb Trabelsi, Georges Linarès, Renato de Mori, Yoshua Bengio:
Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition. 22-26
- Ruoming Pang, Tara N. Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang, Chung-Cheng Chiu:
Compression of End-to-End Models. 27-31
Prosody Modeling and Generation
- Zack Hodari, Oliver Watts, Srikanth Ronanki, Simon King:
Learning Interpretable Control Dimensions for Speech Synthesis by Using External Data. 32-36
- Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa:
Investigating Accuracy of Pitch-accent Annotations in Neural Network-based Speech Synthesis and Denoising Effects. 37-41
- Guan-Ting Liou, Chen-Yu Chiang, Yih-Ru Wang, Sin-Horng Chen:
An Exploration of Local Speaking Rate Variations in Mandarin Read Speech. 42-46
- Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ya Li:
BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End. 47-51
- Berrak Sisman, Haizhou Li:
Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion. 52-56
- Rui Liu, Feilong Bao, Guanglai Gao, Hui Zhang, Yonghe Wang:
Improving Mongolian Phrase Break Prediction by Using Syllable and Morphological Embeddings with BiLSTM Model. 57-61
Speaker Verification I
- Lanhua You, Wu Guo, Yan Song, Sheng Zhang:
Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification. 62-66
- Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu:
Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings. 67-71
- Anna Silnova, Niko Brümmer, Daniel Garcia-Romero, David Snyder, Lukás Burget:
Fast Variational Bayes for Heavy-tailed PLDA Applied to i-vectors and x-vectors. 72-76
- Massimiliano Todisco, Héctor Delgado, Kong-Aik Lee, Md. Sahidullah, Nicholas W. D. Evans, Tomi Kinnunen, Junichi Yamagishi:
Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion. 77-81
- Luciana Ferrer, Mitchell McLaren:
A Generalization of PLDA for Joint Modeling of Speaker Identity and Multiple Nuisance Conditions. 82-86
- Nanxin Chen, Jesús Villalba, Najim Dehak:
An Investigation of Non-linear i-vectors for Speaker Verification. 87-91
Spoken Term Detection
- Dhananjay Ram, Lesly Miculicich, Hervé Bourlard:
CNN Based Query by Example Spoken Term Detection. 92-96
- Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li:
Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search. 97-101
- Ziwei Zhu, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai:
Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection. 102-106
- Wei Li, Brian Mak:
Fast Derivation of Cross-lingual Document Vectors from Self-attentive Neural Machine Translation Model. 107-111
- Laxmi Pandey, Karan Nathwani:
LSTM Based Attentive Fusion of Spectral and Prosodic Information for Keyword Spotting in Hindi Language. 112-116
- Ravi Shankar, Vikram C. M., S. R. Mahadeva Prasanna:
Spoken Keyword Detection Using Joint DTW-CNN. 117-121
The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis, Stefanos Zafeiriou:
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats. 122-126
- Ahmed Imtiaz Humayun, Md. Tauhiduzzaman Khan, Shabnam Ghaffarzadegan, Zhe Feng, Taufiq Hasan:
An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification. 127-131
- Mehmet Ali Tugtekin Turan, Engin Erzin:
Monitoring Infant's Emotional Cry in Domestic Environments Using the Capsule Network Architecture. 132-136
- Mark A. Huckvale:
Neural Network Architecture That Combines Temporal and Summative Features for Infant Cry Classification in the Interspeech 2018 Computational Paralinguistics Challenge. 137-141
- Zixing Zhang, Jing Han, Kun Qian, Björn W. Schuller:
Evolving Learning for Analysing Mood-Related Infant Vocalisation. 142-146
- Johannes Wagner, Dominik Schiller, Andreas Seiderer, Elisabeth André:
Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant? 147-151
- Danqing Luo, Yuexian Zou, Dongyan Huang:
Investigation on Joint Representation Learning for Robust Feature Extraction in Speech Emotion Recognition. 152-156
- Soo Jin Park, Amber Afshan, Zhi Ming Chua, Abeer Alwan:
Using Voice Quality Supervectors for Affect Identification. 157-161
- Dengke Tang, Junlin Zeng, Ming Li:
An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individuals. 162-166
Show and Tell 1
- Alexander Koller, Timo Baumann, Arne Köhn:
DialogOS: Simple and Extensible Dialogue Modeling. 167-168
- Franck Dernoncourt, Trung Bui, Walter Chang:
A Framework for Speech Recognition Benchmarking. 169-170
- Takayuki Arai:
Flexible Tongue Housed in a Static Model of the Vocal Tract With Jaws, Lips and Teeth. 171-172
- Lani Mathew, K. Gopakumar:
Voice Analysis Using Acoustic and Throat Microphones for Speech Therapy. 173-174
- Manny Rayner, Nikos Tsourakis, Jan Stanek:
A Robust Context-Dependent Speech-to-Speech Phraselator Toolkit for Alexa. 175-176
Speech Segments and Voice Quality
- RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Bayya Yegnanarayana:
Discriminating Nasals and Approximants in English Language Using Zero Time Windowing. 177-181
- Phil Howson, Alexei Kochetov:
Gestural Lenition of Rhotics Captures Variation in Brazilian Portuguese. 182-186
- RaviShankar Prasad, Bayya Yegnanarayana:
Identification and Classification of Fricatives in Speech Using Zero Time Windowing Method. 187-191
- Nattanun Chanchaochai, Christopher Cieri, Japhet Debrah, Hongwei Ding, Yue Jiang, Sishi Liao, Mark Liberman, Jonathan Wright, Jiahong Yuan, Juhong Zhan, Yuqing Zhan:
GlobalTIMIT: Acoustic-Phonetic Datasets for the World's Languages. 192-196
- Anne Hermes, Doris Mücke, Bastian Auris, Rachid Ridouane:
Structural Effects on Properties of Consonantal Gestures in Tashlhiyt. 197-201
- Alexei Kochetov, Matthew Faytak, Kiranpreet Nara:
The Retroflex-dental Contrast in Punjabi Stops and Nasals: A Principal Component Analysis of Ultrasound Images. 202-206
- Yang Yue, Fang Hu:
Vowels and Diphthongs in Hangzhou Wu Chinese Dialect. 207-211
- Mahesh M, Jeena J. Prakash, Hema A. Murthy:
Resyllabification in Indian Languages and Its Implications in Text-to-speech Systems. 212-216
- Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl:
Voice Source Contribution to Prominence Perception: Rd Implementation. 217-221
- Christer Gobl, Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide:
On the Relationship between Glottal Pulse Shape and Its Spectrum: Correlations of Open Quotient, Pulse Skew and Peak Flow with Source Harmonic Amplitudes. 222-226
- Vincent Hughes, Philip Harrison, Paul Foulkes, Peter French, Colleen Kavanagh, Eugenia San Segundo Fernández:
The Individual and the System: Assessing the Stability of the Output of a Semi-automatic Forensic Voice Comparison System. 227-231
- Sudarsana Reddy Kadiri, Bayya Yegnanarayana:
Breathy to Tense Voice Discrimination using Zero-Time Windowing Cepstral Coefficients (ZTWCCs). 232-236
- Pamir Gogoi, Sishir Kalita, Parismita Gogoi, Ratree Wayland, Priyankoo Sarmah, S. R. Mahadeva Prasanna:
Analysis of Breathiness in Contextual Vowel of Voiceless Nasals in Mizo. 237-241
Speaker State and Trait
- Yijia Xu, Mark Hasegawa-Johnson, Nancy McElwain:
Infant Emotional Outbursts Detection in Infant-parent Spoken Interactions. 242-246
- Jaejin Cho, Raghavendra Pappagari, Purva Kulkarni, Jesús Villalba, Yishay Carmiel, Najim Dehak:
Deep Neural Networks for Emotion Recognition Combining Audio and Transcripts. 247-251
- Srinivas Parthasarathy, Carlos Busso:
Preference-Learning with Qualitative Agreement for Sentence Level Emotional Annotations. 252-256
- Siddique Latif, Rajib Rana, Shahzad Younis, Junaid Qadir, Julien Epps:
Transfer Learning for Improving Speech Emotion Classification Accuracy. 257-261
- Patrick Meyer, Eric Buschermöhle, Tim Fingscheidt:
What Do Classifiers Actually Learn? A Case Study on Emotion Recognition Datasets. 262-266
- Eva-Maria Rathner, Yannik Terhorst, Nicholas Cummins, Björn W. Schuller, Harald Baumeister:
State of Mind: Classification through Self-reported Affect and Word Use in Speech. 267-271
- Ziping Zhao, Yu Zheng, Zixing Zhang, Haishuai Wang, Yiqin Zhao, Chao Li:
Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition. 272-276
- Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesús Villalba, Daniel Povey, Sanjeev Khudanpur, Najim Dehak:
End-to-end Deep Neural Network Age Estimation. 277-281
- Rajat Hebbar, Krishna Somandepalli, Shrikanth S. Narayanan:
Improving Gender Identification in Movie Audio Using Cross-Domain Data. 282-286
- Selen Hande Kabil, Hannah Muckenhirn, Mathew Magimai-Doss:
On Learning to Identify Genders from Raw Speech Signal Using CNNs. 287-291
- Jilt Sebastian, Manoj Kumar, Pavan Kumar D. S., Mathew Magimai-Doss, Hema A. Murthy, Shrikanth S. Narayanan:
Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech. 292-296
- James R. Williamson, Thomas F. Quatieri, Adam C. Lammert, Katherine Mitchell, Katherine Finkelstein, Nicole Ekon, Caitlin Dillon, Robert Kenefick, Kristin Heaton:
The Effect of Exposure to High Altitude and Heat on Speech Articulatory Coordination. 297-301
Deep Learning for Source Separation and Pitch Tracking
- Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su, Dong Yu:
Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation. 302-306
- Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian, Dong Yu:
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures. 307-311
- Weipeng He, Petr Motlícek, Jean-Marc Odobez:
Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network. 312-316
- Shuai Yang, Zhiyong Wu, Binbin Shen, Helen Meng:
Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method. 317-321
- Zhong-Qiu Wang, Xueliang Zhang, DeLiang Wang:
Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks. 322-326
- Akihiro Kato, Tomi Kinnunen:
Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks. 327-331
- Paul Magron, Konstantinos Drossos, Stylianos Ioannis Mimilakis, Tuomas Virtanen:
Reducing Interference with Phase Recovery in DNN-based Monaural Singing Voice Separation. 332-336
- Kanru Hua:
Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors. 337-341
- Yi Luo, Nima Mesgarani:
Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network. 342-346
- Rajath Kumar, Yi Luo, Nima Mesgarani:
Music Source Activity Detection and Separation Using Deep Attractor Network. 347-351
- Longfei Yang, Yanlu Xie, Jinsong Zhang:
Improving Mandarin Tone Recognition Using Convolutional Bidirectional Long Short-Term Memory with Attention. 352-356
Acoustic Analysis-Synthesis of Speech Disorders
- Rob van Son, Catherine Middag, Kris Demuynck:
Vowel Space as a Tool to Evaluate Articulation Problems. 357-361
- Véronique Delvaux, Kathy Huet, Myriam Piccaluga, Sophie van Malderen, Bernard Harmegnies:
Towards a Better Characterization of Parkinsonian Speech: A Multidimensional Acoustic Study. 362-366
- Sishir Kalita, S. R. Mahadeva Prasanna, Samarendra Dandapat:
Self-similarity Matrix Based Intelligibility Assessment of Cleft Lip and Palate Speech. 367-371
- Akhilesh Kumar Dubey, S. R. Mahadeva Prasanna, Samarendra Dandapat:
Pitch-Adaptive Front-end Feature for Hypernasality Detection. 372-376
- Raquel Norel, Mary Pietrowicz, Carla Agurto, Shay Rishoni, Guillermo A. Cecchi:
Detection of Amyotrophic Lateral Sclerosis (ALS) via Acoustic Analysis. 377-381
- Vikram C. M., S. R. Mahadeva Prasanna, Ajish K. Abraham, Pushpavathi M, Girish K. S:
Detection of Glottal Activity Errors in Production of Stop Consonants in Children with Cleft Lip and Palate. 382-386
ASR Systems and Technologies
- Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates:
Cold Fusion: Training Seq2Seq Models Together with Language Models. 387-391
- Kazuki Irie, Zhihong Lei, Liuhui Deng, Ralf Schlüter, Hermann Ney:
Investigation on Estimation of Sentence Probability by Combining Forward, Backward and Bi-directional LSTM-RNNs. 392-395
- Thomas Zenkel, Ramon Sanabria, Florian Metze, Alex Waibel:
Subword and Crossword Units for CTC Acoustic Models. 396-400
- Tomohiro Tanaka, Ryo Masumura, Hirokazu Masataki, Yushi Aono:
Neural Error Corrective Language Models for Automatic Speech Recognition. 401-405
- Mohammad Sadegh Rasooli, Sarangarajan Parthasarathy:
Entity-Aware Language Model as an Unsupervised Reranker. 406-410
- Iksoo Choi, Jinhwan Park, Wonyong Sung:
Character-level Language Modeling with Gated Hierarchical Recurrent Neural Networks. 411-415
Deception, Personality, and Culture Attribute
- Sarah Ita Levitan, Angel Maredia, Julia Hirschberg:
Acoustic-Prosodic Indicators of Deception and Trust in Interview Dialogues. 416-420
- Guozhen An, Sarah Ita Levitan, Julia Hirschberg, Rivka Levitan:
Deep Personality Recognition for Deception Detection. 421-425
- Hansjörg Mixdorff, Albert Rilliard, Tan Lee, Matthew K. H. Ma, Angelika Hönemann:
Cross-cultural (A)symmetries in Audio-visual Attitude Perception. 426-430
- Fasih Haider, Fahim A. Salim, Owen Conlan, Saturnino Luz:
An Active Feature Transformation Method for Attitude Recognition of Video Bloggers. 431-435
- Fu-Sheng Tsai, Hao-Chun Yang, Wei-Wen Chang, Chi-Chun Lee:
Automatic Assessment of Individual Culture Attribute of Power Distance Using a Social Context-Enhanced Prosodic Network Representation. 436-440
- Sudarsana Reddy Kadiri, Bayya Yegnanarayana:
Analysis and Detection of Phonation Modes in Singing Voice using Excitation Source Features and Single Frequency Filtering Cepstral Coefficients (SFFCC). 441-445
Automatic Detection and Recognition of Voice and Speech Disorders
- Huiyi Wu, John J. Soraghan, Anja Lowit, Gaetano Di Caterina:
A Deep Learning Method for Pathological Voice Detection Using Convolutional Deep Belief Networks. 446-450
- Chitralekha Bhat, Biswajit Das, Bhavik Vachhani, Sunil Kumar Kopparapu:
Dysarthric Speech Recognition Using Time-delay Neural Network Based Denoising Autoencoder. 451-455
- Juan Camilo Vásquez-Correa, Tomás Arias-Vergara, Juan Rafael Orozco-Arroyave, Elmar Nöth:
A Multitask Learning Approach to Assess the Dysarthria Severity in Patients with Parkinson's Disease. 456-460
- Jason Lilley, Erin L. Crowgey, H. Timothy Bunnell:
The Use of Machine Learning and Phonetic Endophenotypes to Discover Genetic Variants Associated with Speech Sound Disorder. 461-465
- Meredith Moore, Hemanth Venkateswara, Sethuraman Panchanathan:
Whistle-blowing ASRs: Evaluating the Need for More Inclusive Speech Recognition Systems. 466-470
- Bhavik Vachhani, Chitralekha Bhat, Sunil Kumar Kopparapu:
Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition. 471-475
Voice Conversion
- Shaojin Ding, Guanlong Zhao, Christopher Liberatore, Ricardo Gutierrez-Osuna:
Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function. 476-480
- Shaojin Ding, Christopher Liberatore, Ricardo Gutierrez-Osuna:
Learning Structured Dictionaries for Exemplar-based Voice Conversion. 481-485
- Yu-Huai Peng, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang:
Exemplar-Based Spectral Detail Compensation for Voice Conversion. 486-490
- G. Nisha Meenakshi, Prasanta Kumar Ghosh:
Whispered Speech to Neutral Speech Conversion Using Bidirectional LSTMs. 491-495
- Songxiang Liu, Jinghua Zhong, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng:
Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance. 496-500
- Ju-Chieh Chou, Cheng-chieh Yeh, Hung-yi Lee, Lin-Shan Lee:
Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations. 501-505
The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
- Cristina Gorrostieta, Richard Brutti, Kye Taylor, Avi Shapiro, Joseph Moran, Ali Azarbayejani, John Kane:
Attention-based Sequence Classification for Affect Detection. 506-510
- Zafi Sherhan Syed, Julien Schroeter, Kirill A. Sidorov, A. David Marshall:
Computational Paralinguistics: Automatic Assessment of Emotions, Mood and Behavioural State from Acoustics of Speech. 511-515
- Sai Krishna Rallabandi, Bhavya Karki, Carla Viegas, Eric Nyberg, Alan W. Black:
Investigating Utterance Level Representations for Detecting Intent from Acoustics. 516-520
- Heysem Kaya, Dmitrii Fedotov, Ali Yesilkanat, Oxana Verkholyak, Yang Zhang, Alexey Karpov:
LSTM Based Cross-corpus and Cross-task Acoustic Emotion Recognition. 521-525
- Bogdan Vlasenko, Jilt Sebastian, Pavan Kumar D. S., Mathew Magimai-Doss:
Implementing Fusion Techniques for the Classification of Paralinguistic Information. 526-530
- Gábor Gosztolya, Tamás Grósz, László Tóth:
General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats. 531-535
- Bo-Hao Su, Sung-Lin Yeh, Ming-Ya Ko, Huan-Yu Chen, Shun-Chang Zhong, Jeng-Lin Li, Chi-Chun Lee:
Self-Assessed Affect Recognition Using Fusion of Attentional BLSTM and Static Acoustic Features. 536-540
- Claude Montacié, Marie-José Caraty:
Vocalic, Lexical and Prosodic Cues for the INTERSPEECH 2018 Self-Assessed Affect Challenge. 541-545
Show and Tell 2
- Anand P. A, Chiranjeevi Yarra, N. K. Kausthubha, Prasanta Kumar Ghosh:
Intonation tutor by SPIRE (In-SPIRE): An Online Tool for an Automatic Feedback to the Second Language Learners in Learning Intonation. 546-547
- Keelan Evanini, Veronika Timpe-Laughlin, Eugene Tsuprun, Ian Blood, Jeremy Lee, James V. Bruno, Vikram Ramanarayanan, Patrick L. Lange, David Suendermann-Oeft:
Game-based Spoken Dialog Language Learning Applications for Young Students. 548-549