18th Interspeech 2017: Stockholm, Sweden
- Francisco Lacerda:
18th Annual Conference of the International Speech Communication Association, Interspeech 2017, Stockholm, Sweden, August 20-24, 2017. ISCA 2017
ISCA Medal 2017 Ceremony
- Haizhou Li:
ISCA Medal for Scientific Achievement. 1
Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 1
- Tomi Kinnunen, Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas W. D. Evans, Junichi Yamagishi, Kong-Aik Lee:
The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection. 2-6
- Roberto Font, Juan M. Espín, María José Cano:
Experimental Analysis of Features for Replay Attack Detection - Results on the ASVspoof 2017 Challenge. 7-11
- Hemant A. Patil, Madhu R. Kamble, Tanvina B. Patel, Meet H. Soni:
Novel Variable Length Teager Energy Separation Based Instantaneous Frequency Features for Replay Detection. 12-16
- Weicheng Cai, Danwei Cai, Wenbo Liu, Gang Li, Ming Li:
Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion. 17-21
- Sarfaraz Jelil, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha:
Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features. 22-26
- Marcin Witkowski, Stanislaw Kacprzak, Piotr Zelasko, Konrad Kowalczyk, Jakub Galka:
Audio Replay Attack Detection Using High-Frequency Features. 27-31
- Xianliang Wang, Yanhong Xiao, Xuan Zhu:
Feature Selection Based on CQCCs for Automatic Speaker Verification Spoofing. 32-36
Special Session: Speech Technology for Code-Switching in Multilingual Communities
- Emre Yilmaz, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, Henk van den Heuvel, David A. van Leeuwen:
Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech. 37-41
- Emre Yilmaz, Henk van den Heuvel, David A. van Leeuwen:
Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection. 42-46
- Vikram Ramanarayanan, David Suendermann-Oeft:
Jee haan, I'd like both, por favor: Elicitation of a Code-Switched Corpus of Hindi-English and Spanish-English Human-Machine Dialog. 47-51
- Sai Krishna Rallabandi, Alan W. Black:
On Building Mixed Lingual Speech Synthesis Systems. 52-56
- Khyathi Raghavi Chandu, Sai Krishna Rallabandi, Sunayana Sitaram, Alan W. Black:
Speech Synthesis for Mixed-Language Navigation Instructions. 57-61
- Djegdjiga Amazouz, Martine Adda-Decker, Lori Lamel:
Addressing Code-Switching in French/Algerian Arabic Speech. 62-66
- Gualberto A. Guzmán, Joseph Ricard, Jacqueline Serigos, Barbara E. Bullock, Almeida Jacqueline Toribio:
Metrics for Modeling Code-Switching Across Corpora. 67-71
- Ewald van der Westhuizen, Thomas Niesler:
Synthesising isiZulu-English Code-Switch Bigrams Using Word Embeddings. 72-76
- Victor Soto, Julia Hirschberg:
Crowdsourcing Universal Part-of-Speech Tags for Code-Switching. 77-81
Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 2
- Galina Lavrentyeva, Sergey Novoselov, Egor Malykh, Alexander Kozlov, Oleg Kudashev, Vadim Shchemelinin:
Audio Replay Attack Detection with Deep Learning Frameworks. 82-86
- Zhe Ji, Zhi-Yi Li, Peng Li, MaoBo An, Shengxiang Gao, Dan Wu, Faru Zhao:
Ensemble Learning for Countermeasure of Audio Replay Spoofing Attack in ASVspoof2017. 87-91
- Lantian Li, Yixiang Chen, Dong Wang, Thomas Fang Zheng:
A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification. 92-96
- Parav Nagarsheth, Elie Khoury, Kailash Patil, Matt Garland:
Replay Attack Detection Using DNN for Channel Discrimination. 97-101
- Zhuxin Chen, Zhifeng Xie, Weibin Zhang, Xiangmin Xu:
ResNet and Model Fusion for Automatic Spoofing Detection. 102-106
- K. N. R. K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala:
SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017. 107-111
Conversational Telephone Speech Recognition
- William Hartmann, Roger Hsiao, Tim Ng, Jeff Z. Ma, Francis Keith, Man-Hung Siu:
Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features. 112-116
- Jeremy Heng Meng Wong, Mark J. F. Gales:
Student-Teacher Training with Diverse Decision Tree Ensembles. 117-121
- Xiaodong Cui, Vaibhava Goel, George Saon:
Embedding-Based Speaker Adaptive Training of Deep Neural Networks. 122-126
- Jeff Z. Ma, Francis Keith, Tim Ng, Man-Hung Siu, Owen Kimball:
Improving Deliverable Speech-to-Text Systems with Multilingual Knowledge Transfer. 127-131
- George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall:
English Conversational Telephone Speech Recognition by Humans and Machines. 132-136
- Andreas Stolcke, Jasha Droppo:
Comparing Human and Machine Errors in Conversational Speech Transcription. 137-141
Multimodal Paralinguistics
- Volha Petukhova, Manoj Raju, Harry Bunt:
Multimodal Markers of Persuasive Speech: Designing a Virtual Debate Coach. 142-146
- Daniel Bone, Julia Mertens, Emily Zane, Sungbok Lee, Shrikanth S. Narayanan, Ruth B. Grossman:
Acoustic-Prosodic and Physiological Response to Stressful Interactions in Children with Autism Spectrum Disorder. 147-151
- Alec Burmania, Carlos Busso:
A Stepwise Analysis of Aggregated Crowdsourced Labels Describing Multimodal Emotional Behaviors. 152-156
- Gaurav Fotedar, Prasanta Kumar Ghosh:
An Information Theoretic Analysis of the Temporal Synchrony Between Head Gestures and Prosodic Patterns in Spontaneous Speech. 157-161
- Dong-Yan Huang, Wan Ding, Mingyu Xu, Huaiping Ming, Minghui Dong, Xinguo Yu, Haizhou Li:
Multimodal Prediction of Affective Dimensions via Fusing Multiple Regression Techniques. 162-165
- Marion Dohen, Benjamin Roustan:
Co-Production of Speech and Pointing Gestures in Clear and Perturbed Interactive Tasks: Multimodal Designation Strategies. 166-170
Dereverberation, Echo Cancellation and Speech
- Peter Guzewich, Stephen A. Zahorian:
Improving Speaker Verification for Reverberant Conditions with Deep Neural Network Dereverberation Processing. 171-175
- Philipp Bulling, Klaus Linhard, Arthur Wolf, Gerhard Schmidt:
Stepsize Control for Acoustic Feedback Cancellation Based on the Detection of Reverberant Signal Periods and the Estimated System Distance. 176-180
- Jan Franzen, Tim Fingscheidt:
A Delay-Flexible Stereo Acoustic Echo Cancellation for DFT-Based In-Car Communication (ICC) Systems. 181-185
- Dongmei Wang, John H. L. Hansen:
Speech Enhancement Based on Harmonic Estimation Combined with MMSE to Improve Speech Intelligibility for Cochlear Implant Recipients. 186-190
- David Ayllón, Roberto Gil-Pita, Manuel Rosa-Zurera:
Improving Speech Intelligibility in Binaural Hearing Aids by Estimating a Time-Frequency Mask with a Weighted Least Squares Classifier. 191-195
- Tsung-Chen Wu, Tai-Shih Chi, Chia-Fone Lee:
Simulations of High-Frequency Vocoder on Mandarin Speech Recognition for Acoustic Hearing Preserved Cochlear Implant. 196-200
Acoustic and Articulatory Phonetics
- Zainab Hermes, Marissa S. Barlaz, Ryan Shosted, Zhi-Pei Liang, Bradley P. Sutton:
Phonetic Correlates of Pharyngeal and Pharyngealized Consonants in Saudi, Lebanese, and Jordanian Arabic: An rt-MRI Study. 201-205
- Benjamin Elie, Yves Laprie:
Glottal Opening and Strategies of Production of Fricatives. 206-209
- Mohamed Yassine Frej, Christopher Carignan, Catherine T. Best:
Acoustics and Articulation of Medial versus Final Coronal Stop Gemination Contrasts in Moroccan Arabic. 210-214
- Giuseppina Turco, Karim Shoul, Rachid Ridouane:
How are Four-Level Length Distinctions Produced? Evidence from Moroccan Arabic. 215-218
- Caroline Jones, Katherine Demuth, Weicong Li, Andre Almeida:
Vowels in the Barunga Variety of North Australian Kriol. 219-223
- Indranil Dutta, Irfan S., Pamir Gogoi, Priyankoo Sarmah:
Nature of Contrast and Coarticulation: Evidence from Mizo Tones and Assamese Vowel Harmony. 224-228
Multimodal and Articulatory Synthesis
- João Paulo Cabral, Benjamin R. Cowan, Katja Zibrek, Rachel McDonnell:
The Influence of Synthetic Voice on the Evaluation of a Virtual Character. 229-233
- Amelia Jane Gully, Takenori Yoshimura, Damian T. Murphy, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda:
Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network. 234-238
- Sébastien Le Maguer, Ingmar Steiner, Alexander Hewer:
An HMM/DNN Comparison for Synchronized Text-to-Speech and Tongue Motion Synthesis. 239-243
- Rachel Alexander, Tanner Sorensen, Asterios Toutios, Shrikanth S. Narayanan:
VCV Synthesis Using Task Dynamics to Animate a Factor-Based Articulatory Model. 244-248
- Joseph Mendelson, Matthew P. Aylett:
Beyond the Listening Test: An Interactive Approach to TTS Evaluation. 249-253
- Beiming Cao, Myung Jong Kim, Jan P. H. van Santen, Ted Mau, Jun Wang:
Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis. 254-258
Neural Networks for Language Modeling
- Min Ma, Michael Nirschl, Fadi Biadsy, Shankar Kumar:
Approaches for Neural-Network Language Model Adaptation. 259-263
- Youssef Oualil, Dietrich Klakow:
A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models. 264-268
- Xie Chen, Anton Ragni, Xunying Liu, Mark J. F. Gales:
Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition. 269-273
- Yinghui Huang, Abhinav Sethy, Bhuvana Ramabhadran:
Fast Neural Network Language Model Lookups at N-Gram Speeds. 274-278
- Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, George Saon:
Empirical Exploration of Novel Architectures and Objectives for Language Models. 279-283
- Karel Benes, Murali Karthick Baskar, Lukás Burget:
Residual Memory Networks in Language Modeling: Improving the Reputation of Feed-Forward Networks. 284-288
Pathological Speech and Language
- Amir Hossein Poorjam, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen:
Dominant Distortion Classification for Pre-Processing of Vowels in Remote Biomedical Voice Analysis. 289-293
- Duc Le, Keli Licata, Emily Mower Provost:
Automatic Paraphasia Detection from Aphasic Speech: A Preliminary Study. 294-298
- Nicanor García, Juan Rafael Orozco-Arroyave, Luis Fernando D'Haro, Najim Dehak, Elmar Nöth:
Evaluation of the Neurological State of People with Parkinson's Disease Using i-Vectors. 299-303
- Yu-Ren Chien, Michal Borský, Jón Guðnason:
Objective Severity Assessment from Disordered Voice Using Estimated Glottal Airflow. 304-308
- Florian B. Pokorny, Björn W. Schuller, Peter B. Marschik, Raymond Brueckner, Pär Nyström, Nicholas Cummins, Sven Bölte, Christa Einspieler, Terje Falck-Ytter:
Earlier Identification of Children with Autism Spectrum Disorder: An Automatic Vocalisation-Based Approach. 309-313
- Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth:
Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson's Disease. 314-318
Speech Analysis and Representation 1
- Linxue Bai, Peter Jancovic, Martin J. Russell, Philip Weber, Stephen M. Houghton:
Phone Classification Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs. 319-323
- Siyuan Chen, Julien Epps, Eliathamby Ambikairajah, Phu Ngoc Le:
An Investigation of Crowd Speech for Room Occupancy Estimation. 324-328
- Karthika Vijayan, Jitendra Kumar Dhiman, Chandra Sekhar Seelamantula:
Time-Frequency Coherence for Periodic-Aperiodic Decomposition of Speech Signals. 329-333
- Alexsandro R. Meireles, Antônio R. M. Simões, Antonio Celso Ribeiro, Beatriz Raposo de Medeiros:
Musical Speech: A New Methodology for Transcribing Speech Prosody. 334-338
- K. S. Nataraj, Prem C. Pandey, Hirak Dasgupta:
Estimation of Place of Articulation of Fricatives from Spectral Characteristics for Speech Training. 339-343
- Tom Bäckström:
Estimation of the Probability Distribution of Spectral Fine Structure in the Speech Source. 344-348
Perception of Dialects and L2
- Sucheta Ghosh, Camille Fauth, Yves Laprie, Aghilas Sini:
End-to-End Acoustic Feedback in Language Learning for Correcting Devoiced French Final-Fricatives. 349-353
- Ewa Jacewicz, Robert Allen Fox:
Dialect Perception by Older Children. 354-358
- Kiyoko Yoneyama, Mafuyu Kitahara, Keiichi Tajima:
Perception of Non-Contrastive Variations in American English by Japanese Learners: Flaps are Less Favored Than Stops. 359-363
- Lieke van Maastricht, Tim Zee, Emiel Krahmer, Marc Swerts:
L1 Perceptions of L2 Prosody: The Interplay Between Intonation, Rhythm, and Speech Rate and Their Contribution to Accentedness and Comprehensibility. 364-368
- Izumi Takiguchi:
Effects of Pitch Fall and L1 on Vowel Length Identification in L2 Japanese. 369-373
- Yuanyuan Zhang, Hongwei Ding:
A Preliminary Study of Prosodic Disambiguation by Chinese EFL Learners. 374-378
Far-field Speech Recognition
- Chanwoo Kim, Ananya Misra, Kean K. Chin, Thad Hughes, Arun Narayanan, Tara N. Sainath, Michiel Bacchiani:
Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home. 379-383
- Keisuke Kinoshita, Marc Delcroix, Haeyong Kwon, Takuma Mori, Tomohiro Nakatani:
Neural Network-Based Spectrum Estimation for Online WPE Dereverberation. 384-388
- Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Steven J. Rennie:
Factorial Modeling for Effective Suppression of Directional Noise. 389-393
- Yanhui Tu, Jun Du, Lei Sun, Feng Ma, Chin-Hui Lee:
On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones. 394-398
- Bo Li, Tara N. Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, Kean K. Chin, Khe Chai Sim, Ron J. Weiss, Kevin W. Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Richard Rose, Matt Shannon:
Acoustic Modeling for Google Home. 399-403
- Seyedmahdad Mirsamadi, John H. L. Hansen:
On Multi-Domain Training and Adaptation of End-to-End RNN Acoustic Models for Distant Speech Recognition. 404-408
Speech Analysis and Representation 2
- Masanori Morise, Genta Miyashita, Kenji Ozawa:
Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System. 409-413
- Erfan Loweimi, Jon Barker, Oscar Saz-Torralba, Thomas Hain:
Robust Source-Filter Separation of Speech Signal in the Phase Domain. 414-418
- Simon Stone, Peter Steiner, Peter Birkholz:
A Time-Warping Pitch Tracking Algorithm Considering Fast f0 Changes. 419-423
- Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda:
A Modulation Property of Time-Frequency Derivatives of Filtered Phase and its Application to Aperiodicity and fo Estimation. 424-428
- Avinash Kumar, Syed Shahnawazuddin, Gayadhar Pradhan:
Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments. 429-433
- Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh:
Time-Domain Envelope Modulating the Noise Component of Excitation in a Continuous Residual-Based Vocoder for Statistical Parametric Speech Synthesis. 434-438
- Chia-Lung Wu, Hsiang-Ping Hsu, Syu-Siang Wang, Jeih-Weih Hung, Ying-Hui Lai, Hsin-Min Wang, Yu Tsao:
Wavelet Speech Enhancement Based on Robust Principal Component Analysis. 439-443
- Bidisha Sharma, S. R. Mahadeva Prasanna:
Vowel Onset Point Detection Using Sonority Information. 444-448
- Unto K. Laine:
Analytic Filter Bank for Speech Analysis, Feature Extraction and Perceptual Studies. 449-453
- Christian Kroos, Mark D. Plumbley:
Learning the Mapping Function from Voltage Amplitudes to Sensor Positions in 3D-EMA Using Deep Neural Networks. 454-458
Speech and Audio Segmentation and Classification 2
- Jia Dai, Wei Xue, Wenju Liu:
Multilingual i-Vector Based Statistical Modeling for Music Genre Classification. 459-463
- Banriskhem K. Khonglah, K. T. Deepak, S. R. Mahadeva Prasanna:
Indoor/Outdoor Audio Classification Using Foreground Speech Segmentation. 464-468
- Jinxi Guo, Ning Xu, Li-Jia Li, Abeer Alwan:
Attention Based CLDNNs for Short-Duration Acoustic Scene Classification. 469-473
- Xianjun Xia, Roberto Togneri, Ferdous Ahmed Sohel, David Huang:
Frame-Wise Dynamic Threshold Based Polyphonic Acoustic Event Detection. 474-478
- Inseon Jang, Chunghyun Ahn, Jeongil Seo, Younseon Jang:
Enhanced Feature Extraction for Speech Detection in Media Audio. 479-483
- Sukanya Sonowal, Tushar Sandhan, In Kyu Choi, Nam Soo Kim:
Audio Classification Using Class-Specific Learned Descriptors. 484-487
- Janek Ebbers, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach, Bhiksha Raj:
Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery. 488-492
- Matthias Zöhrer, Franz Pernkopf:
Virtual Adversarial Training and Data Augmentation for Acoustic Event Detection with Gated Recurrent Neural Networks. 493-497
- Michael McAuliffe, Michaela Socolof, Sarah Mihuc, Michael Wagner, Morgan Sonderegger:
Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi. 498-502
- G. Nisha Meenakshi, Prasanta Kumar Ghosh:
A Robust Voiced/Unvoiced Phoneme Classification from Whispered Speech Using the 'Color' of Whispered Phonemes and Deep Neural Network. 503-507
Search, Computational Strategies and Language Modeling
- Ian Williams, Petar S. Aleksic:
Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition. 508-512
- Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel:
Comparison of Decoding Strategies for CTC Acoustic Models. 513-517
- Hossein Hadian, Daniel Povey, Hossein Sameti, Sanjeev Khudanpur:
Phone Duration Modeling for LVCSR Using Neural Networks. 518-522
- Jan Chorowski, Navdeep Jaitly:
Towards Better Decoding and Language Model Integration in Sequence to Sequence Models. 523-527
- Wenpeng Li, Binbin Zhang, Lei Xie, Dong Yu:
Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling. 528-532
- Xu Xiang, Yanmin Qian, Kai Yu:
Binary Deep Neural Networks for Speech Recognition. 533-537
- Akshay Chandrashekaran, Ian R. Lane:
Hierarchical Constrained Bayesian Optimization for Feature, Acoustic Model and Decoder Parameter Optimization. 538-542
- Shohei Toyama, Daisuke Saito, Nobuaki Minematsu:
Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition. 543-547
- Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas C. Raykar, Lili Kotlerman, Guy Lev:
Joint Learning of Correlated Sequence Labeling Tasks Using Bidirectional Recurrent Neural Networks. 548-552
- Xiaoyu Shen, Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow:
Estimation of Gap Between Current Language Models and Human Performance. 553-557
- Anna Moró, György Szaszák:
A Phonological Phrase Sequence Modelling Approach for Resource Efficient and Robust Real-Time Punctuation Recovery. 558-562
Speech Perception
- Lei Wang, Fei Chen:
Factors Affecting the Intelligibility of Low-Pass Filtered Speech. 563-566
- Shiyu Wang, Fei Chen:
Phonetic Restoration of Temporally Reversed Speech. 567-570