Samuel Thomas 0001
Person information
- affiliation: IBM Research AI, Thomas J. Watson Research Center, NY, USA
- affiliation (former): Johns Hopkins University, USA
Other persons with the same name
- Samuel Thomas — disambiguation page
- Samuel Thomas 0002 — Institut de Radioprotection et de Sûreté Nucléaire, Fontenay-aux-Roses, France
2020 – today
- 2024
- [c99]Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogério Feris, James R. Glass, Hilde Kuehne:
What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions. CVPR 2024: 18419-18429 - [i29]Andrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne, Rogério Feris, James R. Glass:
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation. CoRR abs/2406.10082 (2024) - 2023
- [c98]Takashi Fukuda, Samuel Thomas:
Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data. ICASSP 2023: 1-5 - [c97]Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogério Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James R. Glass:
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. ICASSP 2023: 1-5 - [c96]Vishal Sunder, Samuel Thomas, Hong-Kwang Jeff Kuo, Brian Kingsbury, Eric Fosler-Lussier:
Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding. ICASSP 2023: 1-5 - [c95]Samuel Thomas, Hong-Kwang Jeff Kuo, George Saon, Brian Kingsbury:
Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition. ICASSP 2023: 1-5 - [c94]Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang Jeff Kuo, Brian Kingsbury:
ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding. INTERSPEECH 2023: 1129-1133 - [c93]Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass:
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages. INTERSPEECH 2023: 2268-2272 - [i28]Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogério Feris, James R. Glass, Hilde Kuehne:
What, when, and where? - Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions. CoRR abs/2303.16990 (2023) - [i27]Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass:
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages. CoRR abs/2305.12606 (2023) - 2022
- [c92]Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Hilde Kuehne:
Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. CVPR 2022: 19988-19997 - [c91]Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas, Brian Kingsbury, George Saon:
Improving End-to-end Models for Set Prediction in Spoken Language Understanding. ICASSP 2022: 7162-7166 - [c90]Vishal Sunder, Samuel Thomas, Hong-Kwang Jeff Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier:
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding. ICASSP 2022: 7497-7501 - [c89]Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury:
A New Data Augmentation Method for Intent Classification Enhancement and its Application on Spoken Conversation Datasets. ICASSP 2022: 7632-7636 - [c88]Samuel Thomas, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon:
Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems. ICASSP 2022: 7932-7936 - [c87]Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo:
Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models. ICASSP 2022: 8127-8131 - [c86]Zvi Kons, Hagai Aronowitz, Edmilson da Silva Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon:
Extending RNN-T-based speech recognition systems with emotion and language classification. INTERSPEECH 2022: 546-549 - [c85]Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang Kuo, Brian Kingsbury:
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems. INTERSPEECH 2022: 2683-2687 - [c84]Takashi Fukuda, Samuel Thomas, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury:
Global RNN Transducer Models For Multi-dialect Speech Recognition. INTERSPEECH 2022: 3138-3142 - [i26]Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas, Brian Kingsbury, George Saon:
Improving End-to-End Models for Set Prediction in Spoken Language Understanding. CoRR abs/2201.12105 (2022) - [i25]Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury:
A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets. CoRR abs/2202.10137 (2022) - [i24]Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo:
Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models. CoRR abs/2202.13155 (2022) - [i23]Samuel Thomas, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon:
Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems. CoRR abs/2203.00006 (2022) - [i22]Vishal Sunder, Samuel Thomas, Hong-Kwang Jeff Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier:
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding. CoRR abs/2204.05169 (2022) - [i21]Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang Jeff Kuo, Brian Kingsbury:
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems. CoRR abs/2204.05188 (2022) - [i20]Zvi Kons, Hagai Aronowitz, Edmilson da Silva Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon:
Extending RNN-T-based speech recognition systems with emotion and language classification. CoRR abs/2207.13965 (2022) - [i19]Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogério Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James R. Glass:
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. CoRR abs/2210.03625 (2022) - 2021
- [j3]Leda Sari, Mark Hasegawa-Johnson, Samuel Thomas:
Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection. IEEE ACM Trans. Audio Speech Lang. Process. 29: 324-333 (2021) - [c83]Alexandros Koumparoulis, Gerasimos Potamianos, Samuel Thomas, Edmilson da Silva Morais:
Resource-efficient TDNN Architectures for Audio-visual Speech Recognition. EUSIPCO 2021: 506-510 - [c82]Edmilson da Silva Morais, Hong-Kwang Jeff Kuo, Samuel Thomas, Zoltán Tüske, Brian Kingsbury:
End-to-End Spoken Language Understanding Using Transformer Networks and Self-Supervised Pre-Trained Features. ICASSP 2021: 7483-7487 - [c81]Samuel Thomas, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory:
RNN Transducer Models for Spoken Language Understanding. ICASSP 2021: 7493-7497 - [c80]Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie W. Boggust, Rameswar Panda, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Michael Picheny, Shih-Fu Chang:
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. ICCV 2021: 7992-8001 - [c79]Jatin Ganhotra, Samuel Thomas, Hong-Kwang Jeff Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury:
Integrating Dialog History into End-to-End Spoken Language Understanding Systems. Interspeech 2021: 1254-1258 - [c78]Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James R. Glass:
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos. Interspeech 2021: 1584-1588 - [c77]Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass:
Cascaded Multilingual Audio-Visual Learning from Videos. Interspeech 2021: 3006-3010 - [c76]Takashi Fukuda, Samuel Thomas:
Knowledge Distillation Based Training of Universal ASR Source Models for Cross-Lingual Transfer. Interspeech 2021: 3450-3454 - [c75]Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Jeff Kuo, Samuel Thomas, Edmilson da Silva Morais:
Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs. Interspeech 2021: 4723-4727 - [i18]Samuel Thomas, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory:
RNN Transducer Models For Spoken Language Understanding. CoRR abs/2104.03842 (2021) - [i17]Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Kuo, Samuel Thomas, Edmilson da Silva Morais:
Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs. CoRR abs/2104.05752 (2021) - [i16]Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie W. Boggust, Rameswar Panda, Brian Kingsbury, Rogério Schmidt Feris, David Harwath, James R. Glass, Michael Picheny, Shih-Fu Chang:
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. CoRR abs/2104.12671 (2021) - [i15]Jatin Ganhotra, Samuel Thomas, Hong-Kwang Jeff Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury:
Integrating Dialog History into End-to-End Spoken Language Understanding Systems. CoRR abs/2108.08405 (2021) - [i14]Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass:
Cascaded Multilingual Audio-Visual Learning from Videos. CoRR abs/2111.04823 (2021) - [i13]Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander H. Liu, David Harwath, James R. Glass, Hilde Kuehne, Mubarak Shah:
Routing with Self-Attention for Multimodal Capsule Networks. CoRR abs/2112.00775 (2021) - [i12]Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Hilde Kuehne:
Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. CoRR abs/2112.04446 (2021) - 2020
- [c74]Alexandros Koumparoulis, Gerasimos Potamianos, Samuel Thomas, Edmilson da Silva Morais:
Audio-Assisted Image Inpainting for Talking Faces. ICASSP 2020: 7664-7668 - [c73]Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny:
Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems. ICASSP 2020: 7984-7988 - [c72]Leda Sari, Samuel Thomas, Mark Hasegawa-Johnson:
Training Spoken Language Understanding Systems with Non-Parallel Speech and Text. ICASSP 2020: 8109-8113 - [c71]Takashi Fukuda, Samuel Thomas:
Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework. INTERSPEECH 2020: 41-45 - [c70]Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis A. Lastras:
End-to-End Spoken Language Understanding Without Full Transcripts. INTERSPEECH 2020: 906-910 - [c69]Alexandros Koumparoulis, Gerasimos Potamianos, Samuel Thomas, Edmilson da Silva Morais:
Resource-Adaptive Deep Learning for Visual Speech Recognition. INTERSPEECH 2020: 3510-3514 - [c68]Samuel Thomas, Kartik Audhkhasi, Brian Kingsbury:
Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings. INTERSPEECH 2020: 4736-4740 - [i11]Andrew Rouditchenko, Angie W. Boggust, David Harwath, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Rogério Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James R. Glass:
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos. CoRR abs/2006.09199 (2020) - [i10]Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis A. Lastras:
End-to-End Spoken Language Understanding Without Full Transcripts. CoRR abs/2009.14386 (2020) - [i9]Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny:
Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems. CoRR abs/2010.04284 (2020) - [i8]Edmilson da Silva Morais, Hong-Kwang Jeff Kuo, Samuel Thomas, Zoltán Tüske, Brian Kingsbury:
End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features. CoRR abs/2011.08238 (2020)
2010 – 2019
- 2019
- [c67]Takashi Fukuda, Samuel Thomas:
Mixed Bandwidth Acoustic Modeling Leveraging Knowledge Distillation. ASRU 2019: 509-515 - [c66]George Saon, Zoltán Tüske, Kartik Audhkhasi, Brian Kingsbury, Michael Picheny, Samuel Thomas:
Simplified LSTMS for Speech Recognition. ASRU 2019: 547-553 - [c65]Yinghui Huang, Samuel Thomas, Masayuki Suzuki, Zoltán Tüske, Larry Sansone, Michael Picheny:
Semi-Supervised Training and Data Augmentation for Adaptation of Automatic Broadcast News Captioning Systems. ASRU 2019: 867-874 - [c64]Angie W. Boggust, Kartik Audhkhasi, Dhiraj Joshi, David Harwath, Samuel Thomas, Rogério Schmidt Feris, Danny Gutfreund, Yang Zhang, Antonio Torralba, Michael Picheny, James R. Glass:
Grounding Spoken Words in Unlabeled Video. CVPR Workshops 2019: 29-32 - [c63]Leda Sari, Samuel Thomas, Mark Hasegawa-Johnson, Michael Picheny:
Pre-training of Speaker Embeddings for Low-latency Speaker Change Detection in Broadcast News. ICASSP 2019: 6286-6290 - [c62]Samuel Thomas, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltán Tüske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko:
English Broadcast News Speech Recognition by Humans and Machines. ICASSP 2019: 6455-6459 - [c61]Masayuki Suzuki, Nobuyasu Itoh, Tohru Nagano, Gakuto Kurata, Samuel Thomas:
Improvements to N-gram Language Model Using Text Generated from Neural Language Model. ICASSP 2019: 7245-7249 - [c60]Leda Sari, Samuel Thomas, Mark A. Hasegawa-Johnson:
Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks. INTERSPEECH 2019: 769-773 - [c59]Samuel Thomas, Kartik Audhkhasi, Zoltán Tüske, Yinghui Huang, Michael Picheny:
Detection and Recovery of OOVs for Improved English Broadcast News Captioning. INTERSPEECH 2019: 2973-2977 - [i7]Samuel Thomas, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltán Tüske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko:
English Broadcast News Speech Recognition by Humans and Machines. CoRR abs/1904.13258 (2019) - 2018
- [c58]Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Mark Hasegawa-Johnson:
Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition. ICASSP 2018: 5989-5993 - [c57]Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin, Gakuto Kurata:
Data Augmentation Improves Recognition of Foreign Accented Speech. INTERSPEECH 2018: 2409-2413 - [c56]Masayuki Suzuki, Tohru Nagano, Gakuto Kurata, Samuel Thomas:
Inference-Invariant Transformation of Batch Normalization for Domain Adaptation of Acoustic Models. INTERSPEECH 2018: 2893-2897 - [c55]Shachar Mirkin, Michal Jacovi, Tamar Lavee, Hong-Kwang Kuo, Samuel Thomas, Leslie Sager, Lili Kotlerman, Elad Venezian, Noam Slonim:
A Recorded Debating Dataset. LREC 2018 - [i6]Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Mark Hasegawa-Johnson:
Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition. CoRR abs/1802.02656 (2018) - [i5]Minh N. B. Nguyen, Samuel Thomas, Anne E. Gattiker, Sujatha Kashyap, Kush R. Varshney:
SimplerVoice: A Key Message & Visual Description Generator System for Illiteracy. CoRR abs/1811.01299 (2018) - [i4]Vidya Muthukumar, Tejaswini Pedapati, Nalini K. Ratha, Prasanna Sattigeri, Chai-Wah Wu, Brian Kingsbury, Abhishek Kumar, Samuel Thomas, Aleksandra Mojsilovic, Kush R. Varshney:
Understanding Unequal Gender Classification Accuracy from Face Images. CoRR abs/1812.00099 (2018) - 2017
- [c54]Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Ryuki Tachibana, Samuel Thomas, Bhuvana Ramabhadran:
Effective joint training of denoising feature space transforms and Neural Network based acoustic models. ICASSP 2017: 5190-5194 - [c53]George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall:
English Conversational Telephone Speech Recognition by Humans and Machines. INTERSPEECH 2017: 132-136 - [c52]Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, Bhuvana Ramabhadran:
Efficient Knowledge Distillation from an Ensemble of Teachers. INTERSPEECH 2017: 3697-3701 - [i3]George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall:
English Conversational Telephone Speech Recognition by Humans and Machines. CoRR abs/1703.02136 (2017) - [i2]Shachar Mirkin, Michal Jacovi, Tamar Lavee, Hong-Kwang Kuo, Samuel Thomas, Leslie Sager, Lili Kotlerman, Elad Venezian, Noam Slonim:
A Recorded Debating Dataset. CoRR abs/1709.06438 (2017) - 2016
- [c51]David Haws, Dimitrios Dimitriadis, George Saon, Samuel Thomas, Michael Picheny:
On the importance of event detection for ASR. ICASSP 2016: 5705-5709 - [c50]Colin Vaz, Dimitrios Dimitriadis, Samuel Thomas, Shrikanth S. Narayanan:
CNMF-based acoustic features for noise-robust ASR. ICASSP 2016: 5735-5739 - [c49]Masayuki Suzuki, Ryuki Tachibana, Samuel Thomas, Bhuvana Ramabhadran, George Saon:
Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings. INTERSPEECH 2016: 1588-1592 - [c48]Dimitrios Dimitriadis, Samuel Thomas, Sriram Ganapathy:
An Investigation on the Use of i-Vectors for Robust ASR. INTERSPEECH 2016: 3828-3832 - [c47]Samuel Thomas, Kartik Audhkhasi, Jia Cui, Brian Kingsbury, Bhuvana Ramabhadran:
Multilingual Data Selection for Low Resource Speech Recognition. INTERSPEECH 2016: 3853-3857 - [i1]Dmitriy Serdyuk, Kartik Audhkhasi, Philemon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio:
Invariant Representations for Noisy Speech Recognition. CoRR abs/1612.01928 (2016) - 2015
- [c46]Samuel Thomas, George Saon, Maarten Van Segbroeck, Shrikanth S. Narayanan:
Improvements to the IBM speech activity detection system for the DARPA RATS program. ICASSP 2015: 4500-4504 - [c45]Sriram Ganapathy, Samuel Thomas, Dimitrios Dimitriadis, Steven J. Rennie:
Investigating factor analysis features for deep neural networks in noisy speech recognition. INTERSPEECH 2015: 1898-1902 - [c44]Samuel Thomas, George Saon, Hong-Kwang Jeff Kuo, Lidia Mangu:
The IBM BOLT speech transcription system. INTERSPEECH 2015: 3150-3153 - 2014
- [c43]Samuel Thomas, Sriram Ganapathy, George Saon, Hagen Soltau:
Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions. ICASSP 2014: 2519-2523 - [c42]Sriram Ganapathy, Kyu Jeong Han, Samuel Thomas, Mohamed Kamal Omar, Maarten Van Segbroeck, Shrikanth S. Narayanan:
Robust language identification using convolutional neural network features. INTERSPEECH 2014: 1846-1850 - [c41]Steven J. Rennie, Vaibhava Goel, Samuel Thomas:
Deep Order Statistic Networks. SLT 2014: 124-128 - [c40]Steven J. Rennie, Vaibhava Goel, Samuel Thomas:
Annealed dropout training of deep networks. SLT 2014: 159-164 - 2013
- [c39]Samuel Thomas, Michael L. Seltzer, Kenneth Church, Hynek Hermansky:
Deep neural network features and semi-supervised training for low resource speech recognition. ICASSP 2013: 6704-6708 - [c38]Oldrich Plchot, Spyros Matsoukas, Pavel Matejka, Najim Dehak, Jeff Z. Ma, Sandro Cumani, Ondrej Glembek, Hynek Hermansky, Sri Harish Reddy Mallidi, Nima Mesgarani, Richard M. Schwartz, Mehdi Soufifar, Zheng-Hua Tan, Samuel Thomas, Bing Zhang, Xinhui Zhou:
Developing a speaker identification system for the DARPA RATS project. ICASSP 2013: 6768-6772 - [c37]Aren Jansen, Samuel Thomas, Hynek Hermansky:
Weak top-down constraints for unsupervised acoustic model training. ICASSP 2013: 8091-8095 - [c36]Aren Jansen, Emmanuel Dupoux, Sharon Goldwater, Mark Johnson, Sanjeev Khudanpur, Kenneth Church, Naomi Feldman, Hynek Hermansky, Florian Metze, Richard C. Rose, Mike Seltzer, Pascal Clark, Ian McGraw, Balakrishnan Varadarajan, Erin Bennett, Benjamin Börschinger, Justin T. Chiu, Ewan Dunbar, Abdellah Fourtassi, David Harwath, Chia-ying Lee, Keith D. Levin, Atta Norouzian, Vijayaditya Peddinti, Rachael Richardson, Thomas Schatz, Samuel Thomas:
A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition. ICASSP 2013: 8111-8115 - [c35]George Saon, Samuel Thomas, Hagen Soltau, Sriram Ganapathy, Brian Kingsbury:
The IBM speech activity detection system for the DARPA RATS program. INTERSPEECH 2013: 3497-3501 - 2012
- [c34]Daniel Garcia-Romero, Xinhui Zhou, Dmitry N. Zotkin, Balaji Vasan Srinivasan, Yuancheng Luo, Sriram Ganapathy, Samuel Thomas, Sridhar Krishna Nemala, Garimella S. V. S. Sivaram, Majid Mirbagheri, Sri Harish Reddy Mallidi, Thomas Janu, Padmanabhan Rajan, Nima Mesgarani, Mounya Elhilali, Hynek Hermansky, Shihab A. Shamma, Ramani Duraiswami:
The UMD-JHU 2011 speaker recognition system. ICASSP 2012: 4229-4232 - [c33]Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Multilingual MLP features for low-resource LVCSR systems. ICASSP 2012: 4269-4272 - [c32]Samuel Thomas, Sriram Ganapathy, Aren Jansen, Hynek Hermansky:
Data-driven Posterior Features for Low Resource Speech Recognition Applications. INTERSPEECH 2012: 791-794 - [c31]Aren Jansen, Samuel Thomas, Hynek Hermansky:
Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition. INTERSPEECH 2012: 879-882 - [c30]Samuel Thomas, Sri Harish Reddy Mallidi, Thomas Janu, Hynek Hermansky, Nima Mesgarani, Xinhui Zhou, Shihab A. Shamma, Tim Ng, Bing Zhang, Long Nguyen, Spyros Matsoukas:
Acoustic and Data-driven Features for Robust Speech Activity Detection. INTERSPEECH 2012: 1985-1988 - [c29]Atta Norouzian, Aren Jansen, Richard C. Rose, Samuel Thomas:
Exploiting Discriminative Point Process Models for Spoken Term Detection. INTERSPEECH 2012: 2442-2445 - [c28]Samuel Thomas, Sri Harish Reddy Mallidi, Sriram Ganapathy, Hynek Hermansky:
Adaptation transforms of auto-associative neural networks as features for speaker verification. Odyssey 2012: 98-104 - [c27]Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Feature extraction using 2-d autoregressive models for speaker recognition. Odyssey 2012: 229-235 - 2011
- [j2]Daniel Povey, Lukás Burget, Mohit Agarwal, Pinar Akyazi, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Nagendra K. Goel, Martin Karafiát, Ariya Rastrow, Richard C. Rose, Petr Schwarz, Samuel Thomas:
The subspace Gaussian mixture model - A structured model for speech recognition. Comput. Speech Lang. 25(2): 404-439 (2011) - [c26]Samuel Thomas, Patrick Nguyen, Geoffrey Zweig, Hynek Hermansky:
MLP based phoneme detectors for Automatic Speech Recognition. ICASSP 2011: 5024-5027 - [c25]Geoffrey Zweig, Patrick Nguyen, Dirk Van Compernolle, Kris Demuynck, Les E. Atlas, Pascal Clark, Gregory Sell, Meihong Wang, Fei Sha, Hynek Hermansky, Damianos G. Karakos, Aren Jansen, Samuel Thomas, Sivaram G. S. V. S., Samuel R. Bowman, Justine T. Kao:
Speech recognition with segmental conditional random fields: A summary of the JHU CLSP 2010 Summer Workshop. ICASSP 2011: 5044-5047 - [c24]Michael A. Carlin, Samuel Thomas, Aren Jansen, Hynek Hermansky:
Rapid Evaluation of Speech Representations for Spoken Term Discovery. INTERSPEECH 2011: 821-824 - [c23]Nima Mesgarani, Samuel Thomas, Hynek Hermansky:
Adaptive Stream Fusion in Multistream Recognition of Speech. INTERSPEECH 2011: 2329-2332 - [c22]Garimella S. V. S. Sivaram, Samuel Thomas, Hynek Hermansky:
Mixture of Auto-Associative Neural Networks for Speaker Verification. INTERSPEECH 2011: 2381-2384 - [c21]Hynek Hermansky, Nima Mesgarani, Samuel Thomas:
Performance monitoring for robustness in automatic recognition of speech. MLSLP 2011: 31-34 - 2010
- [c20]Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Robust spectro-temporal features based on autoregressive models of Hilbert envelopes. ICASSP 2010: 4286-4289 - [c19]Arnab Ghoshal, Daniel Povey, Mohit Agarwal, Pinar Akyazi, Lukás Burget, Kai Feng, Ondrej Glembek, Nagendra Goel, Martin Karafiát, Ariya Rastrow, Richard C. Rose, Petr Schwarz, Samuel Thomas:
A novel estimation of feature-space MLLR for full-covariance models. ICASSP 2010: 4310-4313 - [c18]Daniel Povey, Lukás Burget, Mohit Agarwal, Pinar Akyazi, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Nagendra K. Goel, Martin Karafiát, Ariya Rastrow, Richard C. Rose, Petr Schwarz, Samuel Thomas:
Subspace Gaussian Mixture Models for speech recognition. ICASSP 2010: 4330-4333 - [c17]Lukás Burget, Petr Schwarz, Mohit Agarwal, Pinar Akyazi, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Nagendra K. Goel, Martin Karafiát, Daniel Povey, Ariya Rastrow, Richard C. Rose, Samuel Thomas:
Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models. ICASSP 2010: 4334-4337 - [c16]Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Comparison of modulation features for phoneme recognition. ICASSP 2010: 5038-5041 - [c15]Nagendra Goel, Samuel Thomas, Mohit Agarwal, Pinar Akyazi, Lukás Burget, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Martin Karafiát, Daniel Povey, Ariya Rastrow, Richard C. Rose, Petr Schwarz:
Approaches to automatic lexicon learning with limited training examples. ICASSP 2010: 5094-5097 - [c14]Nima Mesgarani, Samuel Thomas, Hynek Hermansky:
A multistream multiresolution framework for phoneme recognition. INTERSPEECH 2010: 318-321 - [c13]Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Cross-lingual and multi-stream posterior features for low resource LVCSR systems. INTERSPEECH 2010: 877-880 - [c12]Samuel Thomas, Kailash Patil, Sriram Ganapathy, Nima Mesgarani, Hynek Hermansky:
A phoneme recognition framework based on auditory spectro-temporal receptive fields. INTERSPEECH 2010: 2458-2461
2000 – 2009
- 2009
- [c11]Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Temporal envelope subtraction for robust speech recognition using modulation spectrum. ASRU 2009: 164-169 - [c10]Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Phoneme recognition using spectral envelope and modulation frequency features. ICASSP 2009: 4453-4456 - [c9]Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Static and dynamic modulation spectrum for speech recognition. INTERSPEECH 2009: 2823-2826 - [c8]Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Tandem representations of spectral envelope and modulation frequency features for ASR. INTERSPEECH 2009: 2955-2958 - [c7]Sriram Ganapathy, Samuel Thomas, Petr Motlícek, Hynek Hermansky:
Applications of signal analysis using autoregressive models for amplitude modulation. WASPAA 2009: 341-344 - 2008
- [j1]Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Recognition of Reverberant Speech Using Frequency Domain Linear Prediction. IEEE Signal Process. Lett. 15: 681-684 (2008) - [c6]Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Spectro-temporal features for Automatic Speech Recognition using Linear Prediction in spectral domain. EUSIPCO 2008: 1-4 - [c5]Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Front-end for far-field speech recognition based on frequency domain linear prediction. INTERSPEECH 2008: 984-987 - [c4]Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Hilbert envelope based spectro-temporal features for phoneme recognition in telephone speech. INTERSPEECH 2008: 1521-1524 - [c3]Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Hilbert Envelope Based Features for Far-Field Speech Recognition. MLMI 2008: 119-124 - 2007
- [c2]Samuel Thomas, Ashish Verma:
Language identification of person names using CF-IOF based weighing function. INTERSPEECH 2007: 1769-1772 - 2006
- [c1]Samuel Thomas, M. Nageshwara Rao, Hema A. Murthy, Coimbatore S. Ramalingam:
Natural sounding TTS based on syllable-like units. EUSIPCO 2006: 1-5
last updated on 2024-10-07 21:21 CEST by the dblp team
all metadata released as open data under CC0 1.0 license