


ICASSP 2003: Hong Kong
- 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '03, Hong Kong, April 6-10, 2003. IEEE 2003, ISBN 0-7803-7663-3
Volume 1
Keynotes
- Nikil Jayant:
Pervasive broadband: opportunities for signal processing. 1
- Ya-Qin Zhang:
Advances in networked media - theory and practice. 2
- Georgios B. Giannakis:
Ultra-wideband communications: an idea whose time has come. 3
Acoustic Modeling for Robust ASR
- Bryan L. Pellom, Kadri Hacioglu:
Recent improvements in the CU Sonic ASR system for noisy speech: the SPINE task. 4-7
- Wei-Tyng Hong:
A discriminative and robust training algorithm for noisy speech recognition. 8-11
- Xiaodong Cui, Yifan Gong:
Variable parameter Gaussian mixture hidden Markov modeling for speech recognition. 12-15
- Takehito Utsuro, Yasuhiro Kodama, Tomohiro Watanabe, Hiromitsu Nishizaki, Seiichi Nakagawa:
Confidence of agreement among multiple LVCSR models and model combination by SVM. 16-19
- Todd A. Stephenson, Mathew Magimai-Doss, Hervé Bourlard:
Speech recognition of spontaneous, noisy speech using auxiliary information in Bayesian networks. 20-23
- Ashutosh Garg, Gerasimos Potamianos, Chalapathy Neti, Thomas S. Huang:
Frame-dependent multi-stream reliability indicators for audio-visual speech recognition. 24-27
Language ID
- Sonia Parandekar, Katrin Kirchhoff:
Multi-stream language identification using data-driven dependency selection. 28-31
- A. K. V. Sai Jayram, V. Ramasubramanian, Thippur V. Sreenivas:
Language identification using parallel sub-word recognition. 32-35
- Qian-Rong Gu, Tadashi Shibata:
Speaker and text independent language identification using predictive error histogram vectors. 36-39
- Jean-Luc Rouas, Jérôme Farinas, François Pellegrino, Régine André-Obrecht:
Modeling prosody for language identification on read and spontaneous speech. 40-43
- Eddie Wong, Sridha Sridharan:
Three approaches to multilingual phone recognition. 44-47
- Jilei Tian, Janne Suontausta:
Scalable neural network based language identification from written text. 48-51
Novel Feature Extraction and Processing
- Panu Somervuo:
Experiments with linear and nonlinear feature transformations in HMM based phone recognition. 52-55
- Sunil Sivadas, Hynek Hermansky:
Generalized tandem feature extraction. 56-59
- Andrew C. Lindgren, Michael T. Johnson, Richard J. Povinelli:
Speech recognition using reconstructed phase space features. 60-63
- Bojana Gajic, Kuldip K. Paliwal:
Robust speech recognition using features based on zero crossings with peak amplitudes. 64-67
- Hema A. Murthy, Venkata Gadde:
The modified group delay function and its application to phoneme recognition. 68-71
- Jinfu Ni, Hisashi Kawai:
Tone feature extraction through parametric modeling and analysis-by-synthesis-based pattern matching. 72-75
Speech Enhancement I
- Jong Uk Kim, Sang-Gyun Kim, Chang D. Yoo:
The incorporation of masking threshold to subspace speech enhancement. 76-79
- Lee Lin, W. Harvey Holmes, Eliathamby Ambikairajah:
Subband noise estimation for speech enhancement using a perceptual Wiener filter. 80-83
- Justinian Rosca, Radu V. Balan, Christophe Beaugeant:
Multi-channel psychoacoustically motivated speech enhancement. 84-87
- Steven J. Rennie, Parham Aarabi, Trausti T. Kristjansson, Brendan J. Frey, Kannan Achan:
Robust variational speech separation using fewer microphones than speakers. 88-91
- Tomohiro Nakatani, Masato Miyoshi:
Blind dereverberation of single channel speech signal based on harmonic structure. 92-95
- Marcin Kuropatwinski, W. Bastiaan Kleijn:
Minimum mean square error estimation of speech short-term predictor parameters under noisy conditions. 96-99
Packet Loss and Channel Coding
- Jonas Lindblom, Per Hedelin:
Error protection and packet loss concealment based on a signal matched sinusoidal vocoder. 100-103
- Christoffer Asgaard Rødbro, Mads Græsbøll Christensen, Søren Vang Andersen, Søren Holdt Jensen:
Compressed domain packet loss concealment of sinusoidally coded speech. 104-107
- Philippe Gournay, François Rousseau, Roch Lefebvre:
Improved packet loss recovery using late frames for prediction-based speech coders. 108-111
- Costas S. Xydeas, Fotis Zafeiropoulos:
Model-based packet loss concealment for AMR coders. 112-115
- Moon-Keun Lee, Sung-Kyo Jung, Hong-Goo Kang, Young-Cheol Park, Dae Hee Youn:
A packet loss concealment algorithm based on time-scale modification for CELP-type speech coders. 116-119
- Anand D. Subramaniam, William R. Gardner, Bhaskar D. Rao:
Joint source-channel decoding of speech spectrum parameters over erasure channels using Gaussian mixture models. 120-123
Acoustic Modeling: Survey of New Techniques
- Yasuhiro Minami, Erik McDermott, Atsushi Nakamura, Shigeru Katagiri:
Recognition method with parametric trajectory generated from mixture distribution HMMs. 124-127
- John W. McDonough, Alex Waibel:
Maximum mutual information speaker adapted training with semi-tied covariance matrices. 128-131
- Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Automatic complexity control for HLDA systems. 132-135
- Vlasios Doumpiotis, Stavros Tsakalidis, William Byrne:
Discriminative training for segmental minimum Bayes risk decoding. 136-139
- Tetsuji Ogawa, Tetsunori Kobayashi:
Hybrid modeling of PHMM and HMM for speech recognition. 140-143
- Sebastian Stüker, Tanja Schultz, Florian Metze, Alex Waibel:
Multilingual articulatory features. 144-147
Speech Modeling and Analysis
- Ashraf Alkhairy:
Mathematical models of vocal tract with distributed sources. 148-151
- Paavo Alku, Tom Bäckström:
All-pole modeling of wide-band speech with symmetric linear prediction. 152-155
- Karl Schnell, Arild Lacroix:
Generation of nasalized speech sounds based on branched tube models obtained from separate mouth and nose outputs. 156-159
- Mark Thomson, Simon Boland, Mike Wu, Julien Epps, Michael Smithers:
Decomposition of speech into voiced and unvoiced components based on a state-space signal model. 160-163
- Ramon Prieto, Sora Kim:
Time delay estimation and adaptive frame length iterations for noise robust pitch extraction. 164-167
- Yu Shi, Eric Chang:
Spectrogram-based formant tracking via particle filters. 168-171
New Methods for Speaker Recognition, Segmentation, and Implementation
- Masafumi Nishida, Tatsuya Kawahara:
Unsupervised speaker indexing using speaker model selection based on Bayesian information criterion. 172-175
- Guillaume Lathoud, Iain A. McCowan:
Location based speaker segmentation. 176-179
- Yassine Mami, Delphine Charlet:
Speaker identification by anchor models with PCA/LDA post-processing. 180-183
- Phu Chien Nguyen, Masato Akagi, Tu Bao Ho:
Temporal decomposition: a promising approach to VQ-based speaker identification. 184-187
- LiFeng Sang, Zhaohui Wu, Yingchun Yang, Wanfeng Zhang:
Automatic speaker recognition using dynamic Bayesian network. 188-191
- Chengyuan Ma, Eric Chang:
Comparison of discriminative training methods for speaker verification. 192-195
Large Vocabulary Speech Recognition
- Gustavo Hernández Ábrego, Xavier Menéndez-Pidal, Thomas Kemp, Katsuki Minamino, Helmut Lucke:
Automatic set-up for speech recognition engines based on merit optimization. 196-199
- Miroslav Novak, Radek Hampl, Pavel Krbec, Vladimír Bergl, Jan Sedivý:
Two-pass search strategy for large list recognition on embedded speech recognition platforms. 200-203
- Sabine Deligne, Lidia Mangu:
On the use of lattices for the automatic generation of pronunciations. 204-207
- Dimitra Vergyri, Andreas Stolcke, Venkata Ramana Rao Gadde, Luciana Ferrer, Elizabeth Shriberg:
Prosodic knowledge sources for automatic speech recognition. 208-211
- Jean-Luc Gauvain, Lori Lamel, Holger Schwenk, Gilles Adda, Langzhou Chen, Fabrice Lefèvre:
Conversational telephone speech recognition. 212-215
- Bhuvana Ramabhadran, Jing Huang, Michael Picheny:
Towards automatic transcription of large spoken archives - English ASR for the MALACH project. 216-219
Unsupervised Language Model Adaption
- Langzhou Chen, Jean-Luc Gauvain, Lori Lamel, Gilles Adda:
Unsupervised language model adaptation for broadcast news. 220-223
- Michiel Bacchiani, Brian Roark:
Unsupervised language model adaptation. 224-227
- Takaaki Hori, Daniel Willett, Yasuhiro Minami:
Language model adaptation using WFST-based speaking-style translation. 228-231
- Erwin Leeuwis, Marcello Federico, Mauro Cettolo:
Language modeling and transcription of the TED corpus lectures. 232-235
- Tadasuke Yokoyama, Takahiro Shinozaki, Koji Iwano, Sadaoki Furui:
Unsupervised class-based language model adaptation for spontaneous speech recognition. 236-239
- Wen Wang, Mary P. Harper, Andreas Stolcke:
The robustness of an almost-parsing language model given errorful training data. 240-243
Speech Synthesis Overview
- Jerome R. Bellegarda:
Unsupervised, language-independent grapheme-to-phoneme conversion by latent analogy. 244-247
- Matthias Eichner, Steffen Werner, Matthias Wolff, Rüdiger Hoffmann:
Towards spontaneous speech synthesis - LM based selection of pronunciation variants. 248-251
- Ki-Seung Lee, Jeongsu Kim:
Context-adaptive phone boundary refining for a TTS database. 252-255
- Hideki Kawahara, Hisami Matsui:
Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation. 256-259
- Matthew Lee, Mark J. T. Smith:
Spectral modification for digital singing voice synthesis using asymmetric generalized Gaussians. 260-263
- Min Chu, Hu Peng, Yong Zhao, Zhengyu Niu, Eric Chang:
Microsoft Mulan - a bilingual TTS system. 264-267
Spoken Language Understanding
- Yulan He, Steve J. Young:
Hidden vector state model for hierarchical semantic parsing. 268-271
- Anand Venkataraman, Luciana Ferrer, Andreas Stolcke, Elizabeth Shriberg:
Training a prosody-based dialog act tagger from unlabeled data. 272-275
- Gökhan Tür, Robert E. Schapire, Dilek Hakkani-Tür:
Active learning for spoken language understanding. 276-279
- Ciprian Chelba, Milind Mahajan, Alex Acero:
Speech utterance classification. 280-283
- Ye-Yi Wang, Alex Acero:
Concept acquisition in example-based grammar authoring. 284-287
- Juan M. Huerta, David M. Lubensky:
Graph-based representation and techniques for NLU application development. 288-291
Speaker Adaption
- Daniel Willett, Thomas Niesler, Erik McDermott, Yasuhiro Minami, Shigeru Katagiri:
Pervasive unsupervised adaptation for lecture speech transcription. 292-295
- Kyung-Tak Lee, Lynette Melnar, Jim Talley, Christian Wellekens:
Symbolic speaker adaptation with phone inventory expansion. 296-299
- Guo-Hong Ding, Bo Xu, Juha Iso-Sipilä, Yang Cao:
Fast speaker adaptation using triple diagonal and shared block diagonal transform matrices. 300-303
- Dong Kook Kim, Young Joon Kim, Woohyung Lim, Nam Soo Kim:
Online adaptation using transformation space model evolution. 304-307
- Bowen Zhou, John H. L. Hansen:
Discriminative acoustic model using eigenspace mapping for rapid speaker adaptation. 308-311
- Daniel Povey, Philip C. Woodland, Mark J. F. Gales:
Discriminative map for acoustic model adaptation. 312-315
Robust ASR in Mobile and Distributed Environments
- Richard C. Rose, Iker Arizmendi, Sarangarajan Parthasarathy:
An efficient framework for robust mobile speech recognition services. 316-319
- Luca Cristoforetti, Marco Matassoni, Maurizio Omologo, Piergiorgio Svaizer:
Use of parallel recognizers for robust in-car speech interaction. 320-323
- Hideki Banno, Tetsuya Shinde, Kazuya Takeda, Fumitada Itakura:
In-car speech recognition using distributed microphones-adapting to automatically detected driving conditions. 324-327
- Kadri Hacioglu, Bryan L. Pellom:
A distributed architecture for robust automatic speech recognition. 328-331
- Jan Stadermann, Gerhard Rigoll:
Flexible feature extraction and HMM design for a hybrid distributed speech recognition system in noisy environments. 332-335
- Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg:
OOV-detection and channel error protection for distributed speech recognition over wireless networks. 336-339
Language Modelling and Large Vocabulary Recognition
- Shoichi Matsunaga, Atsunori Ogawa, Yoshikazu Yamaguchi, Akihiro Imamura:
Non-native English speech recognition using bilingual English lexicon and acoustic models. 340-343
- Katrin Kirchhoff, Jeff A. Bilmes, Sourin Das, Nicolae Duta, Melissa Egan, Gang Ji, Feng He, John Henderson, Daben Liu, Mohammed Noamany, Patrick Schone, Richard M. Schwartz, Dimitra Vergyri:
Novel approaches to Arabic speech recognition: report from the 2002 Johns-Hopkins Summer Workshop. 344-347
- Renato De Mori, Frédéric Béchet, Gérard Subsol, Dominique Massonié:
Dynamic scheduling of decoding processes for directory assistance. 348-351
- Cyril Allauzen, Mehryar Mohri:
Generalized optimization algorithm for speech recognition transducers. 352-355
- Diamantino Caseiro, Isabel Trancoso:
A tail-sharing WFST composition algorithm for large vocabulary speech recognition. 356-359
- Fabio Brugnara:
Context-dependent search in a context-independent network. 360-363
- Adam Janin, Don Baron, Jane Edwards, Dan Ellis, David Gelbart, Nelson Morgan, Barbara Peskin, Thilo Pfau, Elizabeth Shriberg, Andreas Stolcke, Chuck Wooters:
The ICSI Meeting Corpus. 364-367
- Máté Szarvas, Sadaoki Furui:
Finite-state transducer based modeling of morphosyntax with applications to Hungarian LVCSR. 368-371
- Ahmad Emami, Peng Xu, Frederick Jelinek:
Using a connectionist model in a syntactical based language model. 372-375
- Shaojun Wang, Dale Schuurmans, Fuchun Peng, Yunxin Zhao:
Semantic n-gram language modeling with the latent maximum entropy principle. 376-379
- Hong-Kwang Jeff Kuo, Chin-Hui Lee, Imed Zitouni, Eric Fosler-Lussier:
Minimum verification error training for topic verification. 380-383
- Tomonori Kikuchi, Sadaoki Furui, Chiori Hori:
Automatic speech summarization based on sentence extraction and compaction. 384-387
- Bhiksha Raj, Edward W. D. Whittaker:
Lossless compression of language model structure and word identifiers. 388-391
Feature Processing for Robust ASR
- Shingo Kuroiwa, Satoru Tsuge:
Blind equalization techniques for ETSI standard DSR front-end. 392-395
- Rita Singh, Bhiksha Raj:
Tracking noise via dynamical systems with a continuum of states. 396-399
- Ni-Chun Wang, Jeih-Weih Hung, Lin-Shan Lee:
Data-driven temporal filters based on multi-eigenvectors for robust features in speech recognition. 400-403
- Kam-keung Chu, Shu-hung Leung, Chun-Shing Yip:
Perceptually non-uniform spectral compression for noisy speech recognition. 404-407
- Michael L. Seltzer, Richard M. Stern:
Subband parameter optimization of microphone arrays for speech recognition in reverberant environments. 408-411
- Chuan Jia, Peng Ding, Bo Xu:
Sequential MAP estimation based speech feature enhancement for noise robust speech recognition. 412-415
- Peter Jancovic, Münevver Köküer, Fionn Murtagh:
Reliability-based estimation of the number of noisy features: application to model-order selection in the union models. 416-419
- Ji Ming, Francis Jack Smith:
A posterior union model for improved robust speech recognition in nonstationary noise. 420-423
- Françoise Beaufays, Daniel Boies, Mitch Weintraub, Qifeng Zhu:
Using speech/non-speech detection to bias recognition search on noisy data. 424-427
- Lingyun Gu, Jianbo Gao, A. G. Harris:
Endpoint detection in noisy environment using a Poincare recurrence metric. 428-431
- Izhak Shafran, Richard Rose:
Robust speech detection and segmentation for real-time ASR applications. 432-435
- Oh-Wook Kwon, Te-Won Lee:
Optimizing speech/non-speech classifier design using AdaBoost. 436-439
Speech Analysis
- Etan Fisher, Joseph Tabrikian, Shlomo Dubnov:
Generalized likelihood ratio test for voiced/unvoiced decision using the harmonic plus noise model. 440-443
- Ye Tian, Ji Wu, Zuoying Wang, Dajin Lu:
Fuzzy clustering and Bayesian information criterion based threshold estimation for robust voice activity detection. 444-447
- Om Deshmukh, Carol Y. Espy-Wilson:
A measure of aperiodicity and periodicity in speech. 448-451
- Pusadee Seresangtakul, Tomio Takara:
A generative model of fundamental frequency contours for polysyllabic words of Thai tones. 452-455
- Ching X. Xu, Yi Xu:
F0 perturbations by consonants and their implications on tone recognition. 456-459
- Wai C. Chu:
Gradient-descent based window optimization for linear prediction analysis. 460-463
- Issam Bazzi, Alex Acero, Li Deng:
An expectation maximization approach for formant tracking using a parameter-free non-linear predictor. 464-467
- Dong Wang, Lie Lu, Hong-Jiang Zhang:
Speech segmentation without speech recognition. 468-471