


INTERSPEECH 2004: Jeju Island, Korea
- 8th International Conference on Spoken Language Processing, INTERSPEECH-ICSLP 2004, Jeju Island, Korea, October 4-8, 2004. ISCA 2004

Plenary Talks
- Chin-Hui Lee:

From decoding-driven to detection-based paradigms for automatic speech recognition. - Hyun-Bok Lee:

In search of a universal phonetic alphabet - theory and application of an organic visible speech. - Jacqueline Vaissière:

From X-ray or MRI data to sounds through articulatory synthesis: towards an integrated view of the speech communication process.
Speech Recognition - Adaptation
- Sreeram Balakrishnan, Karthik Visweswariah, Vaibhava Goel:

Stochastic gradient adaptation of front-end parameters. 1-4 - Antoine Raux, Rita Singh:

Maximum-likelihood adaptation of semi-continuous HMMs by latent variable decomposition of state distributions. 5-8 - Chao Huang, Tao Chen, Eric Chang:

Transformation and combination of hidden Markov models for speaker selection training. 9-12 - Brian Kan-Wing Mak, Roger Wend-Huu Hsiao:

Improving eigenspace-based MLLR adaptation by kernel PCA. 13-16 - Nikos Chatzichrisafis, Vassilios Digalakis, Vassilios Diakoloukas, Costas Harizakis:

Rapid acoustic model development using Gaussian mixture clustering and language adaptation. 17-20 - Karthik Visweswariah, Ramesh A. Gopinath:

Adaptation of front end parameters in a speech recognizer. 21-24 - Diego Giuliani, Matteo Gerosa, Fabio Brugnara:

Speaker normalization through constrained MLLR based transforms. 2893-2896 - Xiangyu Mu, Shuwu Zhang, Bo Xu:

Multi-layer structure MLLR adaptation algorithm with subspace regression classes and tying. 2897-2900 - Georg Stemmer, Stefan Steidl, Christian Hacker, Elmar Nöth:

Adaptation in the pronunciation space for non-native speech recognition. 2901-2904 - Xuechuan Wang, Douglas D. O'Shaughnessy:

Robust ASR model adaptation by feature-based statistical data mapping. 2905-2908 - Zhaobing Han, Shuwu Zhang, Bo Xu:

A novel target-driven generalized JMAP adaptation algorithm. 2909-2912 - Brian Mak, Simon Ka-Lung Ho, James T. Kwok:

Speedup of kernel eigenvoice speaker adaptation by embedded kernel PCA. 2913-2916 - Hyung Bae Jeon, Dong Kook Kim:

Maximum a posteriori eigenvoice speaker adaptation for Korean connected digit recognition. 2917-2920 - Wei Wang, Stephen A. Zahorian:

Vocal tract normalization based on spectral warping. 2921-2924 - Koji Tanaka, Fuji Ren, Shingo Kuroiwa, Satoru Tsuge:

Acoustic model adaptation for coded speech using synthetic speech. 2925-2928 - Motoyuki Suzuki, Hirokazu Ogasawara, Akinori Ito, Yuichi Ohkawa, Shozo Makino:

Speaker adaptation method for CALL system using bilingual speakers' utterances. 2929-2932 - Shinji Watanabe:

Acoustic model adaptation based on coarse/fine training of transfer vectors and its application to a speaker adaptation task. 2933-2936 - Wei-Ho Tsai, Shih-Sian Cheng, Hsin-Min Wang:

Speaker clustering of speech utterances using a voice characteristic reference space. 2937-2940 - Young Kuk Kim, Hwa Jeon Song, Hyung Soon Kim:

Performance improvement of connected digit recognition using unsupervised fast speaker adaptation. 2941-2944 - Hyung Soon Kim, Hwa Jeon Song:

Simultaneous estimation of weights of eigenvoices and bias compensation vector for rapid speaker adaptation. 2945-2948 - Matthias Wölfel:

Speaker dependent model order selection of spectral envelopes. 2949-2952 - Enrico Bocchieri, Michael Riley, Murat Saraclar:

Methods for task adaptation of acoustic models with limited transcribed in-domain data. 2953-2956 - Atsushi Fujii, Tetsuya Ishikawa, Katsunobu Itou, Tomoyosi Akiba:

Unsupervised topic adaptation for lecture speech retrieval. 2957-2960 - Haibin Liu, Zhenyang Wu:

Mean and covariance adaptation based on minimum classification error linear regression for continuous density HMMs. 2961-2964 - Goshu Nagino, Makoto Shozakai:

Design of ready-made acoustic model library by two-dimensional visualization of acoustic space. 2965-2968
Spoken Language Identification, Translation and Retrieval I
- Jean-Luc Gauvain, Abdelkhalek Messaoudi, Holger Schwenk:

Language recognition using phone lattices. 25-28 - Mark A. Huckvale:

ACCDIST: a metric for comparing speakers' accents. 29-32 - Michael Levit, Allen L. Gorin, Patrick Haffner, Hiyan Alshawi, Elmar Nöth:

Aspects of named entity processing. 33-36 - Josep Maria Crego, José B. Mariño, Adrià de Gispert:

Finite-state-based and phrase-based statistical machine translation. 37-40 - Tanja Schultz, Szu-Chen Stan Jou, Stephan Vogel, Shirin Saleem:

Using word lattice information for a tighter coupling in speech translation systems. 41-44 - Teruhisa Misu, Tatsuya Kawahara, Kazunori Komatani:

Confirmation strategy for document retrieval systems with spoken dialog interface. 45-48 - Shi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh:

Multilayer subword units for open-vocabulary spoken document retrieval. 1553-1556 - Yoshiaki Itoh, Kazuyo Tanaka, Shi-wook Lee:

An efficient partial matching algorithm toward speech retrieval by speech. 1557-1560 - Celestin Sedogbo, Sébastien Herry, Bruno Gas, Jean-Luc Zarader:

Language detection by neural discrimination. 1561-1564 - Ricardo de Córdoba, Javier Ferreiros, Valentín Sama, Javier Macías Guarasa, Luis Fernando D'Haro, Fernando Fernández Martínez:

Language identification techniques based on full recognition in an air traffic control task. 1565-1568 - John H. L. Hansen, Umit H. Yapanel, Rongqing Huang, Ayako Ikeno:

Dialect analysis and modeling for automatic classification. 1569-1572 - Emmanuel Ferragne, François Pellegrino:

Rhythm in read British English: interdialect variability. 1573-1576 - Pascale Fung, Yi Liu, Yongsheng Yang, Yihai Shen, Dekai Wu:

A grammar-based Chinese to English speech translation system for portable devices. 1577-1580 - Gökhan Tür:

Cost-sensitive call classification. 1581-1584 - Mikko Kurimo, Ville T. Turunen, Inger Ekman:

An evaluation of a spoken document retrieval baseline system in Finnish. 1585-1588 - Hui Jiang, Pengfei Liu, Imed Zitouni:

Discriminative training of naive Bayes classifiers for natural language call routing. 1589-1592 - Nicolas Moreau, Hyoung-Gook Kim, Thomas Sikora:

Phonetic confusion based document expansion for spoken document retrieval. 1593-1596 - Euisok Chung, Soojong Lim, Yi-Gyu Hwang, Myung-Gil Jang:

Hybrid named entity recognition for question-answering system. 1597-1600 - Jitendra Ajmera, Iain McCowan, Hervé Bourlard:

An online audio indexing system. 1601-1604 - Eric Sanders, Febe de Wet:

Histogram normalisation and the recognition of names and ontology words in the MUMIS project. 1605-1608 - Rui Amaral, Isabel Trancoso:

Improving the topic indexation and segmentation modules of a media watch system. 1609-1612 - Melissa Barkat-Defradas, Rym Hamdi, Emmanuel Ferragne, François Pellegrino:

Speech timing and rhythmic structure in Arabic dialects: a comparison of two approaches. 1613-1616 - Hsin-Min Wang, Shih-Sian Cheng:

METRIC-SEQDAC: a hybrid approach for audio segmentation. 1617-1620 - Jen-Wei Kuo, Yao-Min Huang, Berlin Chen, Hsin-Min Wang:

Statistical Chinese spoken document retrieval using latent topical information. 1621-1624 - Masahiko Matsushita, Hiromitsu Nishizaki, Seiichi Nakagawa, Takehito Utsuro:

Keyword recognition and extraction by multiple-LVCSRs with 60,000 words in speech-driven WEB retrieval task. 1625-1628 - Ruiqiang Zhang, Gen-ichiro Kikui, Hirofumi Yamamoto, Frank K. Soong, Taro Watanabe, Eiichiro Sumita, Wai Kit Lo:

Improved spoken language translation using n-best speech recognition hypotheses. 1629-1632 - Kakeung Wong, Man-Hung Siu:

Automatic language identification using discrete hidden Markov model. 1633-1636 - Bowen Zhou, Daniel Déchelotte, Yuqing Gao:

Two-way speech-to-speech translation on handheld devices. 1637-1640 - Hervé Blanchon:

HLT modules scalability within the NESPOLE! project. 1641-1644
Linguistics, Phonology, and Phonetics
- Midam Kim:

Correlation between VOT and F0 in the perception of Korean stops and affricates. 49-52 - Aude Noiray, Lucie Ménard, Marie-Agnès Cathiard, Christian Abry, Christophe Savariaux:

The development of anticipatory labial coarticulation in French: a pioneering study. 53-56 - Melvyn John Hunt:

Speech recognition, syllabification and statistical phonetics. 57-60 - Jilei Tian:

Data-driven approaches for automatic detection of syllable boundaries. 61-64 - Anne Cutler, Dennis Norris, Núria Sebastián-Gallés:

Phonemic repertoire and similarity within the vocabulary. 65-68 - Sameer Maskey, Alan W. Black, Laura Tomokiyo:

Bootstrapping phonetic lexicons for new languages. 69-72 - Mirjam Broersma, K. Marieke Kolkman:

Lexical representation of non-native phonemes. 1241-1244 - Jong-Pyo Lee, Tae-Yeoub Jang:

A comparative study on the production of inter-stress intervals of English speech by English native speakers and Korean speakers. 1245-1248 - Emi Zuiki Murano, Mihoko Teshigawara:

Articulatory correlates of voice qualities of good guys and bad guys in Japanese anime: an MRI study. 1249-1252 - Sorin Dusan:

Effects of phonetic contexts on the duration of phonetic segments in fluent read speech. 1253-1256 - Qiang Fang:

A study on nasal coda loss in continuous speech. 1257-1260 - Hua-Li Jian:

An improved pair-wise variability index for comparing the timing characteristics of speech. 1261-1264 - Hua-Li Jian:

An acoustic study of speech rhythm in Taiwan English. 1265-1268 - Sung-A. Kim:

Language specific phonetic rules: evidence from domain-initial strengthening. 1269-1272 - Hansang Park:

Spectral characteristics of the release bursts in Korean alveolar stops. 1273-1276 - Rob van Son, Olga Bolotova, Louis C. W. Pols, Mietta Lennes:

Frequency effects on vowel reduction in three typologically different languages (Dutch, Finnish, Russian). 1277-1280 - Julia Abresch, Stefan Breuer:

Assessment of non-native phones in anglicisms by German listeners. 1281-1284 - Sunhee Kim:

Phonology of exceptions for Korean grapheme-to-phoneme conversion. 1285-1288 - Shigeyoshi Kitazawa, Shinya Kiriyama:

Acoustic and prosodic analysis of Japanese vowel-vowel hiatus with laryngeal effect. 1289-1292 - Kimiko Tsukada:

A cross-linguistic acoustic comparison of unreleased word-final stops: Korean and Thai. 1293-1296 - Taehong Cho, Elizabeth K. Johnson:

Acoustic correlates of phrase-internal lexical boundaries in Dutch. 1297-1300 - Taehong Cho, James M. McQueen:

Phonotactics vs. phonetic cues in native and non-native listening: Dutch and Korean listeners' perception of Dutch and English. 1301-1304 - Svetlana Kaminskaia, François Poiré:

Comparing intonation of two varieties of French using normalized F0 values. 1305-1308 - Mira Oh, Kee-Ho Kim:

Phonetic realization of the suffix-suppressed accentual phrase in Korean. 1309-1312 - H. Timothy Bunnell, James B. Polikoff, Jane McNicholas:

Spectral moment vs. Bark cepstral analysis of children's word-initial voiceless stops. 1313-1316 - Nobuaki Minematsu:

Pronunciation assessment based upon the compatibility between a learner's pronunciation structure and the target language's lexical structure. 1317-1320 - Kenji Yoshida:

Spread of high tone in Akita Japanese. 1321-1324
Biomedical Applications of Speech Analysis
- Juan Ignacio Godino-Llorente, María Victoria Rodellar Biarge, Pedro Gómez-Vilda, Francisco Díaz Pérez, Agustín Álvarez-Marquina, Rafael Martínez-Olalla:

Biomechanical parameter fingerprint in the mucosal wave power spectral density. 73-76 - Cheolwoo Jo, Soo-Geon Wang, Byung-Gon Yang, Hyung-Soon Kim, Tao Li:

Classification of pathological voice including severely noisy cases. 77-80 - Qiang Fu, Peter Murphy:

A robust glottal source model estimation technique. 81-84 - Hiroki Mori, Yasunori Kobayashi, Hideki Kasuya, Hajime Hirose, Noriko Kobayashi:

F0 and formant frequency distribution of dysarthric speech - a comparative study. 85-88 - Hideki Kawahara, Yumi Hirachi, Masanori Morise, Hideki Banno:

Procedure "senza vibrato": a key component for morphing singing. 89-92 - Claudia Manfredi, Giorgio Peretti, Laura Magnoni, Fabrizio Dori, Ernesto Iadanza:

Thyroplastic medialisation in unilateral vocal fold paralysis: assessing voice quality recovering. 93-96 - Gernot Kubin, Martin Hagmüller:

Voice enhancement of male speakers with laryngeal neoplasm. 541-544 - Jong Min Choi, Myung-Whun Sung, Kwang Suk Park, Jeong-Hun Hah:

A comparison of the perturbation analysis between PRAAT and Computerized Speech Lab. 545-548
Robust Speech Recognition on AURORA
- Ji Ming, Baochun Hou:

Evaluation of universal compensation on Aurora 2 and 3 and beyond. 97-100 - Hugo Van hamme:

PROSPECT features and their application to missing data techniques for robust speech recognition. 101-104 - Hugo Van hamme, Patrick Wambacq, Veronique Stouten:

Accounting for the uncertainty of speech estimates in the context of model-based feature enhancement. 105-108 - Hans-Günter Hirsch, Harald Finster:

Applying the Aurora feature extraction schemes to a phoneme based recognition task. 109-112 - Zhipeng Zhang, Tomoyuki Ohya, Sadaoki Furui:

Evaluation of tree-structured piecewise linear transformation-based noise adaptation on AURORA2 database. 113-116 - Tor André Myrvoll, Satoshi Nakamura:

Online minimum mean square error filtering of noisy cepstral coefficients using a sequential EM algorithm. 117-120 - Akira Sasou, Kazuyo Tanaka, Satoshi Nakamura, Futoshi Asano:

HMM-based feature compensation method: an evaluation using the AURORA2. 121-124 - Xuechuan Wang, Douglas D. O'Shaughnessy:

Noise adaptation for robust AURORA 2 noisy digit recognition using statistical data mapping. 125-128 - Benjamin J. Shannon, Kuldip K. Paliwal:

MFCC computation from magnitude spectrum of higher lag autocorrelation coefficients for robust speech recognition. 129-132 - Muhammad Ghulam, Takashi Fukuda, Junsei Horikawa, Tsuneo Nitta:

A noise-robust feature extraction method based on pitch-synchronous ZCPA for ASR. 133-136 - José C. Segura, Ángel de la Torre, Javier Ramírez, Antonio J. Rubio, M. Carmen Benítez:

Including uncertainty of speech observations in robust speech recognition. 137-140 - Takeshi Yamada, Jiro Okada, Nobuhiko Kitawaki:

Integration of n-best recognition results obtained by multiple noise reduction algorithms. 141-144 - Panji Setiawan, Sorel Stan, Tim Fingscheidt:

Revisiting some model-based and data-driven denoising algorithms in Aurora 2 context. 145-148 - Guo-Hong Ding, Bo Xu:

Exploring high-performance speech recognition in noisy environments using high-order Taylor series expansion. 149-152 - Wing-Hei Au, Man-Hung Siu:

A robust training algorithm based on neighborhood information. 153-156 - Siu Wa Lee, Pak-Chung Ching:

In-phase feature induction: an effective compensation technique for robust speech recognition. 157-160 - Jeff Siu-Kei Au-Yeung, Man-Hung Siu:

Improved performance of Aurora 4 using HTK and unsupervised MLLR adaptation. 161-164 - Shang-nien Tsai, Lin-Shan Lee:

A new feature extraction front-end for robust speech recognition using progressive histogram equalization and multi-eigenvector temporal filtering. 165-168
Spoken / Multimodal Dialogue System
- Christian Fügen, Hartwig Holzapfel, Alex Waibel:

Tight coupling of speech recognition and dialog management - dialog-context dependent grammar weighting for speech recognition. 169-172 - Akinobu Lee, Keisuke Nakamura, Ryuichi Nisimura, Hiroshi Saruwatari, Kiyohiro Shikano:

Noise robust real world spoken dialogue system using GMM based rejection of unintended inputs. 173-176 - Hironori Oshikawa, Norihide Kitaoka, Seiichi Nakagawa:

Speech interface for name input based on combination of recognition methods using syllable-based n-gram and word dictionary. 177-180 - Imed Zitouni, Minkyu Lee, Hui Jiang:

Constrained minimization technique for topic identification using discriminative training and support vector machines. 181-184 - Jason D. Williams, Steve J. Young:

Characterizing task-oriented dialog using a simulated ASR channel. 185-188 - Takashi Konashi, Motoyuki Suzuki, Akinori Ito, Shozo Makino:

A spoken dialog system based on automatic grammar generation and template-based weighting for autonomous mobile robots. 189-192 - Akinori Ito, Takanobu Oba, Takashi Konashi, Motoyuki Suzuki, Shozo Makino:

Noise adaptive spoken dialog system based on selection of multiple dialog strategies. 193-196 - Mikko Hartikainen, Markku Turunen, Jaakko Hakulinen, Esa-Pekka Salonen, J. Adam Funk:

Flexible dialogue management using distributed and dynamic dialogue control. 197-200 - Keith Houck:

Contextual revision in information seeking conversation systems. 201-204 - Ian M. O'Neill, Philip Hanna, Xingkun Liu, Michael F. McTear:

Cross domain dialogue modelling: an object-based approach. 205-208 - Hirohiko Sagawa, Teruko Mitamura, Eric Nyberg:

A comparison of confirmation styles for error handling in a speech dialog system. 209-212 - Fan Yang, Peter A. Heeman:

Using computer simulation to compare two models of mixed-initiative. 213-216 - Fan Yang, Peter A. Heeman, Kristy Hollingshead:

Towards understanding mixed-initiative in task-oriented dialogues. 217-220 - Peter Wolf, Joseph Woelfel, Jan C. van Gemert, Bhiksha Raj, David Wong:

SpokenQuery: an alternate approach to choosing items with speech. 221-224 - Shona Douglas, Deepak Agarwal, Tirso Alonso, Robert M. Bell, Mazin G. Rahim, Deborah F. Swayne, Chris Volinsky:

Mining customer care dialogs for "daily news". 225-228 - Jens Edlund, Gabriel Skantze, Rolf Carlson:

Higgins - a spoken dialogue system for investigating error handling techniques. 229-232 - Fuliang Weng, Lawrence Cavedon, Badri Raghunathan, Danilo Mirkovic, Hua Cheng, Hauke Schmidt, Harry Bratt, Rohit Mishra, Stanley Peters, Sandra Upson, Elizabeth Shriberg, Carsten Bergmann, Lin Zhao:

A conversational dialogue system for cognitively overloaded users. 233-236 - Gerhard Hanrieder, Stefan W. Hamerich:

Modeling generic dialog applications for embedded systems. 237-240 - Matthew N. Stuttle, Jason D. Williams, Steve J. Young:

A framework for dialogue data collection with a simulated ASR channel. 241-244 - Shimei Pan:

A multi-layer conversation management approach for information seeking applications. 245-248 - Thomas K. Harris, Roni Rosenfeld:

A universal speech interface for appliances. 249-252 - Keita Hayashi, Yuki Irie, Yukiko Yamaguchi, Shigeki Matsubara, Nobuo Kawaguchi:

Speech understanding, dialogue management and response generation in corpus-based spoken dialogue system. 253-256 - Fernando Fernández Martínez, Valentín Sama, Luis Fernando D'Haro, Rubén San Segundo, Ricardo de Córdoba, Juan Manuel Montero:

Implementation of dialog applications in an open-source voiceXML platform. 257-260 - Chun Wai Lau, Bin Ma, Helen Mei-Ling Meng, Yiu Sang Moon, Yeung Yam:

Fuzzy logic decision fusion in a multimodal biometric system. 261-264 - Peter Poller, Norbert Reithinger:

A state model for the realization of visual perceptive feedback in SmartKom. 265-268 - Akemi Iida, Yoshito Ueno, Ryohei Matsuura, Kiyoaki Aikawa:

A vector-based method for efficiently representing multivariate environmental information. 269-272 - Ioannis Toptsis, Shuyin Li, Britta Wrede, Gernot A. Fink:

A multi-modal dialog system for a mobile robot. 273-276 - Niels Ole Bernsen, Laila Dybkjær:

Structured interview-based evaluation of spoken multimodal conversation with H. C. Andersen. 277-280
Speech Recognition - Search
- Miroslav Novak, Vladimír Bergl:

Memory efficient decoding graph compilation with wide cross-word acoustic context. 281-284 - Dongbin Zhang, Limin Du:

Dynamic beam pruning strategy using adaptive control. 285-288 - Takaaki Hori, Chiori Hori, Yasuhiro Minami:

Fast on-the-fly composition for weighted finite-state transducers in 1.8 million-word vocabulary continuous speech recognition. 289-292 - Peng Yu, Frank Torsten Bernd Seide:

A hybrid word / phoneme-based approach for improved vocabulary-independent search in spontaneous speech. 293-296 - Lubos Smídl, Ludek Müller:

Keyword spotting for highly inflectional languages. 297-300 - Frédéric Tendeau:

Optimizing an engine network that allows dynamic masking. 301-304
Spoken Dialogue and Systems
- Katsutoshi Ohtsuki, Nobuaki Hiroshima, Yoshihiko Hayashi, Katsuji Bessho, Shoichi Matsunaga:

Topic structure extraction for meeting indexing. 305-308 - Sophie Rosset, Lori Lamel:

Automatic detection of dialog acts based on multilevel information. 309-312 - Gina-Anne Levow:

Identifying local corrections in human-computer dialogue. 313-316 - Peter Reichl, Florian Hammer:

Hot discussion or frosty dialogue? towards a temperature metric for conversational interactivity. 317-320 - Stephanie Seneff, Chao Wang, I. Lee Hetherington, Grace Chung:

A dynamic vocabulary spoken dialogue interface. 321-324 - Matthias Denecke, Kohji Dohsaka, Mikio Nakano:

Learning dialogue policies using state aggregation in reinforcement learning. 325-328
Speech Perception
- Keren B. Shatzman:

Segmenting ambiguous phrases using phoneme duration. 329-332 - Shuichi Sakamoto, Yôiti Suzuki, Shigeaki Amano, Tadahisa Kondo, Naoki Iwaoka:

A compensation method for word-familiarity difference with SNR control in intelligibility test. 333-336 - Takashi Otake, Yoko Sakamoto, Yasuyuki Konomi:

Phoneme-based word activation in spoken-word recognition: evidence from Japanese school children. 337-340 - Belynda Brahimi, Philippe Boula de Mareüil, Cédric Gendrot:

Role of segmental and suprasegmental cues in the perception of Maghrebian-accented French. 341-344 - Hiroaki Kato, Yoshinori Sagisaka, Minoru Tsuzaki, Makiko Muto:

Effect of speaking rate on the acceptability of change in segment duration. 345-348 - Kiyoko Yoneyama:

A cross-linguistic study of diphthongs in spoken word processing in Japanese and English. 349-352
Multi-Lingual Speech-to-Speech Translation
- Alex Waibel:

Speech translation: past, present and future. 353-356 - Gen-ichiro Kikui, Toshiyuki Takezawa, Seiichi Yamamoto:

Multilingual corpora for speech-to-speech translation research. 357-360 - Hermann Ney:

Statistical machine translation and its challenges. 361-364 - John Lee, Stephanie Seneff:

Translingual grammar induction. 365-368 - Youngjik Lee, Jun Park, Seung-Shin Oh:

Usability considerations of speech-to-speech translation system. 369-372 - Gianni Lazzari, Alex Waibel, Chengqing Zong:

Worldwide ongoing activities on multilingual speech to speech translation. 373-376
Speech Recognition - Large Vocabulary
- Dominique Fohr, Odile Mella, Christophe Cerisara, Irina Illina:

The automatic news transcription system: ANTS, some real time experiments. 377-380 - Bhuvana Ramabhadran, Olivier Siohan, Geoffrey Zweig:

Use of metadata to improve recognition of spontaneous speech and named entities. 381-384 - Janne Pylkkönen, Mikko Kurimo:

Duration modeling techniques for continuous speech recognition. 385-388 - Tanel Alumäe:

Large vocabulary continuous speech recognition for Estonian using morpheme classes. 389-392 - Zhaobing Han, Shuwu Zhang, Bo Xu:

Combining agglomerative and tree-based state clustering for high accuracy acoustic modeling. 393-396 - William S.-Y. Wang, Gang Peng:

Parallel tone score association method for tone language speech recognition. 397-400 - Jing Zheng, Horacio Franco, Andreas Stolcke:

Effective acoustic modeling for rate-of-speech variation in large vocabulary conversational speech recognition. 401-404 - L. Sarada Ghadiyaram, Hemalatha Nagarajan, Nagarajan Thangavelu, Hema A. Murthy:

Automatic transcription of continuous speech using unsupervised and incremental training. 405-408 - Jan Nouza, Dana Nejedlová, Jindrich Zdánský, Jan Kolorenc:

Very large vocabulary speech recognition system for automatic transcription of Czech broadcast programs. 409-412 - Olivier Siohan, Bhuvana Ramabhadran, Geoffrey Zweig:

Speech recognition error analysis on the English MALACH corpus. 413-416 - Rong Zhang, Alexander I. Rudnicky:

A frame level boosting training scheme for acoustic modeling. 417-420 - Rong Zhang, Alexander I. Rudnicky:

Optimizing boosting with discriminative criteria. 421-424 - Xianghua Xu, Qiang Guo, Jie Zhu:

Restructuring HMM states for speaker adaptation in Mandarin speech recognition. 425-428 - Mike Matton, Mathias De Wachter, Dirk Van Compernolle, Ronald Cools:

A discriminative locally weighted distance measure for speaker independent template based speech recognition. 429-432 - Yohei Itaya, Heiga Zen, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura:

Deterministic annealing EM algorithm in parameter estimation for acoustic model. 433-436 - Frantisek Grézl, Martin Karafiát, Jan Cernocký:

TRAP based features for LVCSR of meeting data. 437-440 - Frank K. Soong, Wai Kit Lo, Satoshi Nakamura:

Optimal acoustic and language model weights for minimizing word verification errors. 441-444 - Atsushi Sako, Yasuo Ariki:

Structuring of baseball live games based on speech recognition using task dependant knowledge. 445-448 - Zhengyu Zhou, Helen M. Meng:

A two-level schema for detecting recognition errors. 449-452 - In-Jeong Choi, Nam-Hoon Kim, Su Youn Yoon:

Large vocabulary continuous speech recognition based on cross-morpheme phonetic information. 453-456 - Changxue Ma:

Automatic phonetic base form generation based on maximum context tree. 457-460 - Gustavo Hernández Ábrego, Lex Olorenshaw, Raquel Tato, Thomas Schaaf:

Dictionary refinements based on phonetic consensus and non-uniform pronunciation reduction. 1697-1700 - Abdelkhalek Messaoudi, Lori Lamel, Jean-Luc Gauvain:

Transcription of Arabic broadcast news. 1701-1704 - Takahiro Shinozaki, Sadaoki Furui:

Spontaneous speech recognition using a massively parallel decoder. 1705-1708 - Tanja Schultz, Qin Jin, Kornel Laskowski, Yue Pan, Florian Metze, Christian Fügen:

Issues in meeting transcription - the ISL meeting transcription system. 1709-1712 - Katsutoshi Ohtsuki, Nobuaki Hiroshima, Shoichi Matsunaga, Yoshihiko Hayashi:

Multi-pass ASR using vocabulary expansion. 1713-1716 - Vlasios Doumpiotis, William Byrne:

Pinched lattice minimum Bayes risk discriminative training for large vocabulary continuous speech recognition. 1717-1720 - Izhak Shafran, William Byrne:

Task-specific minimum Bayes-risk decoding using learned edit distance. 1945-1948 - Rong Zhang, Alexander I. Rudnicky:

Apply n-best list re-ranking to acoustic model combinations of boosting training. 1949-1952 - Do Yeong Kim, Srinivasan Umesh, Mark J. F. Gales, Thomas Hain, Philip C. Woodland:

Using VTLN for broadcast news transcription. 1953-1956 - Andreas Stolcke, Chuck Wooters, Ivan Bulyko, Martin Graciarena, Scott Otterson, Barbara Peskin, Mari Ostendorf, David Gelbart, Nikki Mirghafori, Tuomo W. Pirinen:

From Switchboard to meetings: development of the 2004 ICSI-SRI-UW meeting recognition system. 1957-1960 - Anand Venkataraman, Andreas Stolcke, Wen Wang, Dimitra Vergyri, Jing Zheng, Venkata Ramana Rao Gadde:

An efficient repair procedure for quick transcriptions. 1961-1964 - Yao Qian, Tan Lee, Frank K. Soong:

Tone information as a confidence measure for improving Cantonese LVCSR. 1965-1968
Speech Science
- Danielle Duez:

Temporal variables in parkinsonian speech. 461-464 - Olov Engwall:

Speaker adaptation of a three-dimensional tongue model. 465-468 - Nicole Cooper, Anne Cutler:

Perception of non-native phonemes in noise. 469-472 - Hideki Kawahara, Hideki Banno, Toshio Irino, Jiang Jin:

Intelligibility of degraded speech from smeared STRAIGHT spectrum. 473-476 - Young-Ik Kim, Rhee Man Kil:

Sound source localization based on zero-crossing peak-amplitude coding. 477-480 - Sachiyo Kajikawa, Laurel Fais, Shigeaki Amano, Janet F. Werker:

Adult and infant sensitivity to phonotactic features in spoken Japanese. 481-484 - Phil D. Green, James Carmichael:

Revisiting dysarthria assessment intelligibility metrics. 485-488 - Valter Ciocca, Tara L. Whitehill, Joan K.-Y. Ma:

The effect of intonation on perception of Cantonese lexical tones. 489-492 - Toshiko Isei-Jaakkola:

Maximum short quantity in Japanese and Finnish in two perception tests with F0 and dB variants. 493-496 - Paavo Alku, Matti Airas, Brad H. Story:

Evaluation of an inverse filtering technique using physical modeling of voice production. 497-500 - Hui-ju Hsu, Janice Fon:

Positional and phonotactic effects on the realization of Taiwan Mandarin tone 2. 501-504 - Karl Schnell, Arild Lacroix:

Speech production based on lossy tube models: unit concatenation and sound transitions. 505-508 - Qin Yan, Saeed Vaseghi, Dimitrios Rentzos, Ching-Hsiang Ho:

Modelling and ranking of differences across formants of British, Australian and American accents. 509-512 - Tatsuya Kitamura, Satoru Fujita, Kiyoshi Honda, Hironori Nishimoto:

An experimental method for measuring transfer functions of acoustic tubes. 513-516 - Takuya Tsuji, Tokihiko Kaburagi, Kohei Wakamiya, Jiji Kim:

Estimation of the vocal tract spectrum from articulatory movements using phoneme-dependent neural networks. 517-520 - Kunitoshi Motoki, Hiroki Matsuzaki:

Computation of the acoustic characteristics of vocal-tract models with geometrical perturbation. 521-524 - P. Vijayalakshmi, M. Ramasubba Reddy:

Analysis of hypernasality by synthesis. 525-528 - Abdellah Kacha, Francis Grenez, Frédéric Bettens, Jean Schoentgen:

Adaptive long-term predictive analysis of disordered speech. 529-532 - Slobodan Jovicic, Sandra Antesevic, Zoran Saric:

Phoneme restoration in degraded speech communication. 533-536 - Maria Marinaki, Constantine Kotropoulos, Ioannis Pitas, Nikolaos Maglaveras:

Automatic detection of vocal fold paralysis and edema. 537-540
Novel Features in ASR
- Yasuhiro Minami, Erik McDermott, Atsushi Nakamura, Shigeru Katagiri:

A theoretical analysis of speech recognition based on feature trajectory models. 549-552 - Zhijian Ou, Zuoying Wang:

Discriminative combination of multiple linear predictions for speech recognition. 553-556 - Davood Gharavian, Seyed Mohammad Ahadi:

Use of formants in stressed and unstressed continuous speech recognition. 557-560 - Konstantin Markov, Satoshi Nakamura, Jianwu Dang:

Integration of articulatory dynamic parameters in HMM/BN based speech recognition system. 561-564 - Leigh David Alsteris, Kuldip K. Paliwal:

ASR on speech reconstructed from short-time fourier phase spectra. 565-568
Spoken and Natural Language Understanding
- Robert Lieb, Tibor Fábián, Günther Ruske, Matthias Thomae:

Estimation of semantic confidences on lattice hierarchies. 569-572 - Fumiyo Fukumoto, Yoshimi Suzuki:

Learning subject drift for topic tracking. 573-576 - Elizabeth Shriberg, Andreas Stolcke, Dustin Hillard, Mari Ostendorf, Barbara Peskin, Mary P. Harper, Yang Liu:

The ICSI-SRI-UW metadata extraction system. 577-580 - Mark Hasegawa-Johnson, Stephen E. Levinson, Tong Zhang:

Automatic detection of contrast for speech understanding. 581-584 - Nick Jui-Chang Wang, Jia-Lin Shen, Ching-Ho Tsai:

Integrating layer concept information into n-gram modeling for spoken language understanding. 585-588 - Junyan Chen, Ji Wu, Zuoying Wang:

A robust understanding model for spoken dialogues. 589-592 - Chai Wutiwiwatchai, Sadaoki Furui:

Belief-based nonlinear rescoring in Thai speech understanding. 2129-2132 - Toshihiko Itoh, Atsuhiko Kai, Yukihiro Itoh, Tatsuhiro Konishi:

An understanding strategy based on plausibility score in recognition history using CSR confidence measure. 2133-2136 - Sangkeun Jung, Minwoo Jeong, Gary Geunbae Lee:

Speech recognition error correction using maximum entropy language model. 2137-2140 - Xiang Li, Juan M. Huerta:

Discriminative training of compound-word based multinomial classifiers for speech routing. 2141-2144 - Jihyun Eun, Changki Lee, Gary Geunbae Lee:

An information extraction approach for spoken language understanding. 2145-2148 - David Horowitz, Partha Lal, Pierce Gerard Buckley:

A maximum entropy shallow functional parser for spoken language understanding. 2149-2152 - Qiang Huang, Stephen J. Cox:

Mixture language models for call routing. 2153-2156 - Chung-Hsien Wu, Jui-Feng Yeh, Ming-Jun Chen:

Speech act identification using an ontology-based partial pattern tree. 2157-2160 - Ye-Yi Wang, Yun-Cheng Ju:

Creating speech recognition grammars from regular expressions for alphanumeric concepts. 2161-2164 - Isabel Trancoso, Paulo Araújo, Céu Viana, Nuno J. Mamede:

Poetry assistant. 2165-2168 - Tasuku Kitade, Tatsuya Kawahara, Hiroaki Nanjo:

Automatic extraction of key sentences from oral presentations using statistical measure based on discourse markers. 2169-2172 - Tomohiro Ohno, Shigeki Matsubara, Nobuo Kawaguchi, Yasuyoshi Inagaki:

Robust dependency parsing of spontaneous Japanese speech and its evaluation. 2173-2176 - Wolfgang Minker, Dirk Bühler, Christiane Beuschel:

Strategies for optimizing a stochastic spoken natural language parser. 2177-2180 - Tzu-Lun Lee, Ya-Fang He, Yun-Ju Huang, Shu-Chuan Tseng, Robert Eklund:

Prolongation in spontaneous Mandarin. 2181-2184 - Yuki Irie, Shigeki Matsubara, Nobuo Kawaguchi, Yukiko Yamaguchi, Yasuyoshi Inagaki:

Speech intention understanding based on decision tree learning. 2185-2188 - Satanjeev Banerjee, Alexander I. Rudnicky:

Using simple speech-based features to detect the state of a meeting and the roles of the meeting participants. 2189-2192 - Serdar Yildirim, Murtaza Bulut, Chul Min Lee, Abe Kazemzadeh, Zhigang Deng, Sungbok Lee, Shrikanth S. Narayanan, Carlos Busso:

An acoustic study of emotions expressed in speech. 2193-2196 - Tatsuya Kawahara, Ian Richard Lane, Tomoko Matsui, Satoshi Nakamura:

Topic classification and verification modeling for out-of-domain utterance detection. 2197-2200 - So-Young Park, Yong-Jae Kwak, Joon-Ho Lim, Hae-Chang Rim, Soo-Hong Kim:

Partially lexicalized parsing model utilizing rich features. 2201-2204 - Yoshimi Suzuki, Fumiyo Fukumoto, Yoshihiro Sekiguchi:

Clustering similar nouns for selecting related news articles. 2205-2208 - Leonardo Badino:

Chinese text word-segmentation considering semantic links among sentences. 2209-2212 - Do-Gil Lee, Hae-Chang Rim:

Syllable-based probabilistic morphological analysis model of Korean. 2213-2216
Speaker Segmentation and Clustering
- Fabio Valente, Christian Wellekens:

Scoring unknown speaker clustering: VB vs. BIC. 593-596 - Qin Jin, Tanja Schultz:

Speaker segmentation and clustering in meetings. 597-600 - Lori Lamel, Jean-Luc Gauvain, Leonardo Canseco-Rodriguez:

Speaker diarization from speech transcripts. 601-604 - Xavier Anguera Miró, Javier Hernando Pericas:

Evolutive speaker segmentation using a repository system. 605-608 - Hagai Aronowitz, David Burshtein, Amihood Amir:

Speaker indexing in audio archives using test utterance Gaussian mixture modeling. 609-612 - Antoine Raux:

Automated lexical adaptation and speaker clustering based on pronunciation habits for non-native speech recognition. 613-616
Speech Processing in a Packet Network Environment
- Kuldip K. Paliwal, Stephen So:

Scalable distributed speech recognition using multi-frame GMM-based block quantization. 617-620 - Naveen Srinivasamurthy, Kyu Jeong Han, Shrikanth S. Narayanan:

Robust speech recognition over packet networks: an overview. 621-624 - Thomas Eriksson, Samuel Kim, Hong-Goo Kang, Chungyong Lee:

Theory for speaker recognition over IP. 625-628 - Wu Chou, Feng Liu:

Voice portal services in packet network and VoIP environment. 629-632 - Peter Kabal, Colm Elliott:

Synchronization of speaker selection for centralized tandem free VoIP conferencing. 633-636 - Akitoshi Kataoka, Yusuke Hiwasaki, Toru Morinaga, Jotaro Ikedo:

Measuring the perceived importance of time- and frequency-divided speech blocks for transmitting over packet networks. 637-640 - Moo Young Kim, W. Bastiaan Kleijn:

Comparison of transmitter-based packet-loss recovery techniques for voice transmission. 641-644
Acoustic Modeling
- Denis Jouvet, Ronaldo O. Messina:

Context dependent "long units" for speech recognition. 645-648 - Shinichi Yoshizawa, Kiyohiro Shikano:

Rapid EM training based on model-integration. 649-652 - Dominique Fohr, Odile Mella, Irina Illina, Christophe Cerisara:

Experiments on the accuracy of phone models and liaison processing in a French broadcast news transcription system. 653-656 - Jorge F. Silva, Shrikanth S. Narayanan:

A statistical discrimination measure for hidden Markov models based on divergence. 657-660 - Jan Stadermann, Gerhard Rigoll:

A hybrid SVM/HMM acoustic modeling approach to automatic speech recognition. 661-664 - Dirk Knoblauch:

Data driven number-of-states selection in HMM topologies. 665-668 - Youngkyu Cho, Sung-a Kim, Dongsuk Yook:

Hybrid model using subspace distribution clustering hidden Markov models and semi-continuous hidden Markov models for embedded speech recognizers. 669-672 - Peder A. Olsen, Karthik Visweswariah:

Fast clustering of Gaussians and the virtue of representing Gaussians in exponential model format. 673-676 - Karen Livescu, James R. Glass:

Feature-based pronunciation modeling with trainable asynchrony probabilities. 677-680 - Hong-Kwang Jeff Kuo, Yuqing Gao:

Maximum entropy direct model as a unified model for acoustic modeling in speech recognition. 681-684 - Yu Zhu, Tan Lee:

Explicit duration modeling for Cantonese connected-digit recognition. 685-688 - Arthur Chan, Mosur Ravishankar, Alexander I. Rudnicky, Jahanzeb Sherwani:

Four-layer categorization scheme of fast GMM computation techniques in large vocabulary continuous speech recognition systems. 689-692 - Junho Park, Hanseok Ko:

Compact acoustic model for embedded implementation. 693-696 - Takatoshi Jitsuhiro, Satoshi Nakamura:

Increasing the mixture components of non-uniform HMM structures based on a variational Bayesian approach. 697-700 - Panu Somervuo:

Comparison of ML, MAP, and VB based acoustic models in large vocabulary speech recognition. 701-704 - Wolfgang Macherey, Ralf Schlüter, Hermann Ney:

Discriminative training with tied covariance matrices. 705-708 - Frank Diehl, Asunción Moreno:

Acoustic phonetic modeling using local codebook features. 709-712 - Gue Jun Jung, Su-Hyun Kim, Yung-Hwan Oh:

An efficient codebook design in SDCHMM for mobile communication environments. 713-716 - Makoto Shozakai, Goshu Nagino:

Analysis of speaking styles by two-dimensional visualization of aggregate of acoustic models. 717-720 - Myoung-Wan Koo, Ho-Hyun Jeon, Sang-Hong Lee:

Context dependent phoneme duration modeling with tree-based state tying. 721-724 - John Scott Bridle:

Towards better understanding of the model implied by the use of dynamic features in HMMs. 725-728
Prosody Modeling and Generation
- Jianfeng Li, Guoping Hu, Ren-Hua Wang:

Chinese prosody phrase break prediction based on maximum entropy model. 729-732 - Krothapalli Sreenivasa Rao, Bayya Yegnanarayana:

Intonation modeling for Indian languages. 733-736 - Yu Zheng, Gary Geunbae Lee, Byeongchang Kim:

Using multiple linguistic features for Mandarin phrase break prediction in maximum-entropy classification framework. 737 - Ian Read, Stephen Cox:

Using part-of-speech for predicting phrase breaks. 741-744 - David Escudero Mancebo, Valentín Cardeñoso-Payo:

A proposal to quantitatively select the right intonation unit in data-driven intonation modeling. 745-748 - Jinfu Ni, Hisashi Kawai, Keikichi Hirose:

Formulating contextual tonal variations in Mandarin. 749-752 - Salma Mouline, Olivier Boëffard, Paul C. Bagshaw:

Automatic adaptation of the MOMEL F0 stylisation algorithm to new corpora. 753-756 - Pablo Daniel Agüero, Klaus Wimmer, Antonio Bonafonte:

Joint extraction and prediction of Fujisaki's intonation model parameters. 757-760 - Panagiotis Zervas, Nikos Fakotakis, George K. Kokkinakis, Georgios Kouroupetroglou, Gerasimos Xydas:

Evaluation of corpus based tone prediction in mismatched environments for Greek TTS synthesis. 761-764 - Ziyu Xiong, Juanwen Chen:

The duration of pitch transition phase and its relative factors. 765-768 - Yu Hu, Ren-Hua Wang, Lu Sun:

Polynomial regression model for duration prediction in Mandarin. 769-772 - Michelle Tooher, John G. McKenna:

Prediction of the glottal LF parameters using regression trees. 773-776 - Volker Dellwo, Bianca Aschenberner, Petra Wagner, Jana Dancovicova, Ingmar Steiner:

Bonntempo-corpus and bonntempo-tools: a database for the study of speech rhythm and rate. 777-780 - Wentao Gu, Keikichi Hirose, Hiroya Fujisaki:

Analysis of F0 contours of Cantonese utterances based on the command-response model. 781-784 - Marion Dohen, Hélène Loevenbruck:

Pre-focal rephrasing, focal enhancement and postfocal deaccentuation in French. 785-788 - Sridhar Krishna Nemala, Partha Pratim Talukdar, Kalika Bali, A. G. Ramakrishnan:

Duration modeling for Hindi text-to-speech synthesis system. 789-792 - Nemala Sridhar Krishna, Hema A. Murthy:

A new prosodic phrasing model for Indian language Telugu. 793-796 - Oliver Jokisch, Michael Hofmann:

Evolutionary optimization of an adaptive prosody model. 797-800 - Gerasimos Xydas, Georgios Kouroupetroglou:

An intonation model for embedded devices based on natural F0 samples. 801-804 - Katerina Vesela, Nino Peterek, Eva Hajicová:

Prosodic characteristics of Czech contrastive topic. 805-808
Multi-Sensor ASR
- Martin Graciarena, Federico Cesari, Horacio Franco, Gregory K. Myers, Cregg Cowan, Victor Abrash:

Combination of standard and throat microphones for robust speech recognition in highly noisy environments. 809-812 - Cenk Demiroglu, David V. Anderson:

Noise robust digit recognition using a glottal radar sensor for voicing detection. 813-816 - Dominik Raub, John W. McDonough, Matthias Wölfel:

A cepstral domain maximum likelihood beamformer for speech recognition. 817-820 - Naoya Mochiki, Tetsunori Kobayashi, Toshiyuki Sekiya, Tetsuji Ogawa:

Recognition of three simultaneous utterance of speech by four-line directivity microphone mounted on head of robot. 821-824 - Shigeki Sagayama, Okajima Takashi, Yutaka Kamamoto, Takuya Nishimoto:

Complex spectrum circle centroid for microphone-array-based noisy speech recognition. 825-828 - Larry P. Heck, Mark Z. Mao:

Automatic speech recognition of co-channel speech: integrated speaker and speech recognition approach. 829-832
Multi-Lingual Speech Processing
- José B. Mariño, Asunción Moreno, Albino Nogueiras:

A first experience on multilingual acoustic modeling of the languages spoken in Morocco. 833-836 - Mónica Caballero, Asunción Moreno, Albino Nogueiras:

Data driven multidialectal phone set for Spanish dialects. 837-840 - Daniela Oria, Akos Vetek:

Multilingual e-mail text processing for speech synthesis. 841-844 - Harald Romsdorfer, Beat Pfister:

Multi-context rules for phonological processing in polyglot TTS synthesis. 845-848 - Leonardo Badino, Claudia Barolo, Silvia Quazza:

A general approach to TTS reading of mixed-language texts. 849-852 - Panayiotis G. Georgiou, Shrikanth S. Narayanan, Hooman Shirani Mehr:

Context dependent statistical augmentation of Persian transcripts. 853-856
Speech Enhancement
- Cenk Demiroglu, David V. Anderson:

A soft decision MMSE amplitude estimator as a noise preprocessor to speech coders using a glottal sensor. 857-860 - Rongqiang Hu, David V. Anderson:

Single acoustic-channel speech enhancement based on glottal correlation using non-acoustic sensor. 861-864 - Xianxian Zhang, John H. L. Hansen, Kathryn Hoberg Arehart, Jessica Rossi-Katz:

In-vehicle based speech processing for hearing impaired subjects. 865-868 - Sriram Srinivasan, W. Bastiaan Kleijn:

Speech enhancement using adaptive time-domain segmentation. 869-872 - Tomohiro Nakatani, Keisuke Kinoshita, Masato Miyoshi, Parham Zolfaghari:

Harmonicity based monaural speech dereverberation with time warping and F0 adaptive window. 873-876 - Marc Delcroix, Takafumi Hikichi, Masato Miyoshi:

Dereverberation of speech signals based on linear prediction. 877-880
Speech and Affect
- Nick Campbell:

Perception of affect in speech - towards an automatic processing of paralinguistic information in spoken conversation. 881-884 - Noël Chateau, Valérie Maffiolo, Christophe Blouin:

Analysis of emotional speech in voice mail messages: the influence of speakers' gender. 885-888 - Chul Min Lee, Serdar Yildirim, Murtaza Bulut, Abe Kazemzadeh, Carlos Busso, Zhigang Deng, Sungbok Lee, Shrikanth S. Narayanan:

Emotion recognition based on phoneme classes. 889-892 - Peter Robinson, Tal Sobol Shikler:

Visualizing dynamic features of expressions in speech. 893-896 - Aijun Li, Haibo Wang:

Friendly speech analysis and perception in standard Chinese. 897-900 - Ailbhe Ní Chasaide, Christer Gobl:

Decomposing linguistic and affective components of phonatory quality. 901-904 - Dan-Ning Jiang, Lian-Hong Cai:

Classifying emotion in Chinese speech by decomposing prosodic features. 1325-1328 - Chen Yu, Paul M. Aoki, Allison Woodruff:

Detecting user engagement in everyday conversations. 1329-1332 - Takashi X. Fujisawa, Norman D. Cook:

Identifying emotion in speech prosody using acoustical cues of harmony. 1333-1336 - Jianhua Tao:

Context based emotion detection from text input. 1337-1340 - Atsushi Iwai, Yoshikazu Yano, Shigeru Okuma:

Complex emotion recognition system for a specific user using SOM based on prosodic features. 1341-1344 - Hoon-Young Cho, Kaisheng Yao, Te-Won Lee:

Emotion verification for emotion detection and unknown emotion rejection. 1345-1348 - Keikichi Hirose:

Improvement in corpus-based generation of F0 contours using generation process model for emotional speech synthesis. 1349-1352
Speech Features
- Rajesh Mahanand Hegde, Hema A. Murthy, Venkata Ramana Rao Gadde:

Continuous speech recognition using joint features derived from the modified group delay function and MFCC. 905-908 - Hua Yu:

Phase-space representation of speech. 909-912 - Hema A. Murthy, Rajesh Mahanand Hegde, Venkata Ramana Rao Gadde:

The modified group delay feature: a new spectral representation of speech. 913-916 - Oh-Wook Kwon, Te-Won Lee:

ICA-based feature extraction for phoneme recognition. 917-920 - Qifeng Zhu, Barry Y. Chen, Nelson Morgan, Andreas Stolcke:

On using MLP features in LVCSR. 921-924 - Barry Y. Chen, Qifeng Zhu, Nelson Morgan:

Learning long-term temporal features in LVCSR using neural networks. 925-928 - T. V. Sreenivas, G. V. Kiran, A. G. Krishna:

Neural "spike rate spectrum" as a noise robust, speaker invariant feature for automatic speech recognition. 929-932 - Yoshihisa Nakatoh, Makoto Nishizaki, Shinichi Yoshizawa, Maki Yamada:

An adaptive MEL-LPC analysis for speech recognition. 933-936 - Kentaro Ishizuka, Noboru Miyazaki, Tomohiro Nakatani, Yasuhiro Minami:

Improvement in robustness of speech feature extraction method using sub-band based periodicity and aperiodicity decomposition. 937-940 - Carlos Toshinori Ishi:

A new acoustic measure for aspiration noise detection. 941-944 - Kris Demuynck, Oscar Garcia, Dirk Van Compernolle:

Synthesizing speech from speech recognition parameters. 945-948 - Marios Athineos, Hynek Hermansky, Daniel P. W. Ellis:

LP-TRAP: linear predictive temporal patterns. 949-952 - Xiang Li, Richard M. Stern:

Parallel feature generation based on maximizing normalized acoustic likelihood. 953-956 - Kun-Ching Wang:

An adaptive band-partitioning spectral entropy based speech detection in realistic noisy environments. 957-960 - Javier Ramírez, José C. Segura, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio:

Improved voice activity detection combining noise reduction and subband divergence measures. 961-964 - Kiyoung Park, Changkyu Choi, Jeongsu Kim:

Voice activity detection using global soft decision with mixture of Gaussian model. 965-968 - Thomas Kemp, Climent Nadeu, Yin Hay Lam, Josep Maria Sola i Caros:

Environmental robust features for speech detection. 969-972 - Kornel Laskowski, Qin Jin, Tanja Schultz:

Crosscorrelation-based multispeaker speech activity detection. 973-976 - Shang-nien Tsai:

Improved robustness of time-frequency principal components (TFPC) by synergy of methods in different domains. 977-980 - Li Deng, Yu Dong, Alex Acero:

A quantitative model for formant dynamics and contextually assimilated reduction in fluent speech. 981-984 - Gernot Kubin, Tuan Van Pham:

DWT-based classification of acoustic-phonetic classes and phonetic units. 985-988 - Yong-Choon Cho, Seungjin Choi:

Learning nonnegative features of spectro-temporal sounds for classification. 989-992
Language Modeling, Multimodal & Multilingual Speech Processing
- Sungyup Chung, Keikichi Hirose, Nobuaki Minematsu:

N-gram language modeling of Japanese using bunsetsu boundaries. 993-996 - Langzhou Chen, Lori Lamel, Jean-Luc Gauvain, Gilles Adda:

Dynamic language modeling for broadcast news. 997-1000 - Ren-Yuan Lyu, Dau-Cheng Lyu, Min-Siong Liang, Min-Hong Wang, Yuang-Chin Chiang, Chun-Nan Hsu:

A unified framework for large vocabulary speech recognition of mutually unintelligible Chinese "regionalects". 1001-1004 - Ielka van der Sluis, Emiel Krahmer:

The influence of target size and distance on the production of speech and gesture in multimodal referring expressions. 1005-1008 - Anurag Kumar Gupta, Tasos Anastasakos:

Dynamic time windows for multimodal input fusion. 1009-1012 - Raymond H. Lee, Anurag Kumar Gupta:

MICot : a tool for multimodal input data collection. 1013-1016 - Chakib Tadj, Hicham Djenidi, Madjid Haouani, Amar Ramdane-Cherif, Nicole Lévy:

Simulating multimodal applications. 1017-1020 - Jakob Schou Pedersen, Paul Dalsgaard, Børge Lindberg:

A multimodal communication aid for global aphasia patients. 1021-1024 - Hirofumi Yamamoto, Gen-ichiro Kikui, Yoshinori Sagisaka:

Mis-recognized utterance detection using hierarchical language model. 1025-1028 - Marko Moberg, Kimmo Pärssinen, Juha Iso-Sipilä:

Cross-lingual phoneme mapping for multilingual synthesis systems. 1029-1032 - Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno, Tsuyoshi Tasaki, Takeshi Yamaguchi:

Robot motion control using listener's back-channels and head gesture information. 1033-1036 - Sakriani Sakti, Arry Akhmad Arman, Satoshi Nakamura, Paulus Hutagaol:

Indonesian speech recognition for hearing and speaking impaired people. 1037-1040 - Mohsen A. Rashwan:

A two phase Arabic language model for speech recognition and other language applications. 1041-1044 - Yuya Akita, Tatsuya Kawahara:

Language model adaptation based on PLSA of topics and speakers. 1045-1048 - Hans J. G. A. Dolfing, Pierce Gerard Buckley, David Horowitz:

Unified language modeling using finite-state transducers with first applications. 1049-1052 - Katsunobu Itou, Atsushi Fujii, Tomoyosi Akiba:

Effects of language modeling on speech-driven question answering. 1053-1056 - Abhinav Sethy, Shrikanth S. Narayanan, Bhuvana Ramabhadran:

Measuring convergence in language model estimation using relative entropy. 1057-1060
Detection and Classification in ASR
- Rongqing Huang, John H. L. Hansen:

High-level feature weighted GMM network for audio stream classification. 1061-1064 - Jindrich Zdánský, Petr David, Jan Nouza:

An improved preprocessor for the automatic transcription of broadcast news audio stream. 1065-1068 - Yih-Ru Wang, Chi-Han Huang:

Speaker-and-environment change detection in broadcast news using the common component GMM-based divergence measure. 1069-1072 - Tommi Lahti:

Beginning of utterance detection algorithm for low complexity ASR engines. 1073-1076 - Somsak Sukittanon, Arun C. Surendran, John C. Platt, Christopher J. C. Burges:

Convolutional networks for speech detection. 1077-1080 - Suryakanth V. Gangashetty, Chellu Chandra Sekhar, B. Yegnanarayana:

Detection of vowel onset points in continuous speech using autoassociative neural network models. 1081-1084
Speech Analysis
- Toshiki Tamiya, Tetsuya Shimamura:

Reconstruction filter design for bone-conducted speech. 1085-1088 - Pedro J. Quintana-Morales, Juan L. Navarro-Mesa:

Frequency warped ARMA analysis of the closed and the open phase of voiced speech. 1089-1092 - Boris Doval, Baris Bozkurt, Christophe d'Alessandro, Thierry Dutoit:

Zeros of z-transform (ZZT) decomposition of speech for source-tract separation. 1093-1096 - Li Deng, Roberto Togneri:

Use of neural network mapping and extended Kalman filter to recover vocal tract resonances from the MFCC parameters of speech. 1097-1100 - Xiao Li, Jonathan Malkin, Jeff A. Bilmes:

Graphical model approach to pitch tracking. 1101-1104 - Bo Xu, Jianhua Tao, Yongguo Kang:

A new multicomponent AM-FM demodulation with predicting frequency boundaries and its application to formant estimation. 1105-1108 - Yves Laprie:

A concurrent curve strategy for formant tracking. 2405-2408 - Qin Yan, Esfandiar Zavarehei, Saeed Vaseghi, Dimitrios Rentzos:

A formant tracking LP model for speech processing. 2409-2412 - Hong You:

Application of long-term filtering to formant estimation. 2413-2416 - Baris Bozkurt, Thierry Dutoit, Boris Doval, Christophe d'Alessandro:

A method for glottal formant frequency estimation. 2417-2420 - Baris Bozkurt, Thierry Dutoit, Boris Doval, Christophe d'Alessandro:

Improved differential phase spectrum processing for formant tracking. 2421-2424 - Xu Shao, Ben P. Milner:

MAP prediction of pitch from MFCC vectors for speech reconstruction. 2425-2428 - An-Tze Yu, Hsiao-Chuan Wang:

New harmonicity measures for pitch estimation and voice activity detection. 2429-2432 - Takuya Nishimoto, Shigeki Sagayama, Hirokazu Kameoka:

Multi-pitch trajectory estimation of concurrent speech based on harmonic GMM and nonlinear Kalman filtering. 2433-2436 - Attila Ferencz, Jeongsu Kim, Yong-Beom Lee, Jae-Won Lee:

Automatic pitch marking and reconstruction of glottal closure instants from noisy and deformed electro-glotto-graph signals. 2437-2440 - Federico Flego, Luca Armani, Maurizio Omologo:

On the use of a weighted autocorrelation based fundamental frequency estimation for a multidimensional speech input. 2441-2444 - Aarthi M. Reddy, Bhiksha Raj:

A minimum mean squared error estimator for single channel speaker separation. 2445-2448 - Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu:

Audio source separation from the mixture using empirical mode decomposition with independent subspace analysis. 2449-2452 - In-Jung Oh, Hyun-Yeol Chung, Jae-Won Cho, Ho-Youl Jung, Rémy Prost:

Audio watermarking in sub-band signals using multiple echo kernels. 2453-2456 - Jie Zhang, Zhenyang Wu:

A piecewise interpolation method based on log-least square error criterion for HRTF. 2457-2460 - R. Muralishankar, A. G. Ramakrishnan, Lakshmish N. Kaushik:

Time-scaling of speech using independent subspace analysis. 2465-2468 - Laurent Girin, Mohammad Firouzmand, Sylvain Marchand:

Long term modeling of phase trajectories within the speech sinusoidal model framework. 2469-2472 - Tina Soltani, Dave Hermann, Etienne Cornu, Hamid Sheikhzadeh, Robert L. Brennan:

An acoustic shock limiting algorithm using time and frequency domain speech features. 2473-2476 - Jong Won Shin, Joon-Hyuk Chang, Nam Soo Kim:

Speech probability distribution based on generalized gamma distribution. 2477-2480 - Yanli Zheng, Mark Hasegawa-Johnson, Sarah Borys:

Stop consonant classification by dynamic formant trajectory. 2481-2484 - Yoshinori Shiga, Simon King:

Estimating detailed spectral envelopes using articulatory clustering. 2485-2488
Speech Production
- Olov Engwall:

From real-time MRI to 3d tongue movements. 1109-1112 - Mitsuhiro Nakamura:

Coarticulatory variability and directionality in [s, ..]: an EPG study. 1113-1116 - Yosuke Tanabe, Tokihiko Kaburagi:

Flow representation through the glottis having a polygonal boundary shape. 1117-1120 - Hannu Pulakka, Paavo Alku, Svante Granqvist, Stellan Hertegard, Hans Larsson, Anne-Maria Laukkanen, Per-Ake Lindestad, Erkki Vilkman:

Analysis of the voice source in different phonation types: simultaneous high-speed imaging of the vocal fold vibration and glottal inverse filtering. 1121-1124 - Peter Birkholz, Dietmar Jackèl:

Influence of temporal discretization schemes on formant frequencies and bandwidths in time domain simulations of the vocal tract system. 1125-1128 - Tomoki Toda, Alan W. Black, Keiichi Tokuda:

Acoustic-to-articulatory inversion mapping with Gaussian mixture model. 1129-1132
Audio-Visual Speech Processing
- Jinyoung Kim, Jeesun Kim, Chris Davis:

Audio-visual spoken language processing. 1133-1136 - Kaoru Sekiyama, Denis Burnham:

Issues in the development of auditory-visual speech perception: adults, infants, and children. 1137-1140 - Emiel Krahmer, Marc Swerts:

Signaling and detecting uncertainty in audiovisual speech by children and adults. 1141-1144 - Valérie Hazan, Anke Sennema, Andrew Faulkner:

Effect of intensive audiovisual perceptual training on the perception and production of the /l/-/r/ contrast for Japanese learners of English. 1145-1148 - Jean Vroomen, Sabine van Linden, Béatrice de Gelder, Paul Bertelson:

Visual recalibration of auditory speech versus selective speech adaptation: different build-up courses. 1149-1152 - Chris Davis, Jeesun Kim:

Of the top of the head: audio-visual speech perception from the nose up. 1153-1156 - J. Bruce Millar, Michael Wagner, Roland Goecke:

Aspects of speaking-face data corpus design methodology. 1157-1160 - Jean-Luc Schwartz, Marie-Agnès Cathiard:

Modeling audio-visual speech perception: back on fusion architectures and fusion control. 2017-2020 - Mikko Sams, Ville Ojanen, Jyrki Tuomainen, Vasily Klucharev:

Neurocognition of speech-specific audiovisual perception. 2021-2024 - Adriano Vilela Barbosa, Eric Vatikiotis-Bateson, Andreas Daffertshofer:

Target practice on talking faces. 2025-2028 - Matthias Odisio, Gérard Bailly:

Audiovisual perceptual evaluation of resynthesised speech movements. 2029-2032 - Sascha Fagel:

Video-realistic synthetic speech with a parametric visual speech synthesizer. 2033-2036 - Patricia Scanlon, Gerasimos Potamianos, Vit Libal, Stephen M. Chu:

Mutual information based visual feature selection for lipreading. 2037-2040 - Bowon Lee, Mark Hasegawa-Johnson, Camille Goudeseune, Suketu Kamdar, Sarah Borys, Ming Liu, Thomas S. Huang:

AVICAR: audio-visual speech corpus in a car environment. 2489-2492 - Engin Erzin, Yucel Yemez, A. Murat Tekalp:

Adaptive classifier cascade for multimodal speaker identification. 2493-2496 - Midori Iba, Anke Sennema, Valérie Hazan, Andrew Faulkner:

Use of visual cues in the perception of a labial/labiodental contrast by Spanish-L1 and Japanese-L1 learners of English. 2497-2500 - Xianxian Zhang, Kazuya Takeda, John H. L. Hansen, Toshiki Maeno:

Audio-visual speaker localization for car navigation systems. 2501-2504 - Josef Chaloupka:

Automatic lips reading for audio-visual speech processing and recognition. 2505-2508 - Michael Wagner, Girija Chetty:

"liveness" verification in audio-video authentication. 2509-2512 - Maria José Sanchez Martinez, Juan Pablo de la Cruz Gutiérrez

:
Speech recognition using motion based lipreading. 2513-2516 - Frédéric Berthommier:

Comparative study of linear and non-linear models for viseme inversion: modeling of a cortical associative function. 2517-2520 - Petr Císar, Zdenek Krnoul, Milos Zelezný:

3d lip-tracking for audio-visual speech recognition in real applications. 2521-2524 - J. Bruce Millar, Roland Goecke:

The audio-video Australian English speech data corpus AVOZES. 2525-2528 - Ki-Hyung Hong, Yong-Ju Lee, Jae-Young Suh, Kyong-Nim Lee:

Correcting Korean vowel speech recognition errors with limited lip features. 2529-2532 - Kuniko Y. Nielsen:

Segmental differences in the visual contribution to speech intelligibility. 2533-2536
Spoken Language Generation and Synthesis III
- Hui Ye, Steve J. Young:

Voice conversion for unknown speakers. 1161-1164 - Volker Fischer, Jaime Botella Ordinas, Siegfried Kunzmann:

Domain adaptation methods in the IBM trainable text-to-speech system. 1165-1168 - Yi Zhou, Yiqing Zu, Zhenli Yu, Dongjian Yue, Guilin Chen:

Applying pitch connection control in Mandarin speech synthesis. 1169-1172 - Hermann Ney, David Sündermann, Antonio Bonafonte, Harald Höge:

A first step towards text-independent voice conversion. 1173-1176 - Zhenli Yu, Kaizhi Wang, Yiqing Zu, Dongjian Yue, Guilin Chen:

Data pruning approach to unit selection for inventory generation of concatenative embeddable Chinese TTS systems. 1177-1180 - Jithendra Vepa, Simon King:

Subjective evaluation of join cost functions used in unit selection speech synthesis. 1181-1184 - Heiga Zen, Tadashi Kitamura, Murtaza Bulut, Shrikanth S. Narayanan, Ryosuke Tsuzuki, Keiichi Tokuda:

Constructing emotional speech synthesizers with limited speech database. 1185-1188 - Cheng-Yuan Lin, Jyh-Shing Roger Jang:

A two-phase pitch marking method for TD-PSOLA synthesis. 1189-1192 - Antonio Bonafonte, Alexander Kain, Jan P. H. van Santen, Helenca Duxans:

Including dynamic and phonetic information in voice conversion systems. 1193-1196 - Zixiang Wang, Ren-Hua Wang, Zhiwei Shuang, Zhen-Hua Ling:

A novel voice conversion system based on codebook mapping with phoneme-tied weighting. 1197-1200 - Zhen-Hua Ling, Yu Hu, Zhiwei Shuang, Ren-Hua Wang:

Compression of speech database by feature separation and pattern clustering using STRAIGHT. 1201-1204 - Shunsuke Kataoka, Nobuaki Mizutani, Keiichi Tokuda, Tadashi Kitamura:

Decision-tree backing-off in HMM-based speech synthesis. 1205-1208 - Nobuyuki Nishizawa, Hisashi Kawai:

Using a depth-restricted search to reduce delays in unit selection. 1209-1212 - Junichi Yamagishi, Takashi Masuko, Takao Kobayashi:

MLLR adaptation for hidden semi-Markov model based speech synthesis. 1213-1216 - Stefan Breuer, Julia Abresch:

Phoxsy: multi-phone segments for unit selection speech synthesis. 1217-1220 - Francesc Alías, Xavier Llorà, Ignasi Iriondo Sanz, Joan Claudi Socoró, Xavier Sevillano, Lluís Formiga:

Perception-guided and phonetic clustering weight tuning based on diphone pairs for unit selection TTS. 1221-1224 - Taoufik En-Najjary, Olivier Rosec, Thierry Chonavel:

A voice conversion method based on joint pitch and spectral envelope transformation. 1225-1228 - Taoufik En-Najjary, Olivier Rosec, Thierry Chonavel:

Fast GMM-based voice conversion for text-to-speech synthesis systems. 1229-1232 - Rohit Kumar:

A genetic algorithm for unit selection based speech synthesis. 1233-1236 - Jun Huang, Lex Olorenshaw, Gustavo Hernández Ábrego, Lei Duan:

A memory efficient grapheme-to-phoneme conversion system for speech processing. 1237-1240 - Rohit Kumar, S. Prahallad Kishore:

Automatic pruning of unit selection speech databases for synthesis without loss of naturalness. 1377-1380 - Tanya Lambert, Andrew P. Breen:

A database design for a TTS synthesis system using lexical diphones. 1381-1384 - John Kominek, Alan W. Black:

A family-of-models approach to HMM-based segmentation for unit selection speech synthesis. 1385-1388 - Wei Zhang, Ling Jin, Xijun Ma:

Mutual-information based segment pre-selection in concatenative text-to-speech. 1389-1392 - Heiga Zen, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura:

Hidden semi-Markov model based speech synthesis. 1393-1396 - Hartmut R. Pfitzinger:

DFW-based spectral smoothing for concatenative speech synthesis. 1397-1400 - Kyung-Joong Min, Un-Cheon Lim:

Korean prosody generation and artificial neural networks. 1869-1872 - Kyuchul Yoon:

A prosodic phrasing model for a Korean text-to-speech synthesis system. 1873-1876 - Qin Shi, Volker Fischer:

A comparison of statistical methods and features for the prediction of prosodic structures. 1877-1880 - Gui-Lin Chen, Ke-Song Han:

Letter-to-sound for small-footprint multilingual TTS engine. 1881-1884 - Jun Xu, Guohong Fu, Haizhou Li:

Grapheme-to-phoneme conversion for Chinese text-to-speech. 1885-1888 - Marc Schröder, Stefan Breuer:

XML representation languages as a way of interconnecting TTS modules. 1889-1892 - Wenjie Cao, Chengqing Zong, Bo Xu:
Approach to interchange-format based Chinese generation. 1893-1896 - Enrico Zovato, Stefano Sandri, Silvia Quazza, Leonardo Badino:

Prosodic analysis of a multi-style corpus in the perspective of emotional speech synthesis. 1897-1900 - Kyung-Joong Min, Chan-Goo Kang, Un-Cheon Lim:

Number of output nodes of artificial neural networks for Korean prosody generation. 1901-1904 - Sunhee Kim, Ju-Eun Ahn, Soon-Hyob Kim, Yang-Hee Lee:

A Korean grapheme-to-phoneme conversion system using selection procedure for exceptions. 1905-1908 - Thanate Khaorapapong, Montri Karnjanadecha, Keerati Inthavisas:

Synthesis of vowels and tones in Thai language by articulatory modeling. 1909-1912 - Yoshinori Shiga, Simon King:

Source-filter separation for articulation-to-speech synthesis. 1913-1916 - Hisako Asano, Hideharu Nakajima, Hideyuki Mizuno, Masahiro Oku:

Long vowel detection for letter-to-sound conversion for Japanese sourced words transliterated into the alphabet. 1917-1920 - Frantz Clermont, Thomas John Millhouse:

Inexactness and robustness in cepstral-to-formant transformation of spoken and sung vowels. 1921-1924 - Takeshi Saitou, Naoya Tsuji, Masashi Unoki, Masato Akagi:

Analysis of acoustic features affecting "singing-ness" and its application to singing-voice synthesis from speaking-voice. 1925-1928 - Vincent Pollet, Geert Coorman:

Statistical corpus-based speech segmentation. 1929-1932 - Jindrich Matousek, Jan Romportl, Daniel Tihelka, Zbynek Tychtl:

Recent improvements on ARTIC: Czech text-to-speech system. 1933-1936 - Youngim Jung, Donghun Lee, HyeonSook Nam, Ae-sun Yoon, Hyuk-Chul Kwon:

Learning for transliteration of Arabic-numeral expressions using decision tree for Korean TTS. 1937-1940 - Nicole Beringer:

How to integrate phonetic and linguistic knowledge in a text-to-phoneme conversion task: a syllabic TPC tool for French. 1941-1944 - Wael Hamza, Ellen Eide, Raimo Bakis:

Reconciling pronunciation differences between the front-end and the back-end in the IBM speech synthesis system. 2561-2564 - Juhong Ha, Yu Zheng, Gary Geunbae Lee, Yoon-Suk Seong, Byeongchang Kim:

High quality text-to-pinyin conversion using two-phase unknown word prediction. 2565-2568 - Yeon-Jun Kim, Ann K. Syrdal, Alistair Conkie:

Pronunciation lexicon adaptation for TTS voice building. 2569-2572 - Gabriel Webster:

Improving letter-to-pronunciation accuracy with automatic morphologically-based stress prediction. 2573-2576 - Wael Hamza, Ellen Eide, Raimo Bakis, Michael Picheny, John F. Pitrelli:

The IBM expressive speech synthesis system. 2577-2580 - Markus Schnell, Rüdiger Hoffmann:

What concept-to-speech can gain for prosody. 2581-2584
Speech Recognition - Language Model
- Tatsuya Kawahara, Kiyotaka Uchimoto, Hitoshi Isahara, Kazuya Shitaoka:

Dependency structure analysis and sentence boundary detection in spontaneous Japanese. 1353-1356 - Salma Jamoussi, David Langlois, Jean Paul Haton, Kamel Smaïli:

Statistical feature language model. 1357-1360 - Brigitte Bigi, Yan Huang, Renato de Mori:

Vocabulary and language model adaptation using information retrieval. 1361-1364 - Shinsuke Mori, Daisuke Takuma:

Word n-gram probability estimation from a Japanese raw corpus. 1365-1368 - Jen-Tzung Chien, Hung-Ying Chen:

Mining of association patterns for language modeling. 1369-1372 - Jen-Tzung Chien, Meng-Sung Wu, Hua-Jui Peng:

On latent semantic language modeling and smoothing. 1373-1376 - Vaibhava Goel:

Conditional maximum likelihood estimation for improving annotation performance of n-gram models incorporating stochastic finite state grammars. 2237-2241 - Edward James Schofield:

Fast parameter estimation for joint maximum entropy language models. 2241-2244 - Dimitra Vergyri, Katrin Kirchhoff, Kevin Duh, Andreas Stolcke:

Morphology-based language modeling for Arabic speech recognition. 2245-2248 - A. Nayeemulla Khan, B. Yegnanarayana:

Speech enhanced multi-Span language model. 2249-2252 - Holger Schwenk, Jean-Luc Gauvain:

Neural network language models for conversational speech recognition. 2253-2256 - David Mrva, Philip C. Woodland:

A PLSA-based language model for conversational telephone speech. 2257-2260
Speaker Recognition
- Jérôme Louradour, Régine André-Obrecht, Khalid Daoudi:

Segmentation and relevance measure for speaker verification. 1401-1404 - Mohamed Chetouani, Bruno Gas, Jean-Luc Zarader, Marcos Faúndez-Zanuy:

A new nonlinear feature extraction algorithm for speaker verification. 1405-1408 - Elizabeth Shriberg, Luciana Ferrer, Anand Venkataraman, Sachin S. Kajarekar:

SVM modeling of "SNERF-grams" for speaker recognition. 1409-1412 - Purdy Ho, Pedro J. Moreno:

SVM kernel adaptation in speaker classification and verification. 1413-1416 - Koji Iwano, Taichi Asami, Sadaoki Furui:

Noise-robust speaker verification using F0 features. 1417-1420 - Zi-He Chen, Yuan-Fu Liao, Yau-Tarng Juang:

Eigen-prosody analysis for robust speaker recognition under mismatch handset environment. 1421 - Aaron D. Lawson, Mark C. Huggins:

Triphone-based confidence system for speaker identification. 1745-1748 - Kenichi Yoshida, Kazuyuki Takagi, Kazuhiko Ozeki:

Improved model training and automatic weight adjustment for multi-SNR multi-band speaker identification system. 1749-1752 - Man-Wai Mak, Kwok-Kwong Yiu, Ming-Cheung Cheung, Sun-Yuan Kung:

A new approach to channel robust speaker verification via constrained stochastic feature transformation. 1753-1756 - Chakib Tadj, Christian S. Gargour, Nabil Badri:

Best speaker-based structure tree for speaker verification. 1757-1760 - David Chow, Waleed H. Abdulla:

Robust speaker identification based on perceptual log area ratio and Gaussian mixture models. 1761-1764 - Stanley J. Wenndt, Richard M. Floyd:

Channel frequency response correction for speaker recognition. 1765-1768 - Yh-Her Yang, Yuan-Fu Liao:

Unseen handset mismatch compensation based on a priori knowledge interpolation for robust speaker recognition. 1769-1772 - Michael T. Padilla, Thomas F. Quatieri:

A comparison of soft and hard spectral subtraction for speaker verification. 1773-1776 - Vlasta Radová, Ales Padrta:

Comparison of several speaker verification procedures based on GMM. 1777-1780 - Yong Guan, Wenju Liu, Hongwei Qi, Jue Wang:

Improving performance of text-independent speaker identification by utilizing contextual principal curves filtering. 1781-1784 - Jen-Tzung Chien, Chuan-Wei Ting:

Speaker identification using probabilistic PCA model selection. 1785-1788 - Hagai Aronowitz, David Burshtein, Amihood Amir:

Text independent speaker recognition using speaker dependent word spotting. 1789-1792 - Hsiao-Chuan Wang, Jyh-Min Cheng:

A study on model-based equal error rate estimation for automatic speaker verification. 1793-1796 - Tomoko Matsui, Kunio Tanabe:

Probabilistic speaker identification with dual penalized logistic regression machine. 1797-1800 - Javier R. Saeta, Javier Hernando:

Model quality evaluation during enrolment for speaker verification. 1801-1804 - Pasi Fränti, Evgeny Karpov, Tomi Kinnunen:

Real-time speaker identification. 1805-1808 - Mohamed Fathy Abu-ElYazeed, Nemat S. Abdel Kader, Mohammed El-Henawy:

Multi-codebook vector quantization algorithm for speaker identification. 1809-1812 - Ming-Cheung Cheung, Kwok-Kwong Yiu, Man-Wai Mak, Sun-Yuan Kung:

Multi-sample fusion with constrained feature transformation for robust speaker verification. 1813-1816 - Michael Betser, Frédéric Bimbot, Mathieu Ben, Guillaume Gravier:

Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs. 2329-2332 - Nengheng Zheng, P. C. Ching, Tan Lee:

Time-frequency analysis of vocal source signal for speaker recognition. 2333-2336 - Rashmi Gangadharaiah, Balakrishnan Narayanaswamy, Narayanaswamy Balakrishnan:

A novel method for two-speaker segmentation. 2337-2340 - Bayya Yegnanarayana, A. Shahina, M. R. Kesheorey:

Throat microphone signal for speaker recognition. 2341-2344 - Mohamed Faouzi BenZeghiba, Hervé Bourlard:

Posteriori probabilities and likelihoods combination for speech and speaker recognition. 2345-2348 - Mohamed Mihoubi, Douglas D. O'Shaughnessy, Pierre Dumouchel:

The use of typical sequences for robust speaker identification. 2349-2352 - KyungHwa Kim:

A forensic phonetic investigation into the duration and speech rate. 2353-2356 - T. V. Sreenivas, Sameer Badaskar:

Mixture Gaussian model training against impostor model parameters: an application to speaker identification. 2357-2360 - Jan Anguita, Javier Hernando, Alberto Abad:

Jacobian adaptation with improved noise reference for speaker verification. 2361-2364 - Mihalis Siafarikas, Todor Ganchev, Nikos Fakotakis:

Objective wavelet packet features for speaker verification. 2365-2368 - Upendra V. Chaudhari, Ganesh N. Ramaswamy:

Policy analysis framework for conversational biometrics. 2369-2372 - Woo-Yong Choi, Jung Gon Kim, Hyung Soon Kim, Sung Bum Pan:

A new score normalization method for speaker verification with virtual impostor model. 2373-2376 - Samuel Kim, Thomas Eriksson, Hong-Goo Kang:

On the time variability of vocal tract for speaker recognition. 2377-2380 - Veena Desai, Hema A. Murthy:

Distributed speaker recognition. 2381-2384 - Pongtep Angkititrakul, Sepideh Baghaii, John H. L. Hansen:

Cluster-dependent modeling and confidence measure processing for in-set/out-of-set speaker identification. 2385-2388 - Yoshiyuki Umeda, Shingo Kuroiwa, Satoru Tsuge, Fuji Ren:

Distributed speaker recognition using earth mover's distance. 2389-2392 - Michael Barlow, Mehrdad Khodai-Joopari, Frantz Clermont:

A forensically-motivated tool for selecting cepstrally-consistent steady-states from non-contemporaneous vowel utterances. 2393-2396 - Anil Alexander, Andrzej Drygajlo:

Scoring and direct methods for the interpretation of evidence in forensic speaker recognition. 2397-2400 - Tomi Kinnunen, Evgeny Karpov, Pasi Fränti:

Efficient online cohort selection method for speaker verification. 2401-2404 - A. Nayeemulla Khan, Bayya Yegnanarayana:

Latent semantic analysis for speaker recognition. 2589-2592 - Yang Shao, DeLiang Wang:

Model-based sequential organization for cochannel speaker identification. 2593-2596 - Ka-Yee Leung, Man-Wai Mak, Sun-Yuan Kung:

Articulatory feature-based conditional pronunciation modeling for speaker verification. 2597-2600 - Alex Park, Timothy J. Hazen:

A comparison of normalization and training approaches for ASR-dependent speaker identification. 2601-2604 - Dat Tran:

New background modeling for speaker verification. 2605-2608
Processing of Prosody by Humans and Machines
- Gérard Bailly, Bleicke Holm, Véronique Aubergé:

A trainable prosodic model: learning the contours implementing communicative functions within a superpositional model of intonation. 1425-1428 - Dung Tien Nguyen, Chi Mai Luong, Bang Kim Vu, Hansjörg Mixdorff, Huy Hoang Ngo:

Fujisaki model based F0 contours in Vietnamese TTS. 1429-1432 - Kazuyuki Ashimura, Hideki Kashioka, Nick Campbell:

Estimating speaking rate in spontaneous speech from z-scores of pattern durations. 1433-1436 - Takashi Masuko, Takao Kobayashi, Keisuke Miyanaga:

A style control technique for HMM-based speech synthesis. 1437-1440 - Mark Hasegawa-Johnson, Stephen E. Levinson, Tong Zhang:

Children's emotion recognition in an intelligent tutoring scenario. 1441-1444 - Keikichi Hirose, Nobuaki Minematsu:

Use of prosodic features for speech recognition. 1445-1448
Contemporary Issues in ASR
- Jochen Peters, Christina Drexel:

Transformation-based error correction for speech-to-text systems. 1449-1452 - Alexander Gutkin, Simon King:

Phone classification in pseudo-Euclidean vector spaces. 1453-1456 - Grace Chung, Chao Wang, Stephanie Seneff, Edward Filisko, Min Tang:

Combining linguistic knowledge and acoustic information in automatic pronunciation lexicon generation. 1457-1460 - Ken Chen, Mark Hasegawa-Johnson:

Modeling pronunciation variation using artificial neural networks for English spontaneous speech. 1461-1464 - Stefanie Aalburg, Harald Höge:

Foreign-accented speaker-independent speech recognition. 1465-1468 - Panikos Heracleous, Yoshitaka Nakajima, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:

Non-audible murmur (NAM) speech recognition using a stethoscopic NAM microphone. 1469-1472 - Martin J. Russell, Shona D'Arcy, Lit Ping Wong:

Recognition of read and spontaneous children's speech using two new corpora. 1473-1476 - Joe Frankel, Mirjam Wester, Simon King:

Articulatory feature recognition using dynamic Bayesian networks. 1477-1480 - Gies Bouwman, Bert Cranen, Lou Boves:

Predicting word correct rate from acoustic and linguistic confusability. 1481-1484 - Kazushi Ishihara, Yuya Hattori, Tomohiro Nakatani, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:

Disambiguation in determining phonemes of sound-imitation words for environmental sound recognition. 1485-1488 - Jan Anguita, Stéphane Peillon, Javier Hernando, Alexandre Bramoulle:

Word confusability prediction in automatic speech recognition. 1489-1492 - Szu-Chen Stan Jou, Tanja Schultz, Alex Waibel:

Adaptation for soft whisper recognition using a throat microphone. 1493-1496 - Rainer Gruhn, Konstantin Markov, Satoshi Nakamura:

A statistical lexicon for non-native speech recognition. 1497-1500 - Mathew Magimai-Doss, Shajith Ikbal, Todd A. Stephenson, Hervé Bourlard:

Modeling auxiliary features in tandem systems. 1501-1504 - Louis ten Bosch, Lou Boves:

Survey of spontaneous speech phenomena in a multimodal dialogue system and some implications for ASR. 1505-1508 - Tobias Cincarek, Rainer Gruhn, Satoshi Nakamura:

Speech recognition for multiple non-native accent groups with speaker-group-dependent acoustic models. 1509-1512 - Frederik Stouten, Jean-Pierre Martens:

Coping with disfluencies in spontaneous speech recognition. 1513-1516 - Soonil Kwon, Shrikanth S. Narayanan:

Speaker model quantization for unsupervised speaker indexing. 1517-1520 - Matteo Gerosa, Diego Giuliani:

Investigating automatic recognition of non-native children's speech. 1521-1524 - Yang Liu, Elizabeth Shriberg, Andreas Stolcke, Mary P. Harper:

Using machine learning to cope with imbalanced classes in natural speech: evidence from sentence boundary and disfluency detection. 1525-1528 - Minho Jin, Gyucheol Jang, Sungrack Yun, Chang Dong Yoo:

Hybrid utterance verification based on n-best models and model derived from Kullback-Leibler divergence. 1529-1532 - Masataka Goto, Koji Kitayama, Katsunobu Itou, Tetsunori Kobayashi:

Speech spotter: on-demand speech recognition in human-human conversation on the telephone or in face-to-face situations. 1533-1536 - Kyong-Nim Lee, Minhwa Chung:

Pronunciation lexicon modeling and design for Korean large vocabulary continuous speech recognition. 1537-1540 - Sebastian Möller, Jan Felix Krebber, Alexander Raake:

Performance of speech recognition and synthesis in packet-based networks. 1541-1544 - Alastair Bruce James, Ben P. Milner, Angel Manuel Gomez:

A comparison of packet loss compensation methods and interleaving for speech recognition in burst-like packet loss. 1545-1548 - Ben P. Milner, Alastair Bruce James:

An analysis of packet loss models for distributed speech recognition. 1549-1552
Second Language Learning and Spoken Language Processing
- Nobuaki Minematsu:

Pronunciation assessment based upon the phonological distortions observed in language learners' utterances. 1669-1672 - Yasuo Suzuki, Yoshinori Sagisaka, Katsuhiko Shirai, Makiko Muto:

Analysis of the phone level contributions to objective evaluation of English speech by non-natives. 1673-1676 - Chao Wang, Mitchell Peabody, Stephanie Seneff, Jong-mi Kim:

An interactive English pronunciation dictionary for Korean learners. 1677-1680 - Seok-Chae Rhee, Jeon G. Park:

Development of the knowledge-based spoken English evaluation system and its application. 1681-1684 - Jared Bernstein, Isabella Barbier, Elizabeth Rosenfeld, John H. A. L. de Jong:

Theory and data in spoken language assessment. 1685-1688 - Tatsuya Kawahara, Masatake Dantsuji, Yasushi Tsubota:

Practical use of English pronunciation system for Japanese students in the CALL classroom. 1689-1692 - Jonas Beskow, Olov Engwall, Björn Granström, Preben Wik:

Design strategies for a virtual language tutor. 1693-1696
Emerging Research: Human Factors in Speech and Communication Systems
- Ellen Campana, Michael K. Tanenhaus, James F. Allen, Roger W. Remington:

Evaluating cognitive load in spoken language interfaces using a dual-task paradigm. 1721-1724 - Lesley-Ann Black, Norman D. Black, Roy Harper, Michelle Lemon, Michael F. McTear:

The voice-logbook: integrating human factors for a chronic care system. 1725-1728 - Kristiina Jokinen:

Communicative competence and adaptation in a spoken dialogue system. 1729-1732 - Zhan Fu, Lay Ling Pow, Fang Chen:

Evaluation of the difference between the driving behavior of a speech-based and a speech-visual-based task of an in-car computer. 1733-1736 - Sebastian Möller, Jan Felix Krebber, Paula M. T. Smeele:

Evaluating system metaphors via the speech output of a smart home system. 1737-1740 - Florian Hammer, Peter Reichl, Alexander Raake:

Elements of interactivity in telephone conversations. 1741-1744
Interdisciplinary Topics in Spoken Language Processing
- Rubén San Segundo, Juan Manuel Montero, Javier Macías Guarasa, Ricardo de Córdoba, Javier Ferreiros, José Manuel Pardo:

Generating gestures from speech. 1817-1820 - Noboru Kanedera, Asuka Sumida, Takao Ikehata, Tetsuo Funada:

Subtopic segmentation in the lecture speech. 1821-1824 - Donna Erickson, Caroline Menezes, Akinori Fujino:

Some articulatory measurements of real sadness. 1825-1828 - Chen-Long Lee, Wen-Whei Chang, Yuan-Chuan Chiang:

Application of voice conversion to hearing-impaired Mandarin speech enhancement. 1829-1832 - Oh Pyo Kweon, Akinori Ito, Motoyuki Suzuki, Shozo Makino:

A Japanese dialogue-based CALL system with mispronunciation and grammar error detection. 1833-1836 - Cheolwoo Jo, Ilsuh Bak:

Statistics-based direction finding for training vowels. 1837-1840 - Simona Montanari, Serdar Yildirim, Elaine Andersen, Shrikanth S. Narayanan:

Reference marking in children's computer-directed speech: an integrated analysis of discourse and gestures. 1841-1844 - Jong-mi Kim, Suzanne Flynn:

What makes a non-native accent?: a study of Korean English. 1845-1848 - Sang-Jin Kim, Kwang-Ki Kim, Minsoo Hahn:

Study on emotional speech features in Korean with its application to voice color conversion. 1849-1852 - Shigeaki Amano, Tomohiro Nakatani, Tadahisa Kondo:

Developmental changes in voiced-segment ratio for Japanese infants and parents. 1853-1856 - Kisun You, Hoyoun Kim, Wonyong Sung:

Implementation of an intonational quality assessment system for a handheld device. 1857-1860 - Denis Beautemps, Thomas Burger, Laurent Girin:

Characterizing and classifying cued speech vowels from labial parameters. 1861-1864 - Shinya Takahashi, Tsuyoshi Morimoto, Sakashi Maeda, Naoyuki Tsuruta:

Cough detection in spoken dialogue system for home health care. 1865-1868
Towards Adaptive Machines: Active and Unsupervised Learning
- Dong Yu, Mei-Yuh Hwang, Peter Mau, Alex Acero, Li Deng:

Unsupervised learning from users' error correction in speech dictation. 1969-1972 - Gerard G. L. Meyer, Teresa M. Kamm:

Robustness aspects of active learning for acoustic modeling. 1973-1976 - Karthik Visweswariah, Ramesh A. Gopinath, Vaibhava Goel:

Task adaptation of acoustic and language models based on large quantities of data. 1977-1980 - Luc Lussier, Edward W. D. Whittaker, Sadaoki Furui:

Unsupervised language model adaptation methods for spontaneous speech. 1981-1984 - Masafumi Nishida, Yoshitaka Mamiya, Yasuo Horiuchi, Akira Ichikawa:

On-line incremental adaptation based on reinforcement learning for robust speech recognition. 1985-1988 - Tomohiro Watanabe, Hiromitsu Nishizaki, Takehito Utsuro, Seiichi Nakagawa:

Unsupervised speaker adaptation using high confidence portion recognition results by multiple recognition systems. 1989-1992
Speech Coding
- Sorin Dusan, James L. Flanagan, Amod Karve, Mridul Balaraman:

Speech coding using trajectory compression and multiple sensors. 1993-1996 - Christian Feldbauer, Gernot Kubin:

How sparse can we make the auditory representation of speech? 1997-2000 - David Malah, Slava Shechtman:

Efficient sub-optimal temporal decomposition with dynamic weighting of speech signals for coding applications. 2001-2004 - Teddy Surya Gunawan, Eliathamby Ambikairajah, Julien Epps:

Perceptual wavelet packet audio coder. 2005-2008 - Sung-Kyo Jung, Hong-Goo Kang, Dae Hee Youn, Chang-Heon Lee:

Performance analysis of transcoding algorithms in packet-loss environments. 2009-2012 - Tiago H. Falk, Wai-Yip Chan, Peter Kabal:

Speech quality estimation using Gaussian mixture models. 2013-2016
Robust ASR
- Hong Kook Kim, Mazin G. Rahim:

Why speech recognizers make errors? A robustness view. 1645-1648 - Seyed Mohammad Ahadi, Hamid Sheikhzadeh, Robert L. Brennan, George H. Freeman:

An energy normalization scheme for improved robustness in speech recognition. 1649-1652 - Juan M. Huerta, Etienne Marcheret, Sreeram Balakrishnan:

Rapid on-line environment compensation for server-based speech recognition in noisy mobile environments. 1653-1656 - Leila Ansary, Seyyed Ali Seyyed Salehi:

Modeling phones coarticulation effects in a neural network based speech recognition system. 1657-1660 - Daniel Willett:

Error-weighted discriminative training for HMM parameter estimation. 1661-1664 - Wai Kit Lo, Frank K. Soong, Satoshi Nakamura:

Robust verification of recognized words in noise. 1665-1668 - Zili Li, Hesham Tolba, Douglas D. O'Shaughnessy:

Robust automatic speech recognition using an optimal spectral amplitude estimator algorithm in low-SNR car environments. 2041-2044 - Junhui Zhao, Jingming Kuang, Xiang Xie:

Robust speech recognition using data-driven temporal filters based on independent component analysis. 2045-2048 - Norihide Kitaoka, Longbiao Wang, Seiichi Nakagawa:

Robust distant speech recognition based on position dependent CMN. 2049-2052 - Sumitaka Sakauchi, Yoshikazu Yamaguchi, Satoshi Takahashi, Satoshi Kobashikawa:

Robust speech recognition based on HMM composition and modified Wiener filter. 2053-2056 - Ivan Brito, Néstor Becerra Yoma, Carlos Molina:

Feature-dependent compensation in speech recognition. 2057-2060 - Stephen Cox:

Using context to correct phone recognition errors. 2061-2064 - Yasunari Obuchi:

Improved histogram-based feature compensation for robust speech recognition and unsupervised speaker adaptation. 2065-2068 - Zhenyu Xiong, Thomas Fang Zheng, Wenhu Wu:

Weighting observation vectors for robust speech recognition in noisy environments. 2069-2072 - Masanori Tsujikawa, Ken-ichi Iso:

Hands-free speech recognition using blind source separation post-processed by two-stage spectral subtraction. 2073-2076 - Randy Gomez, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:

Robust speech recognition with spectral subtraction in low SNR. 2077-2080 - Bert Cranen, Johan de Veth:

Active perception: using a priori knowledge from clean speech models to ignore non-target features. 2081-2084 - Haitian Xu, Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg:

Spectral subtraction with full-wave rectification and likelihood controlled instantaneous noise estimation for robust speech recognition. 2085-2088 - Filip Korkmazsky, Dominique Fohr, Irina Illina:

Using linear interpolation to improve histogram equalization for speech recognition. 2089-2092 - Mark Hasegawa-Johnson, Ameya N. Deoras:

A factorial HMM approach to robust isolated digit recognition in background music. 2093-2096 - Yoonjae Lee, Hanseok Ko:

Multi-eigenspace normalization for robust speech recognition in noisy environments. 2097-2100 - Christophe Cerisara, Dominique Fohr, Odile Mella, Irina Illina:

Exploiting models intrinsic robustness for noisy speech recognition. 2101-2104 - Pere Pujol, Jaume Padrell, Climent Nadeu, Dusan Macho:

Speech recognition experiments with the SPEECON database using several robust front-ends. 2105-2108 - Shajith Ikbal, Mathew Magimai-Doss, Hemant Misra, Hervé Bourlard:

Spectro-temporal activity pattern (STAP) features for noise robust ASR. 2109-2112 - Byoung-Don Kim, Jin Young Kim, Seung Ho Choi, Young-Bum Lee, Kyoung-Rok Lee:

Improvement of confidence measure performance using background model set algorithm. 2113-2116 - Guillermo Aradilla, John Dines, Sunil Sivadas:

Using RASTA in task independent TANDEM feature extraction. 2117-2120 - Kyu Jeong Han, Shrikanth S. Narayanan, Naveen Srinivasamurthy:

A distributed speech recognition system in multi-user environments. 2121-2124 - Reinhold Haeb-Umbach, Valentin Ion:

Soft features for improved distributed speech recognition over wireless networks. 2125-2128
Emerging Research
- Rinzou Ebukuro:

Analysis on disappearing and thriving of speech applications for ergonomic design guidelines and recommendations. 2217-2220 - Paula M. T. Smeele, Sebastian Möller, Jan Felix Krebber:

Evaluation of the speech output of a smart-home system in a car environment. 2221-2225 - Ellen C. Haas:

How does the integration of speech recognition controls and spatialized auditory displays affect user workload? 2225-2228 - Fang Chen:

Speech interaction system - how to increase its usability? 2229-2232 - Nicole Beringer:

Human language acquisition methods in a machine learning task. 2233-2236
Spoken Language Resources and Technology Evaluation I
- Laila Dybkjær, Niels Ole Bernsen, Wolfgang Minker:

New challenges in usability evaluation - beyond task-oriented spoken dialogue systems. 2261-2264 - Owen Kimball, Chia-Lin Kao, Rukmini Iyer, Teodoro Arvizo, John Makhoul:

Using quick transcriptions to improve conversational speech models. 2265-2268 - Rohit Mishra, Elizabeth Shriberg, Sandra Upson, Joyce Chen, Fuliang Weng, Stanley Peters, Lawrence Cavedon, John Niekrasz, Hua Cheng, Harry Bratt:

A wizard of oz framework for collecting spoken human-computer dialogs. 2269-2272 - Mikko Hartikainen, Esa-Pekka Salonen, Markku Turunen:

Subjective evaluation of spoken dialogue systems using SERVQUAL method. 2273-2276 - Ioana Vasilescu, Laurence Devillers, Chloé Clavel, Thibaut Ehrette:

Fiction database for emotion detection in abnormal situations. 2277-2280 - Ruhi Sarikaya, Yuqing Gao, Paola Virga:

Fast semi-automatic semantic annotation for spoken dialog systems. 2281-2284 - Yi-Jian Wu, Hisashi Kawai, Jinfu Ni, Ren-Hua Wang:

A study on automatic detection of Japanese vowel devoicing for speech synthesis. 2721-2724 - Tolga Çiloglu, Dinc Acar, Ahmet Tokatli:

OrienTel-Turkish: telephone speech database description and notes on the experience. 2725-2728 - Taejin Yoon, Sandra Chavarria, Jennifer Cole, Mark Hasegawa-Johnson:

Intertranscriber reliability of prosodic labeling on telephone conversation using ToBI. 2729-2732 - Jilei Tian:

Efficient compression method for pronunciation dictionaries. 2733-2736 - Min-Siong Liang, Dau-Cheng Lyu, Yuang-Chin Chiang, Ren-Yuan Lyu:

Construct a multi-lingual speech corpus in Taiwan with extracting phonetically balanced articles. 2737-2740 - Per Olav Heggtveit, Jon Emil Natvig:

Automatic prosody labeling of read Norwegian. 2741-2744 - Eric Sanders, Andrea Diersen, Willy Jongenburger, Helmer Strik:

Towards automatic word segmentation of dialect speech. 2745-2748 - Petr Fousek, Frantisek Grézl, Hynek Hermansky, Petr Svojanovsky:

New nonsense syllables database - analyses and preliminary ASR experiments. 2749-2752 - Jan Felix Krebber, Sebastian Möller, Alexander Raake:

Speech input and output module assessment for remote access to a smart-home spoken dialog system. 2753-2756 - Dong-Hyun Kim, Yong-Wan Roh, Kwang-Seok Hong:

An implementation of a speech DB gathering system using VoiceXML. 2757-2760 - Farshad Almasganj:

Precise phone boundary detection using wavelet packet and recurrent neural networks. 2761-2764 - Andrew Cameron Morris, Viktoria Maier, Phil D. Green:

From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition. 2765-2768 - Seok-Chae Rhee, Sook-Hyang Lee, Young-Ju Lee, Seok-Keun Kang:

Design and construction of Korean-spoken English corpus. 2769-2772 - Folkert de Vriend, Giulio Maltese:

Exploring XML-based technologies and procedures for quality evaluation from a real-life case perspective. 2773-2776 - Kuansan Wang:

Spoken language interface in ECMA/ISO telecommunication standards. 2777-2780 - Marelie H. Davel, Etienne Barnard:

The efficient generation of pronunciation dictionaries: machine learning factors during bootstrapping. 2781-2784 - Anja Geumann:

Towards a new level of annotation detail of multilingual speech corpora. 2785-2788 - Nobuo Kawaguchi, Shigeki Matsubara, Yukiko Yamaguchi, Kazuya Takeda, Fumitada Itakura:

CIAIR in-car speech database. 2789-2792 - Christophe Van Bael, Henk van den Heuvel, Helmer Strik:

Investigating speech style specific pronunciation variation in large spoken language corpora. 2793-2796 - Marelie H. Davel, Etienne Barnard:

The efficient generation of pronunciation dictionaries: human factors during bootstrapping. 2797-2800
Multi-Modal / Multi-Media Processing
- Roger K. Moore:

Modeling data entry rates for ASR and alternative input methods. 2285-2288 - Hiromitsu Ban, Chiyomi Miyajima, Katsunobu Itou, Fumitada Itakura, Kazuya Takeda:

Speech recognition using synchronization between speech and finger tapping. 2289-2292 - Anurag Kumar Gupta, Tasos Anastasakos:

Integration patterns during multimodal interaction. 2293-2296 - Etienne Marcheret, Stephen M. Chu, Vaibhava Goel, Gerasimos Potamianos:

Efficient likelihood computation in multi-stream HMM based audio-visual speech recognition. 2297-2300 - Changkyu Choi, Donggeon Kong, Hyoung-Ki Lee, Sang Min Yoon:

Separation of multiple concurrent speeches using audio-visual speaker localization and minimum variance beam-forming. 2301-2304 - Tokitomo Ariyoshi, Kazuhiro Nakadai, Hiroshi Tsujino:

Multimodal expression for humanoid robots by integration of human speech mimicking and facial color. 2305-2308
Automatic Speech Recognition in the Context of Mobile Communications
- Miroslav Novak:

Towards large vocabulary ASR on embedded platforms. 2309-2312 - Hiroshi Fujimura, Katsunobu Itou, Kazuya Takeda, Fumitada Itakura:

Analysis of in-car speech recognition experiments using a large-scale multi-mode dialogue corpus. 2313-2316 - Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg:

On the integration of speech recognition into personal networks. 2317-2320 - Richard C. Rose, Hong Kook Kim:

Robust speech recognition in client-server scenarios. 2321-2324 - Sangbae Jeong, Icksang Han, Eugene Jon, Jeongsu Kim:

Memory and computation reduction for embedded ASR systems. 2325-2328
Robust Features for ASR
- Takashi Fukuda, Tsuneo Nitta:

Canonicalization of feature parameters for automatic speech recognition. 2537-2540 - Soundararajan Srinivasan, Nicoleta Roman, DeLiang Wang:

On binary and ratio time-frequency masks for robust speech recognition. 2541-2544 - Alberto Sanchís, Alfons Juan, Enrique Vidal:

New features based on multiple word graphs for utterance verification. 2545-2548 - Lukás Burget:

Combination of speech features using smoothed heteroscedastic linear discriminant analysis. 2549-2552 - Shajith Ikbal, Hemant Misra, Sunil Sivadas, Hynek Hermansky, Hervé Bourlard:

Entropy based combination of tandem representations for noise robust ASR. 2553-2556 - Dongsuk Yook, Donghyun Kim:

Fast speech adaptation in linear spectral domain for additive and convolutional noise. 2557-2560
Towards Rapid Speech and Natural Language Application Development: Tooling, Architectures, Components and Standards
- I. Lee Hetherington:

The MIT finite-state transducer toolkit for speech and language processing. 2609-2612 - Junlan Feng, Srinivas Bangalore, Mazin G. Rahim:

Question-answering in webtalk: an evaluation study. 2613-2616 - Juan M. Huerta, Chaitanya Ekanadham:

Automatic network optimization of voice applications. 2617-2620 - Miguel Angel Rodriguez-Moreno, Heriberto Cuayáhuitl, Juventino Montiel-Hernández:

Voicebuilder: a framework for automatic speech application development. 2621-2624 - Andrea Facco, Daniele Falavigna, Roberto Gretter, Marcello Viganò:

On the development of telephone applications: some practical issues and evaluation. 2625-2628 - Stefan W. Hamerich, Volker Schless, Basilis Kladis, Volker Schubert, Otilia Kocsis, Stefan Igel, Ricardo de Córdoba, Luis Fernando D'Haro, José Manuel Pardo:

The GEMINI platform: semi-automatic generation of dialogue applications. 2629-2632
Speech Coding and Enhancement
- Kazuhiro Kondo, Kiyoshi Nakagawa:

A packet loss concealment method using recursive linear prediction. 2633-2636 - Minkyu Lee, Imed Zitouni, Qiru Zhou:

On an n-gram model approach for packet loss concealment. 2637-2640 - Stephen So, Kuldip K. Paliwal:

Efficient vector quantisation of line spectral frequencies using the switched split vector quantiser. 2641-2644 - M. Chaitanya, S. R. Mahadeva Prasanna, B. Yegnanarayana:

Enhancement of reverberant speech using excitation source information. 2645-2648 - Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi:

Improving automatic speech recognition performance and speech intelligibility with harmonicity based dereverberation. 2649-2652 - Seung Yeol Lee, Nam Soo Kim, Joon-Hyuk Chang:

Inner product based-multiband vector quantization for wideband speech coding at 16 kbps. 2653-2656 - Alberto Abad, Javier Hernando:

Speech enhancement and recognition by integrating adaptive beamforming and Wiener filtering. 2657-2660 - Kyung-Tae Kim, Sung-Kyo Jung, MiSuk Lee, Hong-Goo Kang, Dae Hee Youn:

Temporal normalization techniques for transform-type speech coding and application to split-band wideband coders. 2661-2664 - Tatsunori Asai, Shigeki Miyabe, Hiroshi Saruwatari, Kiyohiro Shikano:

Interface for barge-in free spoken dialogue system using adaptive sound field control. 2665-2668 - Jong-Hark Kim, Jae-Hyun Shin, InSung Lee:

Multi-mode harmonic transform excitation LPC coding for speech and music. 2669-2672 - Mital Gandhi, Mark Hasegawa-Johnson:

Source separation using particle filters. 2673-2676 - Anssi Rämö, Jani Nurminen, Sakari Himanen, Ari Heikkinen:

Segmental speech coding model for storage applications. 2677-2680 - Gwo-hwa Ju, Lin-Shan Lee:

Improved speech enhancement by applying time-shift property of DFT on Hankel matrices for signal subspace decomposition. 2681-2684 - Jari Juhani Turunen, Juha T. Tanttu, Frank Cameron:

Minimum phase compensation in speech coding using Hammerstein model. 2685-2688 - Weifeng Li, Fumitada Itakura, Kazuya Takeda:

Optimizing regression for in-car speech recognition using multiple distributed microphones. 2689-2692 - Weifeng Li, Kazuya Takeda, Fumitada Itakura, Tran Huy Dat:

Speech enhancement based on magnitude estimation using the gamma prior. 2693-2696 - Andrew Errity, John McKenna, Stephen Isard:

Unscented Kalman filtering of line spectral frequencies. 2697-2700 - Hyoung-Gook Kim, Thomas Sikora:

Speech enhancement based on smoothing of spectral noise floor. 2701-2704 - Junfeng Li, Masato Akagi:

Noise reduction using hybrid noise estimation technique and post-filtering. 2705-2708 - Marcel Gabrea:

An adaptive Kalman filter for the enhancement of speech signals. 2709-2712 - T. V. Sreenivas, K. Sharath Rao, A. Sreenivasa Murthy:

Improved iterative Wiener filtering for non-stationary noise speech enhancement. 2713-2716 - Yasheng Qian, Peter Kabal:

Highband spectrum envelope estimation of telephone speech using hard/soft-classification. 2717-2720
Acoustic Modeling for Robust ASR
- Filip Korkmazsky, Murat Deviren, Dominique Fohr, Irina Illina:

Hidden factor dynamic Bayesian networks for speech recognition. 2801-2804 - Mark Z. Mao, Vincent Vanhoucke:

Design of compact acoustic models through clustering of tied-covariance Gaussians. 2805-2808 - Chandra Kant Raut, Takuya Nishimoto, Shigeki Sagayama:

Model composition by Lagrange polynomial approximation for robust speech recognition in noisy environment. 2809-2812 - Jian Wu, Donglai Zhu, Qiang Huo:

A study of minimum classification error training for segmental switching linear Gaussian hidden Markov models. 2813-2816 - Shigeki Matsuda, Takatoshi Jitsuhiro, Konstantin Markov, Satoshi Nakamura:

Speech recognition system robust to noise and speaking styles. 2817-2820 - Néstor Becerra Yoma, Ivan Brito, Carlos Molina:

The stochastic weighted Viterbi algorithm: a framework to compensate additive noise and low-bit rate coding distortion. 2821-2824
Spoken Dialogue Technology and Systems
- Stefanie Tomko, Roni Rosenfeld:

Shaping spoken input in user-initiative systems. 2825-2828 - Christopher J. Pavlovski, Jennifer C. Lai, Stella Mitchell:

Etiology of user experience with natural language speech. 2829-2832 - Manny Rayner, Beth Ann Hockey:

Side effect free dialogue management in a voice enabled procedure browser. 2833-2836 - Ian Richard Lane, Tatsuya Kawahara, Shinichi Ueno:

Example-based training of dialogue planning incorporating user and situation models. 2837-2840 - Shinya Fujie, Tetsunori Kobayashi, Daizo Yagi, Hideaki Kikuchi:

Prosody based attitude recognition with feature selection and its application to spoken dialog system as para-linguistic information. 2841-2844 - David Ollason, Yun-Cheng Ju, Siddharth Bhatia, Daniel Herron, Jackie Liu:

MS connect: a fully featured auto-attendant: system design, implementation and performance. 2845-2848
Multi-Channel Speech Processing
- Reinhold Haeb-Umbach, Sven Peschke, Ernst Warsitz:

Adaptive beamforming combined with particle filtering for acoustic source localization. 2849-2852 - Hong-Seok Kwon, Siho Kim, Keun-Sung Bae:

Time delay estimation using weighted CPSP function. 2853-2856 - Ilyas Potamitis, Panagiotis Zervas, Nikos Fakotakis:

DOA estimation of speech signals using semi-blind source separation techniques. 2857-2860 - Sang-Gyun Kim, Chang D. Yoo:

Blind separation of speech and sub-Gaussian signals in underdetermined case. 2861-2864 - Gil-Jin Jang, Changkyu Choi, Yongbeom Lee, Yung-Hwan Oh:

Adaptive cross-channel interference cancellation on blind signal separation outputs using source absence/presence detection and spectral subtraction. 2865-2868 - Erik M. Visser, Kwokleung Chan, Stanley Kim, Te-Won Lee:

A comparison of simultaneous 3-channel blind source separation to selective separation on channel pairs using 2-channel BSS. 2869-2872
Intersection of Spoken Language Processing and Written Language Processing
- Hyun-Bok Lee:

Towards a harmonious coexistence of spoken and written language. 2873-2876 - Miyoko Sugito:

Towards a grammar of spoken language - prosody of ill-formed utterances and listener's understanding in discourse -. 2877-2880 - Tatsuya Kawahara, Kazuya Shitaoka, Hiroaki Nanjo:

Automatic transformation of lecture transcription into document style using statistical framework. 2881-2884 - Karunesh Arora, Sunita Arora, Kapil Verma, Shyam Sunder Agrawal:

Automatic extraction of phonetically rich sentences from large text corpus of Indian languages. 2885-2888 - Nicoletta Calzolari:

European initiatives to promote cooperation between speech and text communities. 2889-2892
Prosodic Recognition and Analysis
- Keiichi Takamaru:

Evaluation of a threshold for detecting local slower phrases in Japanese spontaneous conversational speech. 2969-2972 - Nazrul Effendy, Ekkarit Maneenoi, Patavee Charnvivit, Somchai Jitapunkul:

Intonation recognition for Indonesian speech based on Fujisaki model. 2973-2976 - Jinsong Zhang, Satoshi Nakamura, Keikichi Hirose:

Efficient tone classification of speaker independent continuous Chinese speech using anchoring based discriminating features. 2977-2980 - Michiko Watanabe, Yasuharu Den, Keikichi Hirose, Nobuaki Minematsu:

Clause types and filled pauses in Japanese spontaneous monologues. 2981-2984 - Yohei Yabuta, Yasuhiro Katagiri, Noriko Suzuki, Yugo Takeuchi:

Effect of voice prosody on the decision making process in human-computer interaction. 2985-2988 - Noriko Suzuki, Yasuhiro Katagiri:

Alignment of human prosodic patterns for spoken dialogue systems. 2989-2992 - Shinya Kiriyama, Shigeyoshi Kitazawa:

Evaluation of a prosodic labeling system utilizing linguistic information. 2993-2996 - Allison Blodgett:

Functions of intonation boundaries during spoken language comprehension in English. 2997-3000 - Marco Kühne, Matthias Wolff, Matthias Eichner, Rüdiger Hoffmann:

Voice activation using prosodic features. 3001-3004 - Sahyang Kim:

The role of prosodic cues in word segmentation of Korean. 3005-3008 - Sun-Ah Jun:

Default phrasing and attachment preference in Korean. 3009-3012 - Sarah Borys, Aaron Cohen, Mark Hasegawa-Johnson, Jennifer Cole:

Modeling and recognition of phonetic and prosodic factors for improvements to acoustic speech recognition models. 3013-3016 - Eunjong Kong:

The role of pitch range variation in the discourse structure and intonation structure of Korean. 3017-3020 - Kazuyuki Takagi, Kazuhiko Ozeki:

Dependency analysis of read Japanese sentences using pause and F0 information: a speaker independent case. 3021-3024 - Shari R. Speer, Soyoung Kang:

Effects of prosodic boundaries on ambiguous syntactic clause boundaries in Japanese. 3025-3028 - Yasuko Nagasaki, Takanori Komatsu:

The superior effectiveness of the F0 range for identifying the context from sounds without phonemes. 3029-3032 - Tan Li, Montri Karnjanadecha, Thanate Khaorapapong:

A study of tone classification for continuous Thai speech recognition. 3033-3036 - Key-Seop Kim, Un Lim, Dong-Il Shin:

An acoustic-analytic role for the deviation between the scansion and reading of poems. 3037-3040 - Tomoko Ohsuga, Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa:

Estimating syntactic structure from prosodic features in Japanese speech. 3041-3044 - Masahiko Komatsu, Tsutomu Sugawara, Takayuki Arai:

Perceptual discrimination of prosodic types and their preliminary acoustic analysis. 3045-3048
Towards Rapid Speech and Natural Language Application Development
- Johann L'Hour, Olivier Boëffard, Jacques Siroux, Laurent Miclet, Francis Charpentier, Thierry Moudenc:

DORIS, a multiagent/IP platform for multimodal dialogue applications. 3049-3052 - Yu Chen:

EVITA-RAD: an extensible enterprise voice portal - rapid application development tool. 3053-3056 - Luis Fernando D'Haro, Ricardo de Córdoba, Rubén San Segundo, Juan Manuel Montero, Javier Macías Guarasa, José Manuel Pardo:

Strategies to reduce design time in multimodal/multilingual dialog applications. 3057-3060 - Gregory Aist:

Three-way system-user-expert interactions help you expand the capabilities of an existing spoken dialogue system. 3061-3064 - Giuseppe Di Fabbrizio, Charles Lewis:

Florence: a dialogue manager framework for spoken dialogue systems. 3065-3068 - Tatsuya Kawahara, Akinobu Lee, Kazuya Takeda, Katsunobu Itou, Kiyohiro Shikano:

Recent progress of open-source LVCSR engine julius and Japanese model repository. 3069-3072 - Hiroya Murao, Nobuo Kawaguchi, Shigeki Matsubara, Yukiko Yamaguchi, Kazuya Takeda, Yasuyoshi Inagaki:

Example-based spoken dialogue system with online example augmentation. 3073-3076 - Dirk Bühler:

Enhancing existing form-based dialogue managers with reasoning capabilities. 3077-3080 - Markku Turunen, Esa-Pekka Salonen, Mikko Hartikainen, Jaakko Hakulinen:

Robust and adaptive architecture for multilingual spoken dialogue systems. 3081-3084 - Porfírio P. Filipe, Nuno J. Mamede:

Towards ubiquitous task management. 3085-3088
