


28th TSD 2025: Erlangen, Germany - Part I
- Kamil Ekstein, Miloslav Konopík, Ondrej Prazák, Frantisek Pártl: Text, Speech, and Dialogue - 28th International Conference, TSD 2025, Erlangen, Germany, August 25-28, 2025, Proceedings, Part I. Lecture Notes in Computer Science 16029, Springer 2026, ISBN 978-3-032-02547-0
Speech
- Ales Prazák, Marie Kunesová, Josef Psutka: Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR. 3-12
- Yanis Labrak, Richard Dufour, Mickaël Rouvier: An Empirical Analysis of Discrete Unit Representations in Speech Language Modeling Pre-training. 13-24
- Janine Rugayan, Giampiero Salvi, Torbjørn Svendsen: Optimizing ASR Models with Semantic Information. 25-35
- Lukás Mateju, Jan Nouza, Martin Polácek, Petr Cerva: Efficient Enhancement of Norwegian ASR Model. 36-47
- Keara Schaaij, Roel Boumans, Tibor Bosse, Iris Hendrickx: Towards Stable and Personalised Profiles for Lexical Alignment in Spoken Human-Agent Dialogue. 48-59
- Daiqi Liu, Tomás Arias-Vergara, Jana Hutter, Andreas Maier, Paula Andrea Pérez-Toro: Audio-Vision Contrastive Learning for Phonological Class Recognition. 60-71
- Yassin Terraf, Youssef Iraqi: TOSD-Net: A CNN-Transformer Architecture for Robust Frame-Level Overlapping Speech Detection in Diverse Acoustic Conditions. 72-83
- Marie Kunesová, Zdenek Hanzlícek, Jindrich Matousek: An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-Shot Multi-speaker TTS. 84-95
- Juan Camilo Vásquez-Correa, Ander González-Docasal, Aritz Lasarguren, Jone López, Aitor Álvarez, Egoitz Rodriguez-Olea: Emotion-Aware Speech-Driven Facial Avatar Animation via Joint Blendshape Prediction and Emotion Recognition. 96-108
- Yuxuan Zhang, Hippolyte Fournier, Ruslan Kalitvianski, Marco Dinarelli, Fabien Ringeval: Beyond Static Emotions: Leveraging Multitask Learning to Model Dynamics of Dimensional Affect in Speech. 109-120
- Felix Herron, Solange Rossato, Alexandre Allauzen, Benoît Favre, François Portet: Speaker Group Encoding in Self-supervised Speech Recognition Models. 121-132
- Christopher Simic, Korbinian Riedhammer, Tobias Bocklet: Combining Temporal Visual Dynamics and Audio Representations for Robust Speaker Identification. 133-145
- Daniel Tihelka, Jindrich Matousek, Lukás Vladar: Sentences vs Phrases in Neural Speech Synthesis: The Phrases Strike Back. 146-157
- Lukás Vladar, Jindrich Matousek, Jan Lehecka, Markéta Rezácková: Evaluating Phoneme-Level Pretraining in Czech Text-to-Speech Synthesis. 158-169
- Iuliia Thorbecke, Esaú Villatoro-Tello, Juan Pablo Zuluaga-Gomez, Shashi Kumar, Sergio Burdisso, Pradeep Rangappa, Andrés Carofilis, Srikanth R. Madikeri, Petr Motlícek, Karthik Pandia, Kadri Hacioglu, Andreas Stolcke: Unifying Global and Near-Context Biasing in a Single Trie Pass. 170-181
- Kesego Mokgosi, Milad Dadgar, Cathy Ennis, Robert Ross: Synthesising Cross-Speaker Data for Low-Resource Pathological Speech Recognition with PEFT. 182-193
- Felix Haas, Sebastian P. Bayerl: Multilingual Stutter Event Detection for English, German, and Mandarin Speech. 194-206
- Dalai Mengke, Yan Meng, Péter Mihajlik: How Far Can Synthetic Speech Go? Enhancing ASR in Low-Resource Scenarios via Voice Cloning. 207-217
- Andreas Rouvalis, Johannes Tröger, Julius Steuer, Juan Rafael Orozco-Arroyave, Jan Rusz, Jouni Pohjalainen, Hali Lindsay, Bernd Möbius, Dietrich Klakow: Enhancing Detection of Parkinson-Induced Dysarthria with Cross-Lingual Transfer Learning. 218-234
- Dominik Wagner, Ilja Baumann, Tobias Bocklet: Vocoder-Free Non-parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks. 235-246
- Jan Tupý, Jan Svec, Lubos Smídl: Detection of Cognitive Disorders Using ASR-Based Nonsense Words Repetition. 247-258
- Duygu Altinok: Mind the Gap: Entity-Preserved Context-Aware ASR for Structured Transcriptions. 259-274
- Duygu Altinok: Boosting CTC-Based ASR Using LLM-Based Intermediate Loss Regularization. 275-288
- Yan Meng, Péter Mihajlik, Dalai Mengke, Katalin Mády: Robust Disfluency Labeling in Spontaneous Speech: Insights from Diverse Hungarian Corpora Including Atypical Speakers. 289-298
- Vladislav Stankov, Matyás Kopp, Ondrej Bojar: ParCzech4Speech: A New Speech Corpus Derived from Czech Parliamentary Data. 299-308
- Mykhailo Danilevskyi, Fernando Pérez-Téllez, Jelena Vasic: Towards an Accurate Domain-Specific ASR: Transcription for Pathology. 309-318
- Peter Polák, Michal Novák, Katerina Rysová, Magdaléna Rysová, Ondrej Bojar: Automated Speaking Assessment for L2 Learners of Czech. 319-330
- Melissa Torgbi, Andrew Clayman, Jordan J. Speight, Joe Hirst, Harish Tayyar Madabushi: Inclusive ASR for Critical Public Services: Debiasing with Actor-Simulated Speech. 331-342
- Terry Yi Zhong, Cristian Tejedor García, Martha A. Larson, Bastiaan R. Bloem: RECA-PD: A Robust Explainable Cross-Attention Method for Speech-Based Parkinson's Disease Classification. 343-355
- Ishaan Mahapatra, Nihar R. Mahapatra: Systematic FAIRness Assessment of Open Voice Biomarker Datasets for Mental Health and Neurodegenerative Diseases. 356-368
- Matthias Busch, Arun Jose, Ingo Siegert: When Silence Speaks: Understanding Open-Ended Responses via LLMs in Therapeutic Voice Interaction. 369-380
- Elif Nehir Ulu, Ece Derya, Duygu Tümer, Berkan Demirel, Alper Karamanlioglu: Multilingual Domain Adaptation for Speech Recognition Using LLMs. 381-393
- Simen Dymbe, Sabato Marco Siniscalchi, Torbjørn Svendsen, Giampiero Salvi: Using Cross-Attention for Conversational ASR over the Telephone. 394-405















