13th LREC 2022: Marseille, France
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis: Proceedings of the Thirteenth Language Resources and Evaluation Conference, LREC 2022, Marseille, France, 20-25 June 2022. European Language Resources Association 2022
- Alexandre Diniz da Costa, Mateus Coutinho Marim, Ely Edison Matos, Tiago Timponi Torrent: Domain Adaptation in Neural Machine Translation using a Qualia-Enriched FrameNet. 1-12
- Serge Gladkoff, Lifeng Han: HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professional Post-Editing Towards More Effective MT Evaluation. 13-21
- Chanjun Park, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim: Priming Ancient Korean Neural Machine Translation. 22-28
- Toon Colman, Margot Fonteyne, Joke Daems, Nicolas Dirix, Lieve Macken: GECO-MT: The Ghent Eye-tracking Corpus of Machine Translation. 29-38
- Levi Remijnse, Piek Vossen, Antske Fokkens, Sam Titarsolej: Introducing Frege to Fillmore: A FrameNet Dataset that Captures both Sense and Reference. 39-50
- Bolette S. Pedersen, Nathalie Carmen Hau Sørensen, Sanni Nimb, Ida Flørke, Sussi Olsen, Thomas Troelsgård: Compiling a Suitable Level of Sense Granularity in a Lexicon for AI Purposes: The Open Source COR Lexicon. 51-60
- Francis Bond, Merrick Yeu Herng Choo: Sense and Sentiment. 61-69
- Joanna Ut-Seong Sio, Luís Morgado da Costa: Enriching Linguistic Representation in the Cantonese Wordnet and Building the New Cantonese Wordnet Corpus. 70-78
- Nizar Habash, David Palfreyman: ZAEBUC: An Annotated Arabic-English Bilingual Writer Corpus. 79-88
- Necva Bölücü, Burcu Can: Turkish Universal Conceptual Cognitive Annotation. 89-99
- Tamás Váradi, Bence Nyéki, Svetla Koeva, Marko Tadic, Vanja Stefanec, Maciej Ogrodniczuk, Bartlomiej Niton, Piotr Pezik, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Dan Tufis, Radovan Garabík, Simon Krek, Andraz Repar: Introducing the CURLICAT Corpora: Seven-language Domain Specific Annotated Corpora from Curated Sources. 100-108
- C. Anton Rytting, Valerie Novak, James R. Hull, Victor M. Frank, Paul Rodrigues, Jarrett G. W. Lee, Laurel Miller-Sims: RU-ADEPT: Russian Anonymized Dataset with Eight Personality Traits. 109-118
- Quentin Brabant, Gwénolé Lecorvé, Lina Maria Rojas-Barahona: CoQAR: Question Rewriting on CoQA. 119-126
- Annalena Aicher, Nadine Gerstenlauer, Wolfgang Minker, Stefan Ultes: User Interest Modelling in Argumentative Dialogue Systems. 127-136
- Giancarlo A. Xompero, Michele Mastromattei, Samir Salman, Cristina Giannone, Andrea Favalli, Raniero Romagnoli, Fabio Massimo Zanzotto: Every time I fire a conversational designer, the performance of the dialogue system goes down. 137-145
- Yuqiao Wen, Guoqing Luo, Lili Mou: An Empirical Study on the Overlapping Problem of Open-Domain Dialogue Datasets. 146-153
- Federica Gamba, Francesca Frontini, Daan Broeder, Monica Monachini: Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project. 154-163
- Marc Schulder, Thomas Hanke: How to be FAIR when you CARE: The DGS Corpus as a Case Study of Open Science Resources for Minority Languages. 164-173
- Valerio Basile, Cristina Bosco, Michael Fell, Viviana Patti, Rossella Varvara: Italian NLP for Everyone: Resources and Models from EVALITA to the European Language Grid. 174-180
- Michael Rosner, Sina Ahmadi, Elena Simona Apostol, Julia Bosque-Gil, Christian Chiarcos, Milan Dojchinovski, Katerina Gkirtzou, Jorge Gracia, Dagmar Gromann, Chaya Liebeskind, Giedre Valunaite Oleskeviciene, Gilles Sérasset, Ciprian-Octavian Truica: Cross-Lingual Link Discovery for Under-Resourced Languages. 181-192
- Valentina Dragos, Delphine Battistelli, Aline Étienne, Yolène Constable: Angry or Sad ? Emotion Annotation for Extremist Content Characterisation. 193-201
- Nicolas Zampieri, Carlos Ramisch, Irina Illina, Dominique Fohr: Identification of Multiword Expressions in Tweets for Hate Speech Detection. 202-210
- Michael Jantscher, Roman Kern: Causal Investigation of Public Opinion during the COVID-19 Pandemic via Social Media Text. 211-226
- Pakawat Nakwijit, Matthew Purver: Misspelling Semantics in Thai. 227-236
- Véronique Moriceau, Farah Benamara, Abdelmoumene Boumadane: Automatic Detection of Stigmatizing Uses of Psychiatric Terms on Twitter. 237-243
- Isabelle Mohr, Amelie Wührl, Roman Klinger: CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets. 244-257
- Francesco Barbieri, Luis Espinosa Anke, José Camacho-Collados: XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond. 258-266
- Areej Alhassan, Jinkai Zhang, Viktor Schlegel: 'Am I the Bad One'? Predicting the Moral Judgement of the Crowd Using Pre-trained Language Models. 267-276
- Kelvin Han, Thiago Castro Ferreira, Claire Gardent: Generating Questions from Wikidata Triples. 277-290
- Matteo Muffo, Aldo Cocco, Enrico Bertino: Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition. 291-297
- Yuji Naraki, Tetsuya Sakai, Yoshihiko Hayashi: Evaluating the Effects of Embedding with Speaker Identity Information in Dialogue Summarization. 298-304
- Julius Monsen, Evelina Rennes: Perceived Text Quality and Readability in Extractive and Abstractive Summaries. 305-312
- Alex Mei, Anisha Kabir, Rukmini Bapat, John Judge, Tony Sun, William Yang Wang: Learning to Prioritize: Precision-Driven Sentence Filtering for Long Text Summarization. 313-318
- Tatsuya Ishigaki, Suzuko Nishino, Sohei Washino, Hiroki Igarashi, Yukari Nagai, Yuichi Washida, Akihiko Murai: Automating Horizon Scanning in Future Studies. 319-327
- Nguyen Phuc Minh, Tran Hoang Vu, Vu Hoang, Ta Duc Huy, Trung Huu Bui, Steven Quoc Hung Truong: ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining. 328-337
- Timour Igamberdiev, Ivan Habernal: Privacy-Preserving Graph Convolutional Networks for Text Classification. 338-350
- Reem Alghamdi, Zhenwen Liang, Xiangliang Zhang: ArMATH: a Dataset for Solving Arabic Math Word Problems. 351-362
- Benjamin Winter, Alexei Figueroa Rosero, Alexander Löser, Felix Alexander Gers, Amy Siu: KIMERA: Injecting Domain Knowledge into Vacant Transformer Heads. 363-373
- Andrei-Marius Avram, Darius Catrina, Dumitru-Clementin Cercel, Mihai Dascalu, Traian Rebedea, Vasile Florian Pais, Dan Tufis: Distilling the Knowledge of Romanian BERTs Using Multiple Teachers. 374-384
- Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari: Personalized Filled-pause Generation with Group-wise Prediction Models. 385-392
- Imran A. Sheikh, Emmanuel Vincent, Irina Illina: Transformer versus LSTM Language Models trained on Uncertain ASR Hypotheses in Limited Data Scenarios. 393-399
- Boshko Koloski, Senja Pollak, Blaz Skrlj, Matej Martinc: Out of Thin Air: Is Zero-Shot Cross-Lingual Keyword Detection Better Than Unsupervised? 400-409
- Anastasios Lamproudis, Aron Henriksson, Hercules Dalianis: Evaluating Pretraining Strategies for Clinical BERT Models. 410-416
- Rustem Yeshpanov, Yerbolat Khassanov, Huseyin Atakan Varol: KazNERD: Kazakh Named Entity Recognition Dataset. 417-426
- Michail Mersinias, Panagiotis Valvis: Mitigating Dataset Artifacts in Natural Language Inference Through Automatic Contextual Data Augmentation and Learning Optimization. 427-435
- Mike Zhang, Kristian Nørgaard Jensen, Barbara Plank: Kompetencer: Fine-grained Skill Classification in Danish Job Postings via Distant Supervision and Transfer Learning. 436-447
- Roos M. Bakker, Romy A. N. van Drie, Maaike de Boer, Robert van Doesburg, Tom M. van Engers: Semantic Role Labelling for Dutch Law Texts. 448-457
- Kyle Goslin, Markus Hofmann: English Language Spelling Correction as an Information Retrieval Task Using Wikipedia Search Statistics. 458-464
- Meisin Lee, Lay-Ki Soon, Eu-Gene Siew, Ly Fie Sugianto: CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction. 465-479
- Niklas Dehio, Malte Ostendorff, Georg Rehm: Claim Extraction and Law Matching for COVID-19-related Legislation. 480-490
- Basit Ali, Sachin Pawar, Girish K. Palshikar, Rituraj Singh: Constructing A Dataset of Support and Attack Relations in Legal Arguments in Court Judgements using Linguistic Rules. 491-500
- Teresa Paccosi, Alessio Palmero Aprosio: KIND: an Italian Multi-Domain Dataset for Named Entity Recognition. 501-507
- Elena Mikhalkova, Alexander A. Khlyupin: Russian Jeopardy! Data Set for Question-Answering Systems. 508-514
- Benjamin Hättasch, Carsten Binnig: Know Better - A Clickbait Resolving Challenge. 515-523
- Dayne Freitag, John Cadigan, Robert Sasseen, Paul Kalmar: Valet: Rule-Based Information Extraction for Rapid Deployment. 524-533
- Tom Sweers, Iris Hendrickx, Helmer Strik: Negation Detection in Dutch Spoken Human-Computer Conversations. 534-542
- Christopher Cieri, Mark Liberman, Sunghye Cho, Stephanie M. Strassel, James Fiumara, Jonathan Wright: Reflections on 30 Years of Language Resource Development and Sharing. 543-550
- Valérie Mapelli, Victoria Arranz, Khalid Choukri, Hélène Mazo: Language Resources to Support Language Diversity - the ELRA Achievements. 551-558
- Pawel Kamocki, Andreas Witt: Ethical Issues in Language Resources and Language Technology - Tentative Categorisation. 559-563
- Fanny Ducel, Karën Fort, Gaël Lejeune, Yves Lepage: Do we Name the Languages we Study? The #BenderRule in LREC and ACL articles. 564-573
- Luna De Bruyne, Akbar Karimi, Orphée De Clercq, Andrea Prati, Véronique Hoste: Aspect-Based Emotion Analysis and Multimodal Coreference: A Case Study of Customer Comments on Adidas Instagram Posts. 574-580
- Gabriel Roccabruna, Steve Azzolin, Giuseppe Riccardi: Multi-source Multi-domain Sentiment Analysis with BERT-based Models. 581-589
- Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani, Aremu Anuoluwapo, Idris Abdulmumin: NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis. 590-602
- Aline Étienne, Delphine Battistelli, Gwénolé Lecorvé: A (Psycho-)Linguistically Motivated Scheme for Annotating and Exploring Emotions in a Genre-Diverse Corpus. 603-612
- Jean-Philippe Prost: Integrating a Phrase Structure Corpus Grammar and a Lexical-Semantic Network: the HOLINET Knowledge Graph. 613-622
- Giorgio Ottolina, Matteo Luigi Palmonari, Manuel Vimercati, Mehwish Alam: On the Impact of Temporal Representations on Metaphor Detection. 623-632
- Damien Sileo, Marie-Francine Moens: Analysis and Prediction of NLP Models via Task Embeddings. 633-647
- Amir Hazem, Mérième Bouhandi, Florian Boudin, Béatrice Daille: Cross-lingual and Cross-domain Transfer Learning for Automatic Term Extraction from Low Resource Data. 648-662
- Lena Jurkschat, Gregor Wiedemann, Maximilian Heinrich, Mattes Ruckdeschel, Sunna Torge: Few-Shot Learning for Argument Aspects of the Nuclear Energy Debate. 663-672
- Anik Jacobsen, Salar Mohtaj, Sebastian Möller: MuLVE, A Multi-Language Vocabulary Evaluation Data Set. 673-679
- Leonardo Zilio, Hadeel Saadany, Prashant Sharma, Diptesh Kanojia, Constantin Orasan: PLOD: An Abbreviation Detection Dataset for Scientific Documents. 680-688
- Tosin P. Adewumi, Roshanak Vadoodi, Aparajita Tripathy, Konstantina Nikolaidou, Foteini Liwicki, Marcus Liwicki: Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms. 689-696
- Marie Bexte, Ronja Laarmann-Quante, Andrea Horbach, Torsten Zesch: LeSpell - A Multi-Lingual Benchmark Corpus of Spelling Errors to Develop Spellchecking Methods for Learner Language. 697-706
- Laura Seiffe, Fares Kallel, Sebastian Möller, Babak Naderi, Roland Roller: Subjective Text Complexity Assessment for German. 707-714
- Elena Frick, Thomas Schmidt, Henrike Helmer: Querying Interaction Structure: Approaches to Overlap in Spoken Language Corpora. 715-722
- Piotr Pezik, Gosia Krawentek, Sylwia Karasinska, Pawel Wilk, Paulina Rybinska, Anna Cichosz, Angelika Peljak-Lapinska, Mikolaj Deckert, Michal Adamczyk: DiaBiz - an Annotated Corpus of Polish Call Center Dialogs. 723-726
- Roberts Dargis, Ilze Auzina, Inga Kaija, Kristine Levane-Petrova, Kristine Pokratniece: LaVA - Latvian Language Learner corpus. 727-731
- Kenneth Heafield, Elaine Farrow, Jelmer van der Linde, Gema Ramírez-Sánchez, Dion Wiggins: The EuroPat Corpus: A Parallel Corpus of European Patent Data. 732-740
- Elisabeth Eder, Michael Wiegand, Ulrike Krieg-Holz, Udo Hahn: "Beste Grüße, Maria Meyer" - Pseudonymization of Privacy-Sensitive Information in Emails. 741-752
- Wolfgang Schmeisser-Nieto, Montserrat Nofre, Mariona Taulé: Criteria for the Annotation of Implicit Stereotypes. 753-762
- Philipp Klumpp, Tomás Arias-Vergara, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave: Common Phone: A Multilingual Dataset for Robust Acoustic Modelling. 763-768
- Karim El Haff, Mustafa Jarrar, Tymaa Hammouda, Fadi A. Zaraket: Curras + Baladi: Towards a Levantine Corpus. 769-778
- Hiroaki Yamada, Takenobu Tokunaga, Ryutaro Ohara, Keisuke Takeshita, Mihoko Sumida: Annotation Study of Japanese Judgments on Tort for Legal Judgment Prediction with Rationales. 779-790
- Dana Ruiter, Liane Reiners, Ashwin Geet D'Sa, Thomas Kleinbauer, Dominique Fohr, Irina Illina, Dietrich Klakow, Christian Schemer, Angeliki Monnier: Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of Hate Online. 791-804
- Ekaterina Lapshinova-Koltunski, Pedro Augusto Ferreira, Elina Lartaud, Christian Hardmeier: ParCorFull2.0: a Parallel Corpus Annotated with Full Coreference. 805-813
- Maria Boritchev, Maxime Amblard: A Multi-Party Dialogue Ressource in French. 814-823
- Jaume Zaragoza-Bernabeu, Gema Ramírez-Sánchez, Marta Bañón, Sergio Ortiz-Rojas: Bicleaner AI: Bicleaner Goes Neural. 824-831
- Anisia Katinskaia, Maria Lebedeva, Jue Hou, Roman Yangarber: Semi-automatically Annotated Learner Corpus for Russian. 832-839
- Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieras, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Abbott Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóga, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer C. White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova: UniMorph 4.0: Universal Morphology. 840-855
- Dmytro Kalpakchi, Johan Boye: Textinator: an Internationalized Tool for Annotation and Human Evaluation in Natural Language Processing and Generation. 856-866
- Anaïs Ollagnier, Elena Cabrio, Serena Villata, Catherine Blaya: CyberAgressionAdo-v1: a Dataset of Annotated Online Aggressions in French Collected through a Role-playing Game. 867-875
- Md Saroar Jahan, Mourad Oussalah, Nabil Arhab: Finnish Hate-Speech Detection on Social Media Using CNN and FinBERT. 876-882
- Hyeonseok Moon, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Jungseob Lee, Sugyeong Eo, Heuiseok Lim: Empirical Analysis of Noising Scheme based Synthetic Data Generation for Automatic Post-editing. 883-891
- Daniel Edmiston, Phillip Keung, Noah A. Smith: Domain Mismatch Doesn't Always Prevent Cross-lingual Transfer Learning. 892-899
- Jens-Michalis Papaioannou, Paul Grundmann, Betty van Aken, Athanasios Samaras, Ilias Kyparissidis, George Giannakoulas, Felix A. Gers, Alexander Löser: Cross-Lingual Knowledge Transfer for Clinical Phenotyping. 900-909
- Paul McNamee, Kevin Duh: The Multilingual Microblog Translation Corpus: Improving and Evaluating Translation of User-Generated Text. 910-918
- Júlia Sato, Helena de Medeiros Caseli, Lucia Specia: Multilingual and Multimodal Learning for Brazilian Portuguese. 919-927
- Pedro Jeuris, Jan Niehues: LibriS2S: A German-English Speech-to-Speech Translation Corpus. 928-935
- Vivien Macketanz, Eleftherios Avramidis, Aljoscha Burchardt, He Wang, Renlong Ai, Shushen Manakhimova, Ursula Strohriegel, Sebastian Möller, Hans Uszkoreit: A Linguistically Motivated Test Suite to Semi-Automatically Evaluate German-English Machine Translation Output. 936-947
- Evangelia Gogoulou, Ariel Ekgren, Tim Isbister, Magnus Sahlgren: Cross-lingual Transfer of Monolingual Models. 948-955
- Fynn Petersen-Frey, Marcus Soll, Louis Kobras, Melf Johannsen, Peter Kling, Chris Biemann: Dataset of Student Solutions to Algorithm and Data Structure Programming Assignments. 956-962
- Ishani Mondal, Kalika Bali, Mohit Jain, Monojit Choudhury, Jacki O'Neill, Millicent Ochieng, Kagonya Awori, Keshet Ronen: Language Patterns and Behaviour of the Peer Supporters in Multilingual Healthcare Conversational Forums. 963-975
- Zheng Xin Yong, Patrick D. Watson, Tiago Timponi Torrent, Oliver Czulo, Collin F. Baker: Frame Shift Prediction. 976-986
- Brigitte Bigi, Maryvonne Zimmermann, Carine André: CLeLfPC: a Large Open Multi-Speaker Corpus of French Cued Speech. 987-994
- Carlos Daniel Hernandez Mena, David Erik Mollberg, Michal Borský, Jón Guðnason: Samrómur Children: An Icelandic Speech Corpus. 995-1002
- Per Erik Solberg, Pablo Ortiz: The Norwegian Parliamentary Speech Corpus. 1003-1008
- Martijn Bentum, Louis ten Bosch, Henk van den Heuvel, Simone Wills, Domenique van der Niet, Jelske Dijkstra, Hans Van de Velde: A Speech Recognizer for Frisian/Dutch Council Meetings. 1009-1015
- Meiko Fukuda, Ryota Nishimura, Maina Umezawa, Kazumasa Yamamoto, Yurie Iribe, Norihide Kitaoka: Elderly Conversational Speech Corpus with Cognitive Impairment Test and Pilot Dementia Detection Experiment Using Acoustic Characteristics of Speech in Japanese Dialects. 1016-1022
- Ali Can Kocabiyikoglu, François Portet, Prudence Gibert, Hervé Blanchon, Jean-Marc Babouchkine, Gaëtan Gavazzi: A Spoken Drug Prescription Dataset in French for Spoken Language Understanding. 1023-1031
- Cristian Tejedor García, Berrie van der Molen, Henk van den Heuvel, Arjan van Hessen, Toine Pieters: Towards an Open-Source Dutch Speech Recognition System for the Healthcare Domain. 1032-1039
- Maria Moutti, Sofia Eleftheriou, Panagiotis Koromilas, Theodoros Giannakopoulos: A Dataset for Speech Emotion Recognition in Greek Theatrical Plays. 1040-1046
- Liisi Piits, Hille Pajupuu, Heete Sahkai, Rene Altrov, Liis Ermus, Kairi Tamuri, Indrek Hein, Meelis Mihkla, Indrek Kiissel, Egert Männisalu, Kristjan Suluste, Jaan Pajupuu: Audiobook Dialogues as Training Data for Conversational Style Synthetic Voices. 1047-1053
- Yaru Wu, Fabian M. Suchanek, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker: Using a Knowledge Base to Automatically Annotate Speech Corpora and to Identify Sociolinguistic Variation. 1054-1060
- Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe: Phone Inventories and Recognition for Every Language. 1061-1067
- Dimitrios Roussis, Vassilis Papavassiliou, Sokratis Sofianopoulos, Prokopis Prokopidis, Stelios Piperidis: Constructing Parallel Corpora from COVID-19 News using MediSys Metadata. 1068-1072
- Dongxu Zhang, Sunil Mohan, Michaela Torkar, Andrew McCallum: A Distant Supervision Corpus for Extracting Biomedical Relationships Between Chemicals, Diseases and Genes. 1073-1082
- Jayetri Bardhan, Anthony M. Colas, Kirk Roberts, Daisy Zhe Wang: DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries. 1083-1097
- Stella Verkijk, Piek Vossen: Efficiently and Thoroughly Anonymizing a Transformer Language Model for Dutch Electronic Health Records: a Two-Step Method. 1098-1103
- Loïc Grobol, Mathilde Regnault, Pedro Javier Ortiz Suárez, Benoît Sagot, Laurent Romary, Benoît Crabbé: BERTrade: Using Contextual Embeddings to Parse Old French. 1104-1113
- Jenna Kanerva, Filip Ginter: Out-of-Domain Evaluation of Finnish Dependency Parsing. 1114-1124
- Elisa Gugliotta, Marco Dinarelli: TArC: Tunisian Arabish Corpus, First complete release. 1125-1136
- Zdenek Zabokrtský, Niyati Bafna, Jan Bodnár, Lukás Kyjánek, Emil Svoboda, Magda Sevcíková, Jonás Vidra: Towards Universal Segmentations: UniSegments 1.0. 1137-1149
- Steven Moran, Christian Bentz, Ximena Gutierrez-Vasques, Olga Pelloni, Tanja Samardzic: TeDDi Sample: Text Data Diversity Sample for Language Comparison and Multilingual NLP. 1150-1158