


4th Eval4NLP 2023: Bali, Indonesia
- Daniel Deutsch, Rotem Dror, Steffen Eger, Yang Gao, Christoph Leiter, Juri Opitz, Andreas Rücklé:
  Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2023, Bali, Indonesia, November 1, 2023. Association for Computational Linguistics 2023, ISBN 979-8-89176-021-9
- Lukas Weber, Krishnan Jothi Ramalingam, Matthias Beyer, Axel Zimmermann:
  WRF: Weighted Rouge-F1 Metric for Entity Recognition. 1-11
- Vatsal Raina, Adian Liusie, Mark J. F. Gales:
  Assessing Distractors in Multiple-Choice Tests. 12-22
- Yixuan Wang, Qingyan Chen, Duygu Ataman:
  Delving into Evaluation Metrics for Generation: A Thorough Assessment of How Metrics Generalize to Rephrasing Across Languages. 23-31
- Zahra Kolagar, Sebastian Steindl, Alessandra Zarcone:
  EduQuick: A Dataset Toward Evaluating Summarization of Informal Educational Content for Social Media. 32-48
- Nitin Ramrakhiyani, Vasudeva Varma, Girish K. Palshikar, Sachin Pawar:
  Zero-shot Probing of Pretrained Language Models for Geography Knowledge. 49-61
- Yanran Chen, Steffen Eger:
  Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End. 62-84
- Jeremy E. Block, Yu-Peng Chen, Abhilash Budharapu, Lisa Anthony, Bonnie J. Dorr:
  Summary Cycles: Exploring the Impact of Prompt Engineering on Large Language Models' Interaction with Interaction Log Information. 85-99
- Savita Bhat, Vasudeva Varma:
  Large Language Models As Annotators: A Preliminary Evaluation For Annotating Low-Resource Language Content. 100-107
- Jad Doughman, Shady Shehata, Leen Al Qadi, Youssef Nafea, Fakhri Karray:
  Can a Prediction's Rank Offer a More Accurate Quantification of Bias? A Case Study Measuring Sexism in Debiased Language Models. 108-116
- Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, Steffen Eger:
  The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics. 117-138
- Rui Zhang, Fuhai Song, Hui Huang, Jinghao Yuan, Muyun Yang, Tiejun Zhao:
  HIT-MI&T Lab's Submission to Eval4NLP 2023 Shared Task. 139-148
- Abhishek Pradhan, Ketan Kumar Todi:
  Understanding Large Language Model Based Metrics for Text Summarization. 149-155
- Pavan Baswani, Ananya Mukherjee, Manish Shrivastava:
  LTRC_IIITH's 2023 Submission for Prompting Large Language Models as Explainable Metrics Task. 156-163
- Joonghoon Kim, Sangmin Lee, Seung Hun Han, Saeran Park, Jiyoon Lee, Kiyoon Jeong, Pilsung Kang:
  Which is better? Exploring Prompting Strategy For LLM-based Metrics. 164-183
- Yuan Lu, Yu-Ting Lin:
  Characterised LLMs Affect its Evaluation of Summary and Translation. 184-192
- Abbas Akkasi, Kathleen C. Fraser, Majid Komeili:
  Reference-Free Summarization Evaluation with Large Language Models. 193-201
- Neema Kotonya, Saran Krishnasamy, Joel R. Tetreault, Alejandro Jaimes:
  Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task. 202-218
- Ghazaleh Mahmoudi:
  Exploring Prompting Large Language Models as Explainable Metrics. 219-227
- Daniil Larionov, Vasiliy Viskov, George Kokush, Alexander Panchenko, Steffen Eger:
  Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation. 228-234
