Jul 22, 2024
The field of machine learning relies on benchmarking and evaluation datasets to track progress and to assess the efficacy of new models and methodologies. For this reason, good evaluation practices and accurate reporting are crucial. However, language models (LMs) not only inherit the challenges previously faced in benchmarking, but also introduce a slew of novel considerations that can make proper comparison across models difficult, misleading, or near-impossible. In this tutorial, we aim to bring attendees up to speed on the state of LM evaluation and to highlight current challenges in evaluating language model performance by discussing the fundamental methods commonly used to measure progress in language model research. We will then discuss how these common pitfalls can be addressed and what considerations should be taken into account to improve future work, especially as we seek to evaluate ever more complex properties of LMs.
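As a concrete illustration of one such pitfall, the sketch below shows how two common answer-scoring conventions for multiple-choice evaluation (raw summed log-likelihood versus length-normalized log-likelihood) can select different answers for the same model outputs, so reported accuracy depends on a seemingly minor implementation choice. The numbers and choice strings are invented for illustration only; this is not drawn from the tutorial materials.

```python
# Hypothetical sketch: two multiple-choice scoring conventions can disagree
# on which answer a model "chose". All log-likelihoods and lengths are made up.

choices = {
    # choice text: (total log-likelihood under the model, length in bytes)
    "Paris": (-4.2, 5),
    "The capital of France is Paris": (-9.0, 30),
}

def pick_by_raw_loglik(options):
    """Select the option with the highest summed log-likelihood."""
    return max(options, key=lambda c: options[c][0])

def pick_by_length_normalized(options):
    """Select the option with the highest log-likelihood per byte."""
    return max(options, key=lambda c: options[c][0] / options[c][1])

print(pick_by_raw_loglik(choices))          # raw scoring favors the shorter continuation
print(pick_by_length_normalized(choices))   # normalization favors the longer continuation
```

Because the two conventions rank the options differently, two papers evaluating the same model on the same benchmark can report different scores unless the scoring rule is stated explicitly.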