Dec 5, 2023
Coreference resolution (CR) is the task of resolving the real-world entity or event referred to by a pronoun or phrase in a given text. It is a core natural language processing (NLP) component that underlies and empowers major downstream NLP applications such as machine translation, chatbots, and question answering. Despite its broad impact, the problem of testing CR systems has rarely been studied. A major difficulty is the shortage of labeled datasets for testing. While it is possible to feed arbitrary sentences as test inputs to a CR system, a test oracle that captures their expected outputs (coreference relations) is hard to define automatically. To address this challenge, we propose Crest, an automated testing methodology for CR systems. Crest uses constituency and dependency relations to construct pairs of test inputs that should exhibit the same coreference; these relations serve as the metamorphic relations for metamorphic testing. We compare Crest with five state-of-the-art test generation baselines on two popular CR systems, applying them to generate tests from 200 sentences randomly sampled from CoNLL-2012, a popular coreference resolution dataset. Experimental results show that Crest significantly outperforms all baselines. The issues reported by Crest reveal up to 77% of the sentences wrongly resolved by the CR system under test, while achieving the lowest false positive rate (≤ 2%).
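To make the metamorphic-testing idea concrete, below is a minimal sketch of the oracle check the abstract describes. It is not Crest's actual implementation: `resolve` is a hypothetical stand-in for a real CR system's API, assumed to return coreference chains as sets of mention strings, and the transformation producing the input pair (which Crest derives from constituency and dependency relations) is simplified to a manually constructed example.

```python
# Sketch of a metamorphic oracle for coreference resolution.
# `resolve` is a hypothetical function wrapping a real CR system;
# it is assumed to return an iterable of coreference chains,
# each represented as a set of mention strings.

def metamorphic_check(resolve, original: str, variant: str,
                      shared_chain: frozenset) -> bool:
    """Return True if both inputs agree on the expected coreference chain.

    `original` and `variant` form a test-input pair constructed so that
    the same coreference relation should hold in both. A CR system that
    resolves the chain in one input but not the other violates the
    metamorphic relation and is reported as a potential bug.
    """
    chains_a = resolve(original)
    chains_b = resolve(variant)
    in_a = any(shared_chain <= set(chain) for chain in chains_a)
    in_b = any(shared_chain <= set(chain) for chain in chains_b)
    return in_a == in_b  # disagreement => metamorphic violation


# Illustrative pair: the variant drops a trailing clause that should
# not affect the "John ... he" coreference link.
original = "John said that he would attend the meeting after lunch."
variant = "John said that he would attend the meeting."
expected = frozenset({"John", "he"})

# Hypothetical usage with some CR system `cr_system`:
# if not metamorphic_check(cr_system.resolve, original, variant, expected):
#     print("Potential CR bug: coreference resolved inconsistently")
```

The key design point is that no ground-truth labels are needed: the oracle only checks consistency between the two related inputs, which is what lets Crest sidestep the missing-labeled-dataset problem.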