Proper wording is key to conveying a message effectively. A news article that its target audience cannot understand is useless, and online comments that are not phrased constructively easily lead to a toxic discussion culture. Human readers can intuitively judge the adequacy of a text by weighing content aspects, such as the relevance of the topic, against stylistic aspects, such as lexical and syntactic complexity.
Neural models can label text adequacy consistently, but they cannot explain their decisions. The transformer architecture underlying most state-of-the-art models makes it almost impossible for users to understand how information is processed and evaluated. This is problematic when a human professional makes decisions based on the model output (e.g., as a gatekeeper for information). As a remedy, interpretability methods based on, for example, attention patterns (Vig, 2019), gradient-based saliency (Li et al., 2016), subset erasure (de Cao et al., 2020), surrogate models (Ribeiro et al., 2016), or influence functions (Koh and Liang, 2016) are being developed to provide post-hoc explanations of the model computations. Whether these methods apply to language input, and in particular to longer texts, is currently an open research question.
With our project, we aim to contribute to the growing field of model interpretability for responsible and reliable AI and to better understand the linguistic factors underlying text adequacy classification. This project is novel in the following ways:
- Most interpretability metrics can only be used to examine token-level phenomena in short input snippets, for tasks such as sentiment analysis. We explore how interpretability metrics can be adapted to capture inter-sentential relations in longer texts.
- Computational linguists have developed a rich inventory of semantic and stylistic methods to represent the interplay of different factors in text adequacy. We analyze to what extent neural models rely on knowledge similar to that of linguistically motivated models when determining text adequacy, and how this is captured by interpretability metrics.
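
To make the step from token-level to sentence-level analysis concrete, the following is a minimal sketch of an erasure-style attribution computed over whole sentences. It is an illustration under stated assumptions, not the method of this project: `sentence_erasure_scores` is a hypothetical helper, and `toy_adequacy_score` is a hand-written stand-in for a neural adequacy classifier that simply penalizes long words.

```python
from typing import Callable, List, Tuple


def sentence_erasure_scores(
    sentences: List[str],
    score_fn: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Attribute a text-level score to sentences by leave-one-out erasure.

    The attribution of a sentence is the score of the full text minus the
    score of the text with that sentence removed.
    """
    full_score = score_fn(" ".join(sentences))
    results = []
    for i, sentence in enumerate(sentences):
        reduced = " ".join(sentences[:i] + sentences[i + 1:])
        results.append((sentence, full_score - score_fn(reduced)))
    return results


def toy_adequacy_score(text: str) -> float:
    """Hypothetical stand-in for a classifier: fewer long words, higher score."""
    words = text.split()
    if not words:
        return 0.0
    long_words = sum(1 for w in words if len(w.strip(".,;:!?")) > 8)
    return 1.0 - long_words / len(words)


if __name__ == "__main__":
    text = [
        "The cat sat.",
        "Notwithstanding incomprehensible circumlocution.",
    ]
    for sentence, attribution in sentence_erasure_scores(text, toy_adequacy_score):
        print(f"{attribution:+.2f}  {sentence}")
```

A positive value means that removing the sentence lowers the score, so the sentence supports the adequacy judgment; a negative value means the sentence counts against it. Leave-one-out erasure treats sentences independently, so it does not yet capture relations between sentences; erasing pairs or larger subsets would be one possible extension.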