Interpretability Metrics for Neural Models of Text Adequacy

Proper wording is key to conveying a message effectively. A news article that its target audience cannot understand is useless, and online comments that are not phrased constructively easily lead to a toxic discussion culture. Human readers can intuitively judge the adequacy of a text by weighing content aspects, such as the relevance of the topic, against stylistic aspects, such as lexical and syntactic complexity.


Neural models can label text adequacy consistently, but they cannot explain their decisions. The transformer architecture underlying most state-of-the-art models makes it almost impossible for users to understand how information is processed and evaluated. This is problematic when a human professional makes decisions based on the model outcome (e.g., as a gatekeeper for information). As a remedy, interpretability methods are being developed to provide post-hoc explanations for the model computations, using, for example, attention patterns (Vig, 2019), gradient-based saliency (Li et al., 2016), subset erasure (de Cao et al., 2020), surrogate models (Ribeiro et al., 2016), or influence functions (Koh and Liang, 2016). Their applicability to language input, and in particular to longer texts, remains an open research question.
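To make the idea of a token-level, post-hoc explanation concrete, the following is a minimal sketch of gradient-times-input saliency. It assumes a toy bag-of-words logistic classifier rather than a transformer; the weights, the bias, and the function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy weights for a bag-of-words "adequacy" classifier.
# They are illustrative only and do not come from the project.
WEIGHTS = {"clear": 1.2, "helpful": 0.8, "stupid": -2.0, "obviously": -0.5}
BIAS = 0.1


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(tokens):
    """Probability that the text is adequate under the toy model."""
    z = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return sigmoid(z)


def gradient_saliency(tokens):
    """Gradient-times-input saliency for each token occurrence.

    Each occurrence is treated as an input with value 1, so the gradient
    of the output probability with respect to it is p * (1 - p) * w.
    """
    p = predict(tokens)
    return [(t, p * (1.0 - p) * WEIGHTS.get(t, 0.0)) for t in tokens]


tokens = "this is obviously a stupid idea".split()
for token, score in gradient_saliency(tokens):
    print(f"{token:>10s} {score:+.3f}")
```

In this toy setting the saliency is available in closed form. For a neural model the gradient would instead be obtained by backpropagation through the network, and the scores are per token, which is the limitation that becomes relevant for longer texts.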

With our project, we aim to contribute to the growing field of model interpretability for responsible and reliable AI and to better understand the linguistic factors underlying text adequacy classification. This project is novel in the following ways:

  1. Most interpretability metrics can only be used to examine token-level phenomena on small input snippets for tasks such as sentiment analysis. We explore how interpretability metrics can be adapted to capture inter-sentential relations in longer texts.
  2. Computational linguists have developed a rich inventory of semantic and stylistic methods to represent the interplay of different factors for text adequacy. We analyze to what extent neural models account for similar knowledge as linguistically motivated models when determining text adequacy, and how this is captured by interpretability metrics.
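One possible way to move from token-level to sentence-level explanations is to erase whole sentences and to compare the effect of erasing a pair with the effects of erasing each sentence alone. The sketch below illustrates that idea only; it is not the project's method. The scoring function `toy_adequacy_score` is a hypothetical stand-in for a neural adequacy classifier, and all function names are assumptions.

```python
import re
from itertools import combinations


def toy_adequacy_score(sentences):
    """Hypothetical stand-in for a neural adequacy model.

    It counts words shared by adjacent sentences, a crude cohesion signal.
    Any callable that maps a list of sentences to a float could be used.
    """
    score = 0.0
    for prev, cur in zip(sentences, sentences[1:]):
        prev_words = set(re.findall(r"\w+", prev.lower()))
        cur_words = set(re.findall(r"\w+", cur.lower()))
        score += len(prev_words & cur_words)
    return score


def sentence_erasure(sentences, score_fn):
    """Score drop observed when each sentence is removed on its own."""
    full = score_fn(sentences)
    return [full - score_fn(sentences[:i] + sentences[i + 1:])
            for i in range(len(sentences))]


def pairwise_interaction(sentences, score_fn):
    """Joint erasure effect of a sentence pair minus the two single effects.

    A value of zero means the two sentences contribute independently;
    a nonzero value signals an inter-sentential dependency.
    """
    full = score_fn(sentences)
    single = sentence_erasure(sentences, score_fn)
    interactions = {}
    for i, j in combinations(range(len(sentences)), 2):
        rest = [s for k, s in enumerate(sentences) if k not in (i, j)]
        joint = full - score_fn(rest)
        interactions[(i, j)] = joint - single[i] - single[j]
    return interactions


sentences = [
    "The council approved the budget.",
    "The budget funds new schools.",
    "Cats sleep a lot.",
]
print(sentence_erasure(sentences, toy_adequacy_score))
print(pairwise_interaction(sentences, toy_adequacy_score))
```

Under the toy scorer, the first two sentences only contribute together, so their interaction is nonzero, while the unrelated third sentence interacts with neither. The number of pairs grows quadratically with the number of sentences, so for long texts and an expensive model the pairs would need to be sampled or restricted, for example to neighbouring sentences.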