PT Unknown
AU Gold, D
TI Annotating and Analyzing Semantic Relations between Texts
PD 08
PY 2021
DI 10.17185/duepublico/74633
LA en
AB In this thesis, we investigate machine-computable and at the same time human-understandable representation dimensions of text that can subsequently be used to filter and display information. While texts can be represented individually, e.g., using numeric dimensions such as sentence length or grammatical components, we focus on representation dimensions that express relations between pairs of texts. Most of the relation dimensions studied here are binary, meaning that the relation of interest either does or does not exist between a text pair. Some dimensions are inherently defined as text-to-text relations, e.g., textual entailment, paraphrases, contradiction, or semantic similarity. That is, a paraphrase cannot exist within a single text; it is a relation between the two texts of a pair. While there has been much research on these dimensions individually, one of our contributions is the empirical research on the links between them. On the one hand, this provides us with a better understanding of each individual dimension. For instance, we find that although entailment, as well as paraphrases, exclude contradictions, text pairs not containing entailment are not necessarily contradictions, which has, however, been considered a given in many previous works. On the other hand, our analysis has the potential to improve transfer learning by using corpora on one of the dimensions to automate another. We find, among other things, that the most prominent assumed link between dimensions (bi-directional entailment being equivalent to paraphrases) does not always hold. However, in most cases it is true, meaning that transfer learning between these dimensions is possible. As for dimensions that can also exist for individual pieces of text, we believe that some of them can also be better researched as relations between texts. This has already been shown for sentiment, by rating the sentiment of a text in comparison to other texts instead of using a scale for each individual text.
Another contribution of this thesis is considering not only sentiment, but also specificity, as a relation. We find that specificity, just like sentiment, can be reliably annotated as a relation. Moreover, we find further potential parallels to sentiment regarding the operationalization of specificity: it can be more reliably annotated with an aspect, similar to the task of aspect-based sentiment. A further contribution of this thesis is the research on the link between dimensions that are inherently a relation and the under-researched phenomenon of specificity. For instance, we hypothesize that the entailed text of an entailment pair has a lower specificity level than the entailing text, as the entailed text should not contain any information beyond what is already described in the entailing text. The analysis of links between the inherent relation dimensions and specificity helps us to deepen our understanding of this under-researched phenomenon and gives indications of how to improve its automation. Finally, we present two potential applications using each dimension, namely heterogeneous multi-document summarization and a more specific kind of summarization, user-specific hotel review filtering.
ER