Text Reuse Detection Using a Composition of Text Similarity Measures

Bär, Daniel; Zesch, Torsten; Gurevych, Iryna

doi:10.17185/duepublico/72183

Conference paper, 2012, CC BY-NC-SA 3.0

Published

Detecting text reuse is a fundamental requirement for a variety of tasks and applications, ranging from journalistic text reuse to plagiarism detection. Text reuse is traditionally detected by computing similarity between a source text and a possibly reused text. However, existing text similarity measures exhibit a major limitation: They compute similarity only on features which can be derived from the content of the given texts, thereby inherently implying that any other text characteristics are negligible. In this paper, we overcome this traditional limitation and compute similarity along three characteristic dimensions inherent to texts: content, structure, and style. We explore and discuss possible combinations of measures along these dimensions, and our results demonstrate that the composition consistently outperforms previous approaches on three standard evaluation datasets, and that text reuse detection greatly benefits from incorporating a diverse feature set that reflects a wide variety of text characteristics.
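
As a rough illustration of the idea in the abstract, the sketch below computes one toy similarity score per dimension (content, structure, style) and returns them as a feature set. The specific measures used here (word overlap, stopword bigram overlap, type-token ratio), the stopword list, and all function names are assumptions chosen for brevity, not the measures or the combination method used in the paper. The unweighted mean in composed_score is only a stand-in for whatever combination a real system would learn from data.

```python
import re

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "by"}


def tokens(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(seq, n):
    """Return the set of n-grams (as tuples) over a token sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def content_sim(s, t):
    """Content dimension: overlap of the word vocabularies."""
    return jaccard(set(tokens(s)), set(tokens(t)))


def structure_sim(s, t, n=2):
    """Structure dimension: overlap of stopword n-grams, ignoring content words."""
    sw_s = [w for w in tokens(s) if w in STOPWORDS]
    sw_t = [w for w in tokens(t) if w in STOPWORDS]
    return jaccard(ngrams(sw_s, n), ngrams(sw_t, n))


def type_token_ratio(text):
    """Distinct tokens divided by total tokens (0.0 for empty text)."""
    toks = tokens(text)
    return len(set(toks)) / len(toks) if toks else 0.0


def style_sim(s, t):
    """Style dimension: closeness of the two texts' type-token ratios."""
    return 1.0 - abs(type_token_ratio(s) - type_token_ratio(t))


def similarity_features(s, t):
    """One score per dimension, usable as a feature vector for a classifier."""
    return {
        "content": content_sim(s, t),
        "structure": structure_sim(s, t),
        "style": style_sim(s, t),
    }


def composed_score(s, t):
    """Placeholder composition: unweighted mean of the three scores."""
    feats = similarity_features(s, t)
    return sum(feats.values()) / len(feats)


if __name__ == "__main__":
    source = "The minister announced the plan in the house on Monday."
    reused = "On Monday the minister announced the plan to the house."
    print(similarity_features(source, reused))
    print(composed_score(source, reused))
```

Keeping the per-dimension scores separate, rather than collapsing them immediately, makes it possible to feed them to a supervised classifier or to inspect which dimension drives a decision.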

Classification

Conference: COLING 2012, 24th International Conference on Computational Linguistics, 8-15 December 2012, Mumbai, India
Date of publication: 2012
URN: urn:nbn:de:hbz:464-20211028-140855-9
DOI: 10.17185/duepublico/72183
Language: English
Resource type: Text
Collection: E-Publications
Subject groups of the German National Bibliography: 004 Computer science
Link URL: https://aclanthology.org/C12-1011
Institution: Faculty of Engineering, Computer Science and Applied Cognitive Science, Computer Science, Language Technology
Information on first publication: Bär, D., Zesch, T., Gurevych, I. (2012) Text Reuse Detection Using a Composition of Text Similarity Measures. Proceedings of COLING 2012: Technical Papers, pp. 167–184. The COLING 2012 Organizing Committee. https://aclanthology.org/C12-1011