UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
We present the UKP system, which performed best in the Semantic Textual Similarity (STS) task at SemEval-2012 in two out of three metrics. It uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity. These range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources. Further, we employ a lexical substitution system and statistical machine translation to add lexemes, which alleviates lexical gaps. Our final models, one per dataset, consist of a log-linear combination of about 20 features, out of the more than 300 features implemented.
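As a rough illustration of the idea of combining several content similarity measures, the following Python sketch computes three of the simpler measures named above (character n-gram overlap, word n-gram overlap, and a longest common subsequence ratio) and combines them in a log-linear form. It is not the UKP implementation: the function names, the Jaccard normalization, the choice of trigrams and unigrams, the log(f + 1) transform, and the weights are all assumptions made for this example. In a real system the weights and bias would be fitted by regression on the training data.

```python
import math


def char_ngrams(text, n):
    """Return the set of character n-grams of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def word_ngrams(tokens, n):
    """Return the set of word n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def features(s1, s2):
    """Hypothetical feature vector: three simple similarity scores in [0, 1]."""
    s1, s2 = s1.lower(), s2.lower()
    t1, t2 = s1.split(), s2.split()
    longest = max(len(t1), len(t2))
    return [
        jaccard(char_ngrams(s1, 3), char_ngrams(s2, 3)),   # character trigrams
        jaccard(word_ngrams(t1, 1), word_ngrams(t2, 1)),   # word unigrams
        lcs_length(t1, t2) / longest if longest else 0.0,  # token-level LCS ratio
    ]


def log_linear_score(feats, weights, bias=0.0):
    """Weighted sum of log-transformed features.

    Assumption: log(f + 1) is used so that zero-valued features stay finite.
    """
    return bias + sum(w * math.log(f + 1.0) for w, f in zip(weights, feats))


# Illustrative weights only; real weights would come from fitting a regression.
example_weights = [1.0, 1.0, 1.0]
example_feats = features("A man is playing a guitar", "A man plays the guitar")
example_score = log_linear_score(example_feats, example_weights)
```

The more complex measures mentioned in the abstract (Explicit Semantic Analysis, resource-based word similarity, lexical substitution, machine translation) would enter such a sketch as additional entries in the feature vector.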
Rights
Use and reproduction:
This work may be used under a Creative Commons Attribution - NonCommercial - ShareAlike 3.0 License (CC BY-NC-SA 3.0).