Counting What Counts: Decompounding for Keyphrase Extraction

A core assumption of keyphrase extraction is that a concept is more important if it is mentioned more often in a document. Especially in languages like German that form large noun compounds, frequency counts might be misleading as concepts “hidden” in compounds are not counted. We hypothesize that using decompounding before counting term frequencies may lead to better keyphrase extraction. We identified two effects of decompounding: (i) enhanced frequency counts, and (ii) more keyphrase candidates. We created two German evaluation datasets to test our hypothesis and analyzed the effect of additional decompounding for keyphrase extraction.

Zitieren

Zitierform:
Zitierform konnte nicht geladen werden.

Rechte