Automatic Diacritization as Prerequisite Towards the Automatic Generation of Arabic Lexical Recognition Tests

Hamed, Osama; Zesch, Torsten

doi:10.17185/duepublico/72018

Conference paper September 2019 CC BY 4.0

Published

The automatic generation of Arabic lexical recognition tests (LRTs) entails several NLP challenges, including corpus linguistics, automatic diacritization, lemmatization and language modeling. Here, we address only the problem of automatic diacritization, a step that paves the way for the automatic generation of Arabic LRTs. We conduct a comparative study of the available diacritization tools (Farasa and Madamira) and a strong baseline. We evaluate the error rates of these systems on a set of publicly available, (almost) fully diacritized corpora, using a relaxed evaluation mode to ensure a fair comparison. Farasa outperforms Madamira and the baseline under all conditions.
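
To make the error-rate evaluation concrete, the following is a minimal sketch of how diacritic and word error rates could be computed for a system output against a reference. It is not the authors' evaluation code. The abstract does not define the relaxed mode, so the sketch assumes it means skipping letters that carry no diacritic in the reference, which would suit corpora that are only almost fully diacritized. The function names (`segment`, `diacritic_error_rate`, `word_error_rate`) and the choice of the diacritic range U+064B to U+0652 are illustrative assumptions.

```python
# Hypothetical sketch: error rates for Arabic diacritization.
# Assumption: "relaxed" = ignore letters left undiacritized in the reference.

# Arabic diacritics (tanween, short vowels, shadda, sukun): U+064B..U+0652.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def segment(text):
    """Split text into (base character, attached diacritics) pairs."""
    units = []
    for ch in text:
        if ch in DIACRITICS:
            if units:  # attach the mark to the preceding base character
                units[-1][1] += ch
        else:
            units.append([ch, ""])
    # Sort the marks so that e.g. shadda+fatha equals fatha+shadda.
    return [(base, "".join(sorted(marks))) for base, marks in units]


def diacritic_error_rate(reference, hypothesis, relaxed=True):
    """Fraction of evaluated letters whose diacritics differ from the reference."""
    ref, hyp = segment(reference), segment(hypothesis)
    if [b for b, _ in ref] != [b for b, _ in hyp]:
        raise ValueError("reference and hypothesis differ in their base letters")
    errors = total = 0
    for (base, ref_marks), (_, hyp_marks) in zip(ref, hyp):
        if base.isspace():
            continue
        if relaxed and not ref_marks:
            continue  # reference gives no evidence for this letter
        total += 1
        errors += ref_marks != hyp_marks
    return errors / total if total else 0.0


def word_error_rate(reference, hypothesis, relaxed=True):
    """Fraction of words containing at least one diacritic error."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis differ in word count")
    wrong = sum(
        diacritic_error_rate(r, h, relaxed) > 0
        for r, h in zip(ref_words, hyp_words)
    )
    return wrong / len(ref_words) if ref_words else 0.0


# Reference "kataba" versus a hypothesis with a kasra on the middle letter.
full_ref = "\u0643\u064e\u062a\u064e\u0628\u064e"
partial_ref = "\u0643\u064e\u062a\u0628\u064e"  # middle letter undiacritized
hypothesis = "\u0643\u064e\u062a\u0650\u0628\u064e"
```

With `full_ref` as the reference, one of three letters is wrong in either mode. With `partial_ref`, the relaxed mode skips the undiacritized middle letter and reports no error, while the strict mode counts the system's extra diacritic as a mistake.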

Classification

Conference:

3rd International Conference on Natural Language and Speech Processing, 12–13 September, 2019. University of Trento, Italy

Date of publication:

09.2019

URN:

urn:nbn:de:hbz:464-20211018-113639-9

DOI:

10.17185/duepublico/72018

Language:

English

Resource type:

Text

Collection:

E-Publications

Subject groups of the German National Bibliography:

004 Computer science

Link URL:

https://aclanthology.org/W19-7414

Institution:

Faculty of Engineering, Computer Science and Applied Cognitive Science, Computer Science, Language Technology

Information on first publication:

Hamed, Osama/Zesch, Torsten (2019): Automatic Diacritization as Prerequisite Towards the Automatic Generation of Arabic Lexical Recognition Tests. In: Proceedings of the 3rd International Conference on Natural Language and Speech Processing. University of Trento, Italy, 12–13 September, 2019. pp. 100-106. Association for Computational Linguistics. https://aclanthology.org/W19-7414

Published September 2019