
Analysing Accuracy of Slovak Language Lemmatization and MSD Tagging

In: Slovenská reč, vol. 88, no. 2
Radovan Garabík - Denis Mitana
Details:
Year, pages: 2023, 129-140
Language: English
Keywords:
lemmatization, MSD tagging, POS tagging, Slovak
Article type: Studies and articles
About the article:
Lemmatization and morphological tagging are indispensable steps in Slovak corpus linguistics. In this article, we evaluate two state-of-the-art Slovak language lemmatizers and MSD taggers, one based on MorphoDiTa and the other on spaCy. We measured accuracy on the test subset of a manually lemmatized and MSD-annotated corpus and found that the combination of lemma and tag achieved 93.5% accuracy with MorphoDiTa and 95.6% accuracy with spaCy. Most of the errors occurred in disambiguating MSD tags for homonymous uninflected parts of speech, such as particles, conjunctions, and adverbs, and in disambiguating the nominative and accusative of the singular masculine inanimate. In these cases, spaCy shows a noticeable improvement over MorphoDiTa, likely due to better exploitation of the context of the words.
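The accuracy of the lemma and tag combination mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a token counts as correct only when both its lemma and its MSD tag match the manual annotation; the abstract does not spell out the scoring procedure, so this is an assumption rather than the authors' evaluation code. The function name `accuracies`, the toy tokens, and the tag strings are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

# A token annotation is a (lemma, tag) pair.
Annotation = Tuple[str, str]


def accuracies(gold: List[Annotation], pred: List[Annotation]) -> Dict[str, float]:
    """Return lemma, tag, and joint (lemma + tag) accuracy.

    Assumption: gold and pred are aligned token by token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have equal length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")

    n = len(gold)
    lemma_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    tag_ok = sum(g[1] == p[1] for g, p in zip(gold, pred))
    # Joint accuracy: both the lemma and the tag must be correct.
    joint_ok = sum(g == p for g, p in zip(gold, pred))
    return {"lemma": lemma_ok / n, "tag": tag_ok / n, "joint": joint_ok / n}


# Toy data with illustrative placeholder tags (not taken from the article).
gold = [("vidieť", "VERB"), ("hrad", "NOUN-acc"), ("a", "CONJ"), ("mesto", "NOUN-acc")]
pred = [("vidieť", "VERB"), ("hrad", "NOUN-nom"), ("a", "CONJ"), ("mesta", "NOUN-acc")]

result = accuracies(gold, pred)
print(result)
```

In this toy example the second token has the wrong case in its tag and the fourth has the wrong lemma, so the joint accuracy is lower than either the lemma or the tag accuracy taken separately.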
How to cite:
ISO 690:
Garabík, R., Mitana, D. 2023. Analysing Accuracy of Slovak Language Lemmatization and MSD Tagging. In Slovenská reč, vol. 88, no. 2, pp. 129-140. ISSN 0037-6981.

APA:
Garabík, R., Mitana, D. (2023). Analysing Accuracy of Slovak Language Lemmatization and MSD Tagging. Slovenská reč, 88(2), 129-140. ISSN 0037-6981.