
An HMM-based PoS tagger for Old Church Slavonic

In: Jazykovedný časopis, vol. 72, no. 2
Olga Lyashevskaya - Ilia Afanasev
Details:
Year, pages: 2021, 556-567
Language: English
Keywords:
HMM tagger, Old Church Slavonic, PoS tagging, hybrid models, Universal Dependencies
Article type: Natural Language Processing and Corpus Building
About the article:
We present a hybrid HMM-based PoS tagger for Old Church Slavonic. The training corpus is a portion of one text, Codex Marianus (40k), annotated with the Universal Dependencies UPOS tags in the UD-PROIEL treebank. We perform a number of experiments in in-domain and out-of-domain settings: the remaining part of Codex Marianus serves as the in-domain test set, and Kiev Folia is used as the out-of-domain test set. Analysing by-PoS-class precision and sensitivity in each run, we combine a simple context-free n-gram-based approach with a Hidden Markov model (HMM), and add linguistic rules for specific cases such as punctuation and digits. While the model achieves a rather unimpressive accuracy of 81% in the in-domain setting, we observe an accuracy of 51% in out-of-domain evaluation, which is comparable to the results of large neural architectures based on pre-trained contextual embeddings.
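To make the idea of a hybrid tagger concrete, the following is a minimal sketch and not the authors' implementation. It assumes one possible way of combining the three components named in the abstract: hand-written rules fix the tag of digits and punctuation, a context-free lookup fixes the tag of words seen with only one tag in training, and a bigram HMM decoded with Viterbi chooses among all tags for the rest. The function names (`train`, `rule_tag`, `tag`), the add-one smoothing, and the English toy sentences are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Count tag transitions, emissions, and a context-free word -> tag lexicon."""
    trans = defaultdict(Counter)    # previous tag -> next-tag counts
    emit = defaultdict(Counter)     # tag -> word counts
    lexicon = defaultdict(Counter)  # word -> tag counts
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, pos in sentence:
            trans[prev][pos] += 1
            emit[pos][word] += 1
            lexicon[word][pos] += 1
            prev = pos
    return dict(trans), dict(emit), dict(lexicon)

def rule_tag(word):
    """Hypothetical rules for the special cases: digits and punctuation."""
    if word.isdigit():
        return "NUM"
    if word and all(not ch.isalnum() for ch in word):
        return "PUNCT"
    return None

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (a simplifying assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def tag(words, model):
    """Viterbi decoding in which rules and the lexicon restrict the candidates."""
    trans, emit, lexicon = model
    tags = sorted(emit)
    vocab_size = len(lexicon) + 1  # one extra slot for unknown words
    best = {"<s>": (0.0, [])}      # tag -> (log score, best path ending in tag)
    for word in words:
        fixed = rule_tag(word)
        if fixed is not None:
            candidates = [fixed]              # rule-based component
        elif len(lexicon.get(word, ())) == 1:
            candidates = list(lexicon[word])  # context-free lookup
        else:
            candidates = tags                 # let the HMM decide
        step = {}
        for cand in candidates:
            e = log_prob(emit.get(cand, Counter()), word, vocab_size)
            score, path = max(
                ((s + log_prob(trans.get(p, Counter()), cand, len(tags)) + e, pth)
                 for p, (s, pth) in best.items()),
                key=lambda item: item[0],
            )
            step[cand] = (score, path + [cand])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]

# Toy English data, used only to show the mechanics.
toy = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"), (".", "PUNCT")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB"), (".", "PUNCT")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB"), (".", "PUNCT")],
]
model = train(toy)
print(tag(["the", "bird", "sleeps", "!"], model))
```

In this sketch the unseen word "bird" is tagged by the HMM from its context, while "!" is tagged by the punctuation rule even though it never occurs in the toy training data.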
How to cite:
ISO 690:
Lyashevskaya, O., Afanasev, I. 2021. An HMM-based PoS tagger for Old Church Slavonic. In Jazykovedný časopis, vol. 72, no. 2, pp. 556-567. ISSN 0021-5597. DOI: https://doi.org/10.2478/jazcas-2021-0051

APA:
Lyashevskaya, O., Afanasev, I. (2021). An HMM-based PoS tagger for Old Church Slavonic. Jazykovedný časopis, 72(2), 556-567. ISSN 0021-5597. DOI: https://doi.org/10.2478/jazcas-2021-0051
About the issue: