
Linguistic annotation of translated Chinese texts: Coordinating theory, algorithms and data

In: Jazykovedný časopis, vol. 72, no. 2
Kirill I. Semenov - Armine K. Titizian - Aleksandra O. Piskunova - Yulia O. Korotkova - Alena D. Tsvetkova - Elena A. Volf - Alexandra S. Konovalova - Yulia N. Kuznetsova
Details:
Year, pages: 2021, 590-602
Language: eng
Keywords:
Mandarin, Russian, parallel corpus, Chinese word segmentation (CWS), grapheme-to-phoneme conversion (G2P), PoS-tagging, code-switching detection
Article type: Natural Language Processing and Corpus Building
About the article:
The article addresses the problems of linguistic annotation of the Chinese texts in Ruzhcorp, the Russian-Chinese Parallel Corpus of RNC, and ways to solve them. Particular attention is paid to the processing of Russian loanwords. On the one hand, we present a theoretical comparison of the widespread standards of Chinese text processing. On the other hand, we describe our experiments in three fields (word segmentation, grapheme-to-phoneme conversion, and PoS-tagging) on specific corpus data that contains many transliterations and loanwords. As a result, we propose a preprocessing pipeline for the Chinese texts that will be implemented in Ruzhcorp.
How to cite:
ISO 690:
Semenov, K., Titizian, A., Piskunova, A., Korotkova, Y., Tsvetkova, A., Volf, E., Konovalova, A., Kuznetsova, Y. 2021. Linguistic annotation of translated Chinese texts: Coordinating theory, algorithms and data. In Jazykovedný časopis, vol. 72, no. 2, pp. 590-602. ISSN 0021-5597. DOI: https://doi.org/10.2478/jazcas-2021-0054

APA:
Semenov, K., Titizian, A., Piskunova, A., Korotkova, Y., Tsvetkova, A., Volf, E., Konovalova, A., Kuznetsova, Y. (2021). Linguistic annotation of translated Chinese texts: Coordinating theory, algorithms and data. Jazykovedný časopis, 72(2), 590-602. ISSN 0021-5597. DOI: https://doi.org/10.2478/jazcas-2021-0054
About the issue: