Designing a corpus of Czech monologues: ORATOR v2

In: Jazykovedný časopis, vol. 72, no. 2

Marie Kopřivová, Zuzana Laubeová, David Lukeš

Details:

Year, pages: 2021, 520-530

Language: English

Keywords:

speech, corpus, monologue, Czech

Original source URL: https://www.juls.savba.sk/ediela/jc/2021/2/jc21-02.pdf

Article type: Natural Language Processing and Corpus Building

DOI: https://doi.org/10.2478/jazcas-2021-0048


About the article:

ORATOR v2 is a new 1.5M-word corpus of Czech monologues delivered to a live audience in semi-formal to formal settings. It was designed to map the range of naturally occurring monologues that can be obtained for corpus processing. It therefore aims for diversity but does not attempt to balance subcategories, recognizing that some types of data are inherently easier to obtain in high volume than others. It uses the same transcription guidelines and annotation tools as other recent spoken corpora published by the Czech National Corpus (CNC), which allows interesting comparisons between various types of spoken Czech. The present paper outlines three case studies comparing ORATOR with the informal conversations of ORTOFON v2: the frequencies of demonstratives, the frequencies of hesitations, and lexical richness.
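The abstract does not state how the frequencies were normalized or which lexical-richness measure was used. The Python sketch below assumes two common choices, not necessarily the authors': raw counts are normalized to occurrences per million tokens, because counts from corpora of different sizes are not directly comparable, and lexical richness is approximated by a standardized type-token ratio (STTR) over fixed-size windows, because the plain type-token ratio falls as text length grows. The function names, the default window size, and the demonstrative set are hypothetical illustrations.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


def relative_frequency(tokens: list[str], targets: set[str]) -> float:
    """Per-million frequency of any token in `targets` (hypothetical target set)."""
    hits = sum(1 for token in tokens if token in targets)
    return per_million(hits, len(tokens))


def sttr(tokens: list[str], window: int = 1000) -> float:
    """Standardized type-token ratio (an assumed measure, not the paper's):
    mean TTR over consecutive, non-overlapping windows of `window` tokens.
    A trailing partial window is discarded so every window has equal length."""
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(0, len(tokens) - window + 1, window)
    ]
    if not ratios:
        raise ValueError("text is shorter than one window")
    return sum(ratios) / len(ratios)


# Illustrative use with a hypothetical set of Czech demonstratives.
DEMONSTRATIVES = {"ten", "ta", "to"}
sample = ["ten", "pes", "to", "je"]
print(relative_frequency(sample, DEMONSTRATIVES))  # 500000.0
```

Fixing the window length keeps STTR values comparable between a long corpus and a short one, which a raw type-token ratio would not be.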

How to cite:

ISO 690:

Kopřivová, M., Laubeová, Z., Lukeš, D. 2021. Designing a corpus of Czech monologues: ORATOR v2. In Jazykovedný časopis, vol. 72, no. 2, pp. 520-530. ISSN 0021-5597. DOI: https://doi.org/10.2478/jazcas-2021-0048

APA:

Kopřivová, M., Laubeová, Z., Lukeš, D. (2021). Designing a corpus of Czech monologues: ORATOR v2. Jazykovedný časopis, 72(2), 520-530. ISSN 0021-5597. DOI: https://doi.org/10.2478/jazcas-2021-0048

About the issue: