A Large Spanish-Catalan Parallel Corpus Release for Machine Translation

In: Computing and Informatics, vol. 33, no. 4
M.R. Costa-Jussà - J.A.R. Fonollosa - J.B. Mariño - M. Poch - M. Farrús

Details:

Year, pages: 2014, 907-920
Keywords:
Catalan-Spanish parallel corpus, machine translation
About article:
We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The resulting corpus of 7.5 M parallel sentences (around 180 M words per language) is useful for many natural language applications. We report excellent results when building a statistical machine translation system trained on this parallel corpus. The Spanish-Catalan corpus is partially available via ELDA (Evaluations and Language Resources Distribution Agency) under catalog number ELRA-W0053.
How to cite:
ISO 690:
Costa-Jussà, M., Fonollosa, J., Mariño, J., Poch, M., Farrús, M. 2014. A Large Spanish-Catalan Parallel Corpus Release for Machine Translation. In Computing and Informatics, vol. 33, no. 4, pp. 907-920. ISSN 1335-9150.

APA:
Costa-Jussà, M., Fonollosa, J., Mariño, J., Poch, M., Farrús, M. (2014). A Large Spanish-Catalan Parallel Corpus Release for Machine Translation. Computing and Informatics, 33(4), 907-920. ISSN 1335-9150.