
Thai Multi-Document Summarization: Unit Segmentation, Unit-Graph Formulation, and Unit Selection

In: Computing and Informatics, vol. 35, no. 1
N. Ketui - Thanaruk Theeramunkong
Details:
Year, pages: 2016, 1 - 29
Keywords:
Thai text summarization, multi-document summarization, iterative weighting
About the article:
Summarizing multiple Thai documents is challenging because the Thai language lacks explicit word, phrase, and sentence boundaries. This paper defines the Thai Elementary Discourse Unit (TEDU) and presents a three-stage summarization process. To implement this process, we propose unit segmentation using TEDUs and their derivatives, unit-graph formation using iterative unit weighting and cosine similarity, and unit selection using highest-weight priority, redundancy removal, and post-selection weight recalculation. To examine the performance of the proposed methods, a number of experiments are conducted using fifty sets of Thai news articles with their manually constructed reference summaries. Under three common evaluation measures, ROUGE-1, ROUGE-2, and ROUGE-SU4, the results show that (1) our TEDU-based summarization outperforms paragraph-based summarization, (2) our iterative weighting is superior to traditional TF-IDF, (3) the highest-weight priority without centroid preference and unit redundancy consideration helps improve summary quality, and (4) post-selection weight recalculation tends to raise summarization performance under certain circumstances.
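
The graph-formation and selection stages named in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not give the weighting formula, so a PageRank-style update over cosine-similarity edges is assumed here as a stand-in for "iterative unit weighting". The function names (`cosine`, `iterative_weights`, `select_units`), the damping factor, the redundancy threshold, and the whitespace tokenization of pre-segmented units are all hypothetical choices, and post-selection weight recalculation is left out.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def iterative_weights(vectors, damping=0.85, iterations=50):
    """Assumed PageRank-style weighting over a cosine-similarity unit graph."""
    n = len(vectors)
    if n == 0:
        return []
    # Edge weights: similarity between distinct units, no self-loops.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in sim]
    weights = [1.0 / n] * n
    for _ in range(iterations):
        # Each unit receives weight from its neighbours in proportion
        # to the similarity of the connecting edge.
        weights = [
            (1 - damping) / n
            + damping * sum(sim[j][i] * weights[j] / out_sum[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return weights


def select_units(units, max_units=2, redundancy_threshold=0.5):
    """Pick the highest-weight units, skipping near-duplicates of earlier picks."""
    # Assumption: each unit is already word-segmented and space-separated.
    vectors = [Counter(u.split()) for u in units]
    weights = iterative_weights(vectors)
    ranked = sorted(range(len(units)), key=lambda i: weights[i], reverse=True)
    chosen = []
    for i in ranked:
        # Redundancy removal: reject a unit too similar to one already chosen.
        if all(cosine(vectors[i], vectors[j]) < redundancy_threshold
               for j in chosen):
            chosen.append(i)
        if len(chosen) == max_units:
            break
    # Return the selected units in their original order.
    return [units[i] for i in sorted(chosen)]
```

Calling `select_units` on a list of pre-segmented unit strings returns at most `max_units` of them in their original order. Because Thai text has no explicit word boundaries, a real pipeline would need a word segmenter and the paper's TEDU segmentation before this step.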
How to cite:
ISO 690:
Ketui, N., Theeramunkong, T. 2016. Thai Multi-Document Summarization: Unit Segmentation, Unit-Graph Formulation, and Unit Selection. In Computing and Informatics, vol. 35, no. 1, pp. 1-29. ISSN 1335-9150.

APA:
Ketui, N., Theeramunkong, T. (2016). Thai Multi-Document Summarization: Unit Segmentation, Unit-Graph Formulation, and Unit Selection. Computing and Informatics, 35(1), 1-29. 1335-9150.