In: Computing and Informatics, vol. 26, no. 3
K. Machová - P. Bednár - M. Mach
Details:
Year, pages: 2007, 301-327
Keywords:
information extraction, document categorisation, boosting, predicted categories, click stream, key word generation
About the article:
The paper focuses on the automatic extraction of information from texts
and on text document categorisation, including the pre-processing of text
documents found on the Internet. Within the presented work, we
devote our attention to the following issues related to text categorisation:
increasing the precision of categorisation algorithm results with the aid of a
boosting method; finding the minimum number of decision trees that enables an
improvement in categorisation; the influence of unlabeled data with
predicted categories on categorisation precision; shortening the click streams
needed to access a given web document; and the generation of key words related to
a web document. The paper also presents the results of experiments
carried out using the 20 News Groups and Reuters-21578 collections of documents
and a collection of documents from an Internet portal of the Markiza
broadcasting company.
How to cite:
ISO 690:
Machová, K., Bednár, P., Mach, M. 2007. Various Approaches to Web Information Processing. In Computing and Informatics, vol. 26, no. 3, pp. 301-327. ISSN 1335-9150.
APA:
Machová, K., Bednár, P., Mach, M. (2007). Various Approaches to Web Information Processing. Computing and Informatics, 26(3), 301-327. 1335-9150.