
A MapReduce Based Distributed LSI for Scalable Information Retrieval

In: Computing and Informatics, vol. 33, no. 2
Y. Liu - M. Li - M. Khan - M. Qi

Details:

Year, pages: 2014, 259 - 280
Keywords:
Information retrieval, latent semantic indexing, MapReduce, load balancing, genetic algorithms
About article:
Latent Semantic Indexing (LSI) has been widely used in information retrieval due to its efficiency in solving the problems of polysemy and synonymy. However, LSI is computationally intensive because of the computational complexity of the singular value decomposition and filtering operations involved. This paper presents MR-LSI, a MapReduce-based distributed LSI algorithm for scalable information retrieval. The performance of MR-LSI is evaluated first in a small-scale experimental cluster environment and subsequently in large-scale simulation environments. Partitioning the dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computing nodes significantly reduces the overhead of the MR-LSI algorithm while maintaining a high level of accuracy in retrieving documents of user interest. A genetic-algorithm-based load balancing scheme is designed to optimize the performance of MR-LSI in heterogeneous computing environments in which the computing nodes have varied resources.
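For readers unfamiliar with LSI, the following is a minimal single-machine sketch of the underlying technique (a truncated singular value decomposition of a term-by-document matrix, followed by cosine ranking in the latent space). It does not reproduce MR-LSI, its partitioning step, or its load balancing scheme; the toy vocabulary, the matrix, and the function names `lsi_fit` and `rank_documents` are assumptions made purely for illustration.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix, keeping k latent dimensions."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def rank_documents(query_vec, U_k, s_k, Vt_k):
    """Fold a term-space query into the latent space and score every document."""
    q = (query_vec @ U_k) / s_k          # standard query folding: q^T U_k S_k^{-1}
    docs = Vt_k.T                        # one row per document in latent space
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q) + 1e-12
    sims = (docs @ q) / norms            # cosine similarity per document
    return np.argsort(-sims), sims

# Hypothetical toy corpus. Rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
term_doc = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
], dtype=float)

U_k, s_k, Vt_k = lsi_fit(term_doc, k=2)

# Query containing only the term "car".
query = np.array([1, 0, 0, 0, 0], dtype=float)
order, sims = rank_documents(query, U_k, s_k, Vt_k)
print(order, np.round(sims, 3))
```

In this toy corpus, document 1 never contains the word "car", yet it scores as high as document 0 because both share "engine" and therefore the same latent dimension. This is the synonymy behaviour the abstract refers to. The full SVD call is also the step whose cost grows quickly with corpus size, which is the motivation for distributing the computation.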
How to cite:
ISO 690:
Liu, Y., Li, M., Khan, M., Qi, M. 2014. A MapReduce Based Distributed LSI for Scalable Information Retrieval. In Computing and Informatics, vol. 33, no. 2, pp. 259-280. ISSN 1335-9150.

APA:
Liu, Y., Li, M., Khan, M., & Qi, M. (2014). A MapReduce Based Distributed LSI for Scalable Information Retrieval. Computing and Informatics, 33(2), 259-280. ISSN 1335-9150.