New Algorithm for Clustering Distributed Data Using k-Means

In: Computing and Informatics, vol. 33, no. 4

A.m. Khedr - R.k. Bhatnagar

Details:

Year, pages: 2014, 943 - 964

Keywords:

Data privacy, decomposable algorithm, k-means clustering, vertically and horizontally distributed data

About article:

The internet era and high speed networks have ushered in the capabilities to have ready access to large amounts of geographically distributed data. Individuals, businesses, and governments recognize the value of this available resource to those who can transform the data into information. These databases, though valuable as individual entities, become significantly more valuable when they function as parts of a federated database and their data can be aggregated for collective mining or computations. This requires new algorithms to shift their focus from working with single databases to efficiently working with federated databases. In this paper, we propose a new decomposable version of the popular k-means clustering algorithm that works in this desired manner with a set of networked databases. We show that it is possible to perform global computation in a reasonably secure manner for either horizontally or vertically distributed databases. The computation is completed by only exchanging a few local summaries among the databases. An empirical and analytical validation of our results is also presented.

How to cite:

ISO 690:

Khedr, A., Bhatnagar, R. 2014. New Algorithm for Clustering Distributed Data Using k-Means. In Computing and Informatics, vol. 33, no.4, pp. 943-964. 1335-9150.

APA:

Khedr, A., Bhatnagar, R. (2014). New Algorithm for Clustering Distributed Data Using k-Means. Computing and Informatics, 33(4), 943-964. 1335-9150.