Symmetric repositioning of bisecting K-means centers for increased reduction of distance calculations for big data clustering

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Scopus citations

Abstract

Clustering is one of the fundamental data mining procedures. Bisecting K-means (BKM) clustering has been studied to have higher computing efficiency and better clustering quality when compared with the basic Lloyd version of the K-means clustering. Elkan's method of utilizing triangle inequality significantly reduces distance calculations, and is applicable to each K-means iteration without affecting the clustering result of the iteration. Thus, it is natural to think of using the Elkan method to further improve the efficiency of the already efficient bisecting K-means. In this paper, we find that the bisecting K-means allows the repositioning of the two centers of a to-be-bisected cluster to further reduce distance calculations. Based on our heuristic analysis we investigate a repositioning strategy with a set of repositioning parameters, and incorporated the repositioning techniques into the Elkan method for bisecting K-means algorithms. We tested these new algorithms on three big datasets with millions of data points and compared with the straightforwardly combined Elkan-BKM without center repositioning. The experimental results show that the center-repositioned algorithms have fewer distance calculations than the Elkan-BKM algorithms without center repositioning in almost all cases. While our repositioning parameters produced good results for the tested datasets, these parameter sets are derived based on our intuitive thinking and hence they are by no means optimal. The experimental data presented in this paper suggest the potential of achieving higher efficiency if better repositioning parameters can be discovered.

Original languageEnglish
Title of host publicationProceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
EditorsRonay Ak, George Karypis, Yinglong Xia, Xiaohua Tony Hu, Philip S. Yu, James Joshi, Lyle Ungar, Ling Liu, Aki-Hiro Sato, Toyotaro Suzumura, Sudarsan Rachuri, Rama Govindaraju, Weijia Xu
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2709-2715
Number of pages7
ISBN (Electronic)9781467390040
DOIs
StatePublished - 2016
Event4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
Duration: Dec 5 2016Dec 8 2016

Publication series

NameProceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

Conference

Conference4th IEEE International Conference on Big Data, Big Data 2016
CountryUnited States
CityWashington
Period12/5/1612/8/16

Keywords

  • bisecting K-means
  • cluster center repositioning
  • clustering
  • large datasets
  • the Elkan method

Fingerprint Dive into the research topics of 'Symmetric repositioning of bisecting K-means centers for increased reduction of distance calculations for big data clustering'. Together they form a unique fingerprint.

  • Cite this

    Zhuang, Y. (2016). Symmetric repositioning of bisecting K-means centers for increased reduction of distance calculations for big data clustering. In R. Ak, G. Karypis, Y. Xia, X. T. Hu, P. S. Yu, J. Joshi, L. Ungar, L. Liu, A-H. Sato, T. Suzumura, S. Rachuri, R. Govindaraju, & W. Xu (Eds.), Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016 (pp. 2709-2715). [7840916] (Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/BigData.2016.7840916