A limited-iteration bisecting K-means for fast clustering large datasets

Yu Zhuang, Yu Mao, Xin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Bisecting K-means (BKM) clustering, with or without refinement, has been shown to exhibit higher computing efficiency, better clustering quality, and lower susceptibility to the choice of initial cluster centers than the basic K-means clustering algorithm. In this paper, we investigate a variant of bisecting K-means with refinement that increases efficiency while trying to maintain clustering quality. Our approach is to limit the number of iterations of the two-means (K-means with K=2) used to bisect a data subset. We experimented with one, two, and three iterations of the two-means and compared them with the original BKM's unlimited iterations, which end when the two clusters no longer change. In experimental studies on three datasets, three iterations and unlimited iterations produced almost the same clustering quality on all test cases, leading us to think that three iterations might be adequate. The experimental data also show that the limited-iteration BKM with three iterations achieved higher computing efficiency than the original BKM, suggesting that limiting the iterations in bisecting K-means has the potential to achieve higher efficiency while maintaining clustering quality.
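
To make the idea in the abstract concrete, the following is a minimal sketch of a bisecting K-means whose inner two-means is capped at a fixed number of iterations. It is not the authors' implementation. Several details are assumptions that the abstract does not specify: the first center is picked at random and the second is the point farthest from it, the cluster with the largest sum of squared errors is the one chosen for splitting, and the refinement step is omitted entirely. All function and parameter names (`two_means`, `bisecting_kmeans`, `max_iter`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(points, max_iter, rng):
    """Bisect `points` with at most `max_iter` two-means iterations.

    max_iter=None runs until the assignments stop changing, which
    corresponds to the unlimited-iteration baseline.
    """
    if max_iter is not None and max_iter < 1:
        raise ValueError("max_iter must be at least 1 or None")
    # Assumption: random first center, farthest point as second center.
    first = rng.choice(points)
    second = max(points, key=lambda p: sq_dist(p, first))
    centers = [first, second]
    labels = None
    done = 0
    while max_iter is None or done < max_iter:
        new_labels = [
            0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # the two clusters no longer change
        labels = new_labels
        for j in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean(members)
        done += 1
    left = [p for p, lab in zip(points, labels) if lab == 0]
    right = [p for p, lab in zip(points, labels) if lab == 1]
    return left, right


def bisecting_kmeans(points, k, max_iter=3, seed=0):
    """Split repeatedly until there are k clusters (no refinement step)."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Assumption: bisect the cluster with the largest SSE.
        idx = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        left, right = two_means(clusters[idx], max_iter, rng)
        if not left or not right:
            break  # cannot split further, e.g. all points identical
        clusters[idx:idx + 1] = [left, right]
    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0),
            (10, 0), (10, 1), (11, 0),
            (100, 0), (100, 1), (101, 0)]
    result = bisecting_kmeans(data, k=3, max_iter=3)
    print(sorted(len(c) for c in result))  # → [3, 3, 3]
```

Passing `max_iter=None` gives the unlimited-iteration behavior for comparison. On toy data this well separated, every setting recovers the same three groups, so the sketch says nothing about the efficiency or quality trade-off reported in the paper.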

Original language: English
Title of host publication: Proceedings - 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 10th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Symposium on Parallel and Distributed Processing with Applications, IEEE TrustCom/BigDataSE/ISPA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2257-2262
Number of pages: 6
ISBN (Electronic): 9781509032051
State: Published - 2016
Event: Joint 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 10th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Symposium on Parallel and Distributed Processing with Applications, IEEE TrustCom/BigDataSE/ISPA 2016 - Tianjin, China
Duration: Aug 23 2016 → Aug 26 2016

Conference

Conference: Joint 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 10th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Symposium on Parallel and Distributed Processing with Applications, IEEE TrustCom/BigDataSE/ISPA 2016
Country: China
City: Tianjin
Period: 08/23/16 → 08/26/16

Keywords

  • Bisecting K-means
  • Clustering
  • Large datasets
