TY - GEN
T1 - A limited-iteration bisecting K-means for fast clustering large datasets
AU - Zhuang, Yu
AU - Mao, Yu
AU - Chen, Xin
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016
Y1 - 2016
N2 - Bisecting K-means (BKM) clustering, with or without refinement, has been shown to exhibit higher computing efficiency, better clustering quality, and low susceptibility to initial cluster centers, when compared with the basic K-means clustering algorithm. For bisecting K-means with refinement, in this paper, we investigate a variant that increases the efficiency while trying to maintain clustering quality. Our approach is to limit the number of iterations of the two-means (the K-means with K=2) in bisecting a data subset. We experimented with one, two, and three iterations for the two-means, and compared them with the original BKM's unlimited iterations which end when two clusters no longer change in the two-means. We carried out experimental studies on three datasets and found that three and unlimited iterations for the two-means produced almost the same clustering qualities on all test cases, leading us to think that three iterations might be adequate. The experimental data also show that the limited-iteration BKM with three iterations led to higher computing efficiency when compared with the BKM, suggesting that limiting the iterations in bisecting K-means has the potential of achieving higher efficiency while maintaining clustering quality.
AB - Bisecting K-means (BKM) clustering, with or without refinement, has been shown to exhibit higher computing efficiency, better clustering quality, and low susceptibility to initial cluster centers, when compared with the basic K-means clustering algorithm. For bisecting K-means with refinement, in this paper, we investigate a variant that increases the efficiency while trying to maintain clustering quality. Our approach is to limit the number of iterations of the two-means (the K-means with K=2) in bisecting a data subset. We experimented with one, two, and three iterations for the two-means, and compared them with the original BKM's unlimited iterations which end when two clusters no longer change in the two-means. We carried out experimental studies on three datasets and found that three and unlimited iterations for the two-means produced almost the same clustering qualities on all test cases, leading us to think that three iterations might be adequate. The experimental data also show that the limited-iteration BKM with three iterations led to higher computing efficiency when compared with the BKM, suggesting that limiting the iterations in bisecting K-means has the potential of achieving higher efficiency while maintaining clustering quality.
KW - Bisecting K-means
KW - Clustering
KW - Large datasets
UR - http://www.scopus.com/inward/record.url?scp=85015252478&partnerID=8YFLogxK
U2 - 10.1109/TrustCom.2016.0348
DO - 10.1109/TrustCom.2016.0348
M3 - Conference contribution
AN - SCOPUS:85015252478
T3 - Proceedings - 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 10th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Symposium on Parallel and Distributed Processing with Applications, IEEE TrustCom/BigDataSE/ISPA 2016
SP - 2257
EP - 2262
BT - Proceedings - 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 10th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Symposium on Parallel and Distributed Processing with Applications, IEEE TrustCom/BigDataSE/ISPA 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 August 2016 through 26 August 2016
ER -