TY - GEN
T1 - Optimized data-driven order selection method for Gaussian mixtures on clustering problems
AU - Corona, Enrique
AU - Nutter, Brian
AU - Mitra, Sunanda
PY - 2010
Y1 - 2010
N2 - Perhaps the most fundamental consideration when modeling data as a mixture of Gaussians is the number of components in the mixture. To this end, numerous approaches have been proposed, ranging from the classic use of statistical hypothesis testing methods to make decisions, to the determination of balance between the model Goodness-of-Fit (GoF) and complexity. In this paper, we explore an existing simple yet powerful order selection method developed in the field of information theory, the Jump method. This method infers the model order by estimating, transforming, and analyzing a description of the distortion-rate function, R(D), of the input data. The description of the R(D) curve is efficiently estimated through the popular K-means clustering algorithm using proper seeding techniques. The proposed adaptations to the Jump method allow for higher sensitivity and improved performance at low dimensionality. These adaptations are experimentally tested in a clustering setting with synthetic and natural data. The results suggest better performance than with the original version.
AB - Perhaps the most fundamental consideration when modeling data as a mixture of Gaussians is the number of components in the mixture. To this end, numerous approaches have been proposed, ranging from the classic use of statistical hypothesis testing methods to make decisions, to the determination of balance between the model Goodness-of-Fit (GoF) and complexity. In this paper, we explore an existing simple yet powerful order selection method developed in the field of information theory, the Jump method. This method infers the model order by estimating, transforming, and analyzing a description of the distortion-rate function, R(D), of the input data. The description of the R(D) curve is efficiently estimated through the popular K-means clustering algorithm using proper seeding techniques. The proposed adaptations to the Jump method allow for higher sensitivity and improved performance at low dimensionality. These adaptations are experimentally tested in a clustering setting with synthetic and natural data. The results suggest better performance than with the original version.
KW - Gaussian mixtures
KW - K-means clustering
KW - Lossy data compression
KW - Model order identification
KW - Rate-distortion theory
UR - http://www.scopus.com/inward/record.url?scp=77954763589&partnerID=8YFLogxK
U2 - 10.1109/SSIAI.2010.5483914
DO - 10.1109/SSIAI.2010.5483914
M3 - Conference contribution
AN - SCOPUS:77954763589
SN - 9781424478026
T3 - Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation
SP - 73
EP - 76
BT - 2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010 - Proceedings
T2 - 2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010
Y2 - 23 May 2010 through 25 May 2010
ER -