Optimized data-driven order selection method for Gaussian mixtures on clustering problems

Enrique Corona; Brian Nutter; Sunanda Mitra

doi:10.1109/SSIAI.2010.5483914

Electrical and Computer Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Perhaps the most fundamental consideration when modeling data as a mixture of Gaussians is the number of components in the mixture. To this end, numerous approaches have been proposed, ranging from the classic use of statistical hypothesis testing methods to make decisions, to the determination of balance between the model Goodness-of-Fit (GoF) and complexity. In this paper, we explore an existing simple yet powerful order selection method developed in the field of information theory, the Jump method. This method infers the model order by estimating, transforming, and analyzing a description of the distortion-rate function, R(D), of the input data. The description of the R(D) curve is efficiently estimated through the popular K-means clustering algorithm using proper seeding techniques. The proposed adaptations to the Jump method allow for higher sensitivity and improved performance at low dimensionality. These adaptations are experimentally tested in a clustering setting with synthetic and natural data. The results suggest better performance than with the original version.
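The estimate, transform, and analyze steps named in the abstract can be illustrated with a minimal sketch of the original Jump method as it is commonly described in the literature. This is not the adapted version proposed in the paper, whose details the abstract does not give. The sketch assumes the distortion at each K is the K-means mean squared error per dimension, that the transformation is the power -p/2 (p being the data dimension), and that the transformed distortion at K = 0 is taken as 0. The function names (`jump_order`, `kmeans_distortion`, `kmeans_pp_seeds`), the k-means++ style seeding, and the restart count are illustrative choices, not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng):
    """k-means++ style seeding (an assumed choice of 'proper seeding')."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # All points coincide with a center; fall back to uniform choice.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans_distortion(points, k, rng, iters=50):
    """Run Lloyd's algorithm; return mean squared error per point per dimension."""
    dim = len(points[0])
    centers = kmeans_pp_seeds(points, k, rng)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    total = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return total / (len(points) * dim)


def jump_order(points, max_k, restarts=5, seed=0):
    """Return (estimated order, list of jumps for K = 1..max_k)."""
    rng = random.Random(seed)
    points = [tuple(p) for p in points]
    power = len(points[0]) / 2.0  # assumed exponent p/2
    transformed = [0.0]  # assumed convention for K = 0
    for k in range(1, max_k + 1):
        # Estimate: best distortion over several seeded K-means runs.
        d = min(kmeans_distortion(points, k, rng) for _ in range(restarts))
        # Transform: negative power; the floor avoids division by zero.
        transformed.append(max(d, 1e-12) ** (-power))
    # Analyze: the largest jump in the transformed curve marks the order.
    jumps = [transformed[k] - transformed[k - 1] for k in range(1, max_k + 1)]
    best_k = 1 + max(range(max_k), key=lambda i: jumps[i])
    return best_k, jumps
```

Calling `jump_order(points, max_k)` on a list of equal-length numeric tuples returns the K with the largest jump together with the full list of jumps, so the curve can be inspected rather than trusted blindly. Taking the best of several restarts is a guard against poor local minima of K-means, which would otherwise distort the estimated curve.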

Original language	English
Title of host publication	2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010 - Proceedings
Pages	73-76
Number of pages	4
DOIs	https://doi.org/10.1109/SSIAI.2010.5483914
State	Published - 2010
Event	2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010 - Austin, TX, United States; Duration: May 23, 2010 → May 25, 2010

Publication series

Name	Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation

Conference

Conference	2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010
Country/Territory	United States
City	Austin, TX
Period	05/23/10 → 05/25/10

Keywords

Gaussian mixtures
K-means clustering
Lossy data compression
Model order identification
Rate-distortion theory

Access to Document

10.1109/SSIAI.2010.5483914

Cite this

Corona, E., Nutter, B., & Mitra, S. (2010). Optimized data-driven order selection method for Gaussian mixtures on clustering problems. In 2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010 - Proceedings (pp. 73-76). Article 5483914 (Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation). https://doi.org/10.1109/SSIAI.2010.5483914

@inproceedings{f09d6dc4b334435e8b8769349d0fa284,

title = "Optimized data-driven order selection method for Gaussian mixtures on clustering problems",

abstract = "Perhaps the most fundamental consideration when modeling data as a mixture of Gaussians is the number of components in the mixture. To this end, numerous approaches have been proposed, ranging from the classic use of statistical hypothesis testing methods to make decisions, to the determination of balance between the model Goodness-of-Fit (GoF) and complexity. In this paper, we explore an existing simple yet powerful order selection method developed in the field of information theory, the Jump method. This method infers the model order by estimating, transforming, and analyzing a description of the distortion-rate function, R(D) of the input data. The description of the R(D) curve is efficiently estimated through the popular K-means clustering algorithm using proper seeding techniques. The proposed adaptations to the Jump method allow for higher sensitivity and improved performance at low dimensionality. These adaptations are experimentally tested in a clustering setting with synthetic and natural data. The results suggest better performance than with the original version.",

keywords = "Gaussian mixtures, K-means clustering, Lossy data compression, Model order identification, Rate-distortion theory",

author = "Enrique Corona and Brian Nutter and Sunanda Mitra",

year = "2010",

doi = "10.1109/SSIAI.2010.5483914",

language = "English",

isbn = "9781424478026",

series = "Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation",

pages = "73--76",

booktitle = "2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010 - Proceedings",

note = "2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010 ; Conference date: 23-05-2010 Through 25-05-2010",

}

Corona, E, Nutter, B & Mitra, S 2010, Optimized data-driven order selection method for Gaussian mixtures on clustering problems. in 2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010 - Proceedings., 5483914, Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation, pp. 73-76, 2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010, Austin, TX, United States, 05/23/10. https://doi.org/10.1109/SSIAI.2010.5483914

Optimized data-driven order selection method for Gaussian mixtures on clustering problems. / Corona, Enrique; Nutter, Brian; Mitra, Sunanda.
2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010 - Proceedings. 2010. p. 73-76 5483914 (Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation).

TY - GEN

T1 - Optimized data-driven order selection method for Gaussian mixtures on clustering problems

AU - Corona, Enrique

AU - Nutter, Brian

AU - Mitra, Sunanda

PY - 2010

Y1 - 2010

AB - Perhaps the most fundamental consideration when modeling data as a mixture of Gaussians is the number of components in the mixture. To this end, numerous approaches have been proposed, ranging from the classic use of statistical hypothesis testing methods to make decisions, to the determination of balance between the model Goodness-of-Fit (GoF) and complexity. In this paper, we explore an existing simple yet powerful order selection method developed in the field of information theory, the Jump method. This method infers the model order by estimating, transforming, and analyzing a description of the distortion-rate function, R(D) of the input data. The description of the R(D) curve is efficiently estimated through the popular K-means clustering algorithm using proper seeding techniques. The proposed adaptations to the Jump method allow for higher sensitivity and improved performance at low dimensionality. These adaptations are experimentally tested in a clustering setting with synthetic and natural data. The results suggest better performance than with the original version.

KW - Gaussian mixtures

KW - K-means clustering

KW - Lossy data compression

KW - Model order identification

KW - Rate-distortion theory

UR - http://www.scopus.com/inward/record.url?scp=77954763589&partnerID=8YFLogxK

U2 - 10.1109/SSIAI.2010.5483914

DO - 10.1109/SSIAI.2010.5483914

M3 - Conference contribution

AN - SCOPUS:77954763589

SN - 9781424478026

T3 - Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation

SP - 73

EP - 76

BT - 2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010 - Proceedings

T2 - 2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010

Y2 - 23 May 2010 through 25 May 2010

ER -
