Optimized data-driven order selection method for Gaussian mixtures on clustering problems

Enrique Corona, Brian Nutter, Sunanda Mitra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Perhaps the most fundamental consideration when modeling data as a mixture of Gaussians is the number of components in the mixture. To this end, numerous approaches have been proposed, ranging from the classic use of statistical hypothesis testing methods to make decisions, to the determination of balance between the model Goodness-of-Fit (GoF) and complexity. In this paper, we explore an existing simple yet powerful order selection method developed in the field of information theory, the Jump method. This method infers the model order by estimating, transforming, and analyzing a description of the distortion-rate function, R(D), of the input data. The description of the R(D) curve is efficiently estimated through the popular K-means clustering algorithm using proper seeding techniques. The proposed adaptations to the Jump method allow for higher sensitivity and improved performance at low dimensionality. These adaptations are experimentally tested in a clustering setting with synthetic and natural data. The results suggest better performance than with the original version.
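The estimate-transform-analyze sequence described in the abstract can be illustrated with a minimal sketch of the original Jump method as it is commonly described. It is not the adapted version proposed in this paper, whose details the abstract does not give. The sketch assumes a transformation power Y = p/2 for p-dimensional data, the convention that the transformed distortion at K = 0 is zero, and k-means++ as the seeding technique. All function names (`kmeans_distortion`, `jump_order`, `make_blobs`) and the synthetic data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeds(points, k, rng):
    """k-means++ seeding: draw each new center with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point already coincides with a center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def kmeans_distortion(points, k, rng, iters=50):
    """Run Lloyd's algorithm; return the mean squared error per dimension."""
    dim = len(points[0])
    centers = kmeans_pp_seeds(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    total = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return total / (len(points) * dim)

def jump_order(points, max_k, restarts=5, seed=0):
    """Return (selected K, list of jumps for K = 1..max_k)."""
    rng = random.Random(seed)
    power = len(points[0]) / 2.0   # assumed transformation power Y = p/2
    transformed = [0.0]            # assumed convention: value at K = 0 is 0
    for k in range(1, max_k + 1):
        d_k = min(kmeans_distortion(points, k, rng) for _ in range(restarts))
        d_k = max(d_k, 1e-12)      # floor avoids division by zero
        transformed.append(d_k ** (-power))
    jumps = [transformed[k] - transformed[k - 1] for k in range(1, max_k + 1)]
    best_k = max(range(1, max_k + 1), key=lambda k: jumps[k - 1])
    return best_k, jumps

def make_blobs(centers, n_per, sd, seed):
    """Hypothetical synthetic data: isotropic 2-D Gaussian clusters."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, sd), rng.gauss(cy, sd))
            for cx, cy in centers for _ in range(n_per)]

# Three well-separated clusters; the largest jump is expected near K = 3.
data = make_blobs([(0, 0), (10, 0), (0, 10)], n_per=60, sd=0.5, seed=1)
best_k, jumps = jump_order(data, max_k=5)
print(best_k, [round(j, 3) for j in jumps])
```

The sketch is written in pure Python so that it runs without third-party packages. In practice a library K-means implementation with k-means++ initialization would normally replace the hand-written loop.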

Original language: English
Title of host publication: 2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010 - Proceedings
Pages: 73-76
Number of pages: 4
State: Published - 2010
Event: 2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010 - Austin, TX, United States
Duration: May 23 2010 → May 25 2010

Publication series

Name: Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation

Conference

Conference: 2010 IEEE Southwest Symposium on Image Analysis and Interpretation, SSIAI 2010
Country/Territory: United States
City: Austin, TX
Period: 05/23/10 → 05/25/10

Keywords

  • Gaussian mixtures
  • K-means clustering
  • Lossy data compression
  • Model order identification
  • Rate-distortion theory
