An empirical comparison of four text mining methods

Sangno Lee, Jaeki Song, Yongjin Kim

Research output: Contribution to journal > Article > peer-review

47 Scopus citations

Abstract

The amount of textual data available for researchers and businesses to analyze is increasing at a dramatic rate. This reality has led information systems (IS) researchers to investigate various text mining techniques. This essay examines four frequently used text mining methods to identify their characteristics and limitations. The four methods that we examine are (1) latent semantic analysis, (2) probabilistic latent semantic analysis, (3) latent Dirichlet allocation, and (4) the correlated topic model. We review these four methods and compare them on topic detection and spam filtering to reveal their distinctive properties. Our paper sheds light on the theory that underlies text mining methods and provides guidance for researchers who seek to apply these methods.

Original language: English
Pages (from-to): 1-10
Number of pages: 10
Journal: Journal of Computer Information Systems
Volume: 51
Issue number: 1
State: Published - Sep 2010

Keywords

  • Correlated topic model
  • Latent Dirichlet allocation
  • Latent semantic analysis
  • Probabilistic latent semantic analysis
  • Text mining
  • Vector space model
