"Missing is useful": Missing values in cost-sensitive decision trees

Shichao Zhang, Zhenxing Qin, Charles X. Ling, Shengli Sheng

Research output: Contribution to journal › Article › peer-review

62 Scopus citations

Abstract

Many real-world data sets for machine learning and data mining contain missing values, and much previous research regards them as a problem and attempts to impute them before training and testing. In this paper, we study this issue in cost-sensitive learning, which considers both test costs and misclassification costs. If the values of some attributes (tests) are too expensive to obtain, it is more cost-effective to leave them missing, much as expensive and risky tests are skipped (missing values) in patient diagnosis (classification). That is, "missing is useful": missing values actually reduce the total cost of tests and misclassifications, and it is therefore not meaningful to impute them. We discuss and compare several strategies that utilize only known values and exploit the fact that "missing is useful" for cost reduction in cost-sensitive decision tree learning.
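The trade-off described in the abstract can be illustrated with a small calculation. The sketch below is not the authors' algorithm; it only assumes a binary classification problem with fixed false-positive and false-negative costs, and compares the expected total cost of paying for one test against the cost of skipping it. All function names and numbers are hypothetical.

```python
def expected_misclassification_cost(p_positive, fp_cost, fn_cost):
    """Expected cost of the cheaper label when P(positive) = p_positive."""
    cost_if_predict_negative = p_positive * fn_cost          # missed positives
    cost_if_predict_positive = (1.0 - p_positive) * fp_cost  # false alarms
    return min(cost_if_predict_negative, cost_if_predict_positive)


def total_cost_with_test(test_cost, branches, fp_cost, fn_cost):
    """Test cost plus expected misclassification cost after seeing the result.

    branches: list of (probability of this test outcome,
                       P(positive | this outcome)) pairs.
    """
    cost_after_test = sum(
        weight * expected_misclassification_cost(p, fp_cost, fn_cost)
        for weight, p in branches
    )
    return test_cost + cost_after_test


# Hypothetical costs and probabilities, chosen only for illustration.
FP_COST, FN_COST = 200.0, 800.0
PRIOR_POSITIVE = 0.25
# Two equally likely test outcomes; their weighted mean equals the prior.
BRANCHES = [(0.5, 0.5), (0.5, 0.0)]

skip_cost = expected_misclassification_cost(PRIOR_POSITIVE, FP_COST, FN_COST)
cheap_test_cost = total_cost_with_test(20.0, BRANCHES, FP_COST, FN_COST)
expensive_test_cost = total_cost_with_test(120.0, BRANCHES, FP_COST, FN_COST)

for name, cost in [("cheap", cheap_test_cost), ("expensive", expensive_test_cost)]:
    decision = "perform the test" if cost < skip_cost else "leave the value missing"
    print(f"{name} test: {cost:.1f} vs. skipping: {skip_cost:.1f} -> {decision}")
```

Under these assumed numbers, the cheap test lowers the expected total cost, while the expensive test costs more than the misclassification cost it saves, so leaving its value missing is the cheaper choice.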

Original language: English
Pages (from-to): 1689-1693
Number of pages: 5
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 17
Issue number: 12
State: Published - Dec 2005

Keywords

  • Induction
  • Knowledge acquisition
  • Machine learning
