TY - JOUR
T1 - "Missing is useful"
T2 - Missing values in cost-sensitive decision trees
AU - Zhang, Shichao
AU - Qin, Zhenxing
AU - Ling, Charles X.
AU - Sheng, Shengli
N1 - Funding Information:
This work is partially supported by Australian large ARC grants (DP0343109 and DP0559536), a China NSFC major research program (60496327), and a China NSFC grant (60463003).
PY - 2005/12
Y1 - 2005/12
N2 - Many real-world data sets for machine learning and data mining contain missing values, and much previous research regards them as a problem and attempts to impute missing values before training and testing. In this paper, we study this issue in cost-sensitive learning that considers both test costs and misclassification costs. If the values of some attributes (tests) are too expensive to obtain, it would be more cost-effective to leave them missing, similar to skipping expensive and risky tests (missing values) in patient diagnosis (classification). That is, "missing is useful": missing values actually reduce the total cost of tests and misclassifications and, therefore, it is not meaningful to impute them. We discuss and compare several strategies that utilize only known values and exploit the idea that "missing is useful" for cost reduction in cost-sensitive decision tree learning.
AB - Many real-world data sets for machine learning and data mining contain missing values, and much previous research regards them as a problem and attempts to impute missing values before training and testing. In this paper, we study this issue in cost-sensitive learning that considers both test costs and misclassification costs. If the values of some attributes (tests) are too expensive to obtain, it would be more cost-effective to leave them missing, similar to skipping expensive and risky tests (missing values) in patient diagnosis (classification). That is, "missing is useful": missing values actually reduce the total cost of tests and misclassifications and, therefore, it is not meaningful to impute them. We discuss and compare several strategies that utilize only known values and exploit the idea that "missing is useful" for cost reduction in cost-sensitive decision tree learning.
KW - Induction
KW - Knowledge acquisition
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=30344485118&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2005.188
DO - 10.1109/TKDE.2005.188
M3 - Article
AN - SCOPUS:30344485118
SN - 1041-4347
VL - 17
SP - 1689
EP - 1693
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 12
ER -