Roulette sampling for cost-sensitive learning

Victor S. Sheng, Charles X. Ling

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

16 Scopus citations

Abstract

In this paper, we propose a new and general preprocessor algorithm, called CSRoulette, which converts any cost-insensitive classification algorithm into a cost-sensitive one. CSRoulette is based on a cost-proportional roulette sampling technique (CPRS for short). It is closely related to Costing, another cost-sensitive meta-learning algorithm, which is based on rejection sampling. Unlike rejection sampling, which produces smaller samples, CPRS can generate samples of different sizes. To further improve performance, we apply ensemble learning (bagging) to CPRS; the resulting algorithm is CSRoulette. Our experiments show that CSRoulette outperforms Costing and other meta-learning methods on most of the datasets tested. In addition, we investigate the effect of various sample sizes and conclude that reduced sample sizes (as in rejection sampling) cannot be compensated for by increasing the number of bagging iterations.
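
The contrast the abstract draws between roulette sampling and rejection sampling can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-example cost list, the acceptance rule cost / max(costs), and the toy majority-class base learner are all assumptions made here for illustration.

```python
import random
from collections import Counter


def roulette_sample(examples, costs, size, rng):
    """Draw `size` examples with replacement, each with probability
    proportional to its cost (roulette-wheel selection). Assumed form of CPRS."""
    return rng.choices(examples, weights=costs, k=size)


def rejection_sample(examples, costs, rng):
    """Keep each example independently with probability cost / max(costs).
    Assumed acceptance rule; the result is never larger than the input."""
    z = max(costs)
    return [x for x, c in zip(examples, costs) if rng.random() < c / z]


def bagged_predict(examples, costs, train, x, rounds=10, size=None, seed=0):
    """Train one model per roulette sample and return the majority vote for x.
    `train` is any cost-insensitive learner: sample -> predict function."""
    rng = random.Random(seed)
    size = len(examples) if size is None else size
    votes = Counter()
    for _ in range(rounds):
        model = train(roulette_sample(examples, costs, size, rng))
        votes[model(x)] += 1
    return votes.most_common(1)[0][0]


def majority_learner(sample):
    """Hypothetical toy base learner: always predict the most frequent label."""
    label = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda x: label
```

For instance, given nine examples of cost 1 and one example of cost 1000, roulette_sample can return 5, 10, or 50 draws, nearly all of them the costly example, whereas rejection_sample returns at most the 10 original examples. Because the sample size is a free parameter in the first function, it can be varied independently of the number of bagging rounds, which is the comparison the abstract's last sentence refers to.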

Original language: English
Title of host publication: Machine Learning
Subtitle of host publication: ECML 2007 - 18th European Conference on Machine Learning, Proceedings
Publisher: Springer-Verlag
Pages: 724-731
Number of pages: 8
ISBN (Print): 9783540749578
State: Published - 2007
Event: 18th European Conference on Machine Learning, ECML 2007 - Warsaw, Poland
Duration: Sep 17 2007 to Sep 21 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4701 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th European Conference on Machine Learning, ECML 2007
Country/Territory: Poland
City: Warsaw
Period: 09/17/07 to 09/21/07

Keywords

  • Classification
  • Cost-sensitive learning
  • Data mining
  • Decision trees
  • Machine learning
  • Meta-learning
