Towards scaling up induction of second-order decision tables

R. Hewett, J. Leuchner

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

One of the fundamental challenges for data mining is to enable inductive learning algorithms to operate on very large databases. Ensemble learning techniques such as bagging have been applied successfully to improve the accuracy of classification models by generating multiple models from replicate training sets and aggregating them to form a composite model. In this paper, we adapt the bagging approach for scaling up and study the effects of data partitioning, sampling, and aggregation techniques for mining very large databases. Our recent work developed SORCER, a learning system that induces a near-minimal rule set from a data set represented as a second-order decision table (a database relation in which rows have sets of atomic values as components). Despite its simplicity, experiments show that SORCER is competitive with other state-of-the-art induction systems. Here we apply SORCER using two instance subset selection procedures (random partitioning and sampling with replacement) and two aggregation procedures (majority voting and selecting the model that performs best on a validation set). We experiment with the GIS data set from the UCI KDD Repository, which contains 581,012 instances of 30×30 meter cells with 54 attributes for classifying forest cover types. We report performance results, including results from mining the entire training data set using different compression algorithms in SORCER, as well as published results from neural net and decision tree learners.
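
The two subset selection procedures and the two aggregation procedures named in the abstract can be illustrated with a minimal sketch. This is not SORCER and does not reproduce the paper's experiments: `train_one_rule` is a hypothetical stand-in learner (a single-attribute rule with a default class), and every function name, the toy data, and the parameter values are assumptions made only for illustration.

```python
import random
from collections import Counter

# Each instance is (attributes, label); attributes is a tuple of categorical values.

def random_partition(data, k, rng):
    """Shuffle a copy of the data and split it into k disjoint subsets."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def bootstrap_samples(data, k, size, rng):
    """Draw k samples of the given size, sampling with replacement."""
    return [[rng.choice(data) for _ in range(size)] for _ in range(k)]

def train_one_rule(subset):
    """Hypothetical stand-in learner (NOT SORCER): pick the single attribute
    whose value -> majority-label rule makes the fewest training errors."""
    default = Counter(label for _, label in subset).most_common(1)[0][0]
    best = None
    for a in range(len(subset[0][0])):
        by_value = {}
        for x, y in subset:
            by_value.setdefault(x[a], Counter())[y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(1 for x, y in subset if rule[x[a]] != y)
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    _, attr, rule = best
    # Attribute values not seen in training fall back to the default class.
    return lambda x: rule.get(x[attr], default)

def accuracy(model, data):
    """Fraction of instances in data that the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

def majority_vote(models):
    """Composite model that predicts the label most models agree on."""
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

def best_on_validation(models, validation):
    """Return the single model with the highest validation accuracy."""
    return max(models, key=lambda m: accuracy(m, validation))

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (assumption): the label depends only on attribute 0.
    train = [((i % 2, i % 3), "a" if i % 2 == 0 else "b") for i in range(60)]
    validation = [((i % 2, 0), "a" if i % 2 == 0 else "b") for i in range(10)]
    selections = [
        ("partition", random_partition(train, 3, rng)),
        ("bootstrap", bootstrap_samples(train, 3, 20, rng)),
    ]
    for name, subsets in selections:
        models = [train_one_rule(s) for s in subsets]
        voted = accuracy(majority_vote(models), validation)
        picked = accuracy(best_on_validation(models, validation), validation)
        print(name, voted, picked)
```

In this sketch each model is trained on only a fraction of the data, which is what makes the approach attractive for very large databases. Majority voting keeps every model at prediction time, whereas selecting the best model on a validation set keeps only one.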

Original language: English
Title of host publication: Data Mining III
Editors: A. Zanasi, C.A. Brebbia, N.F.F.E. Ebecken, P. Melli
Publisher: WIT Press
Pages: 385-394
Number of pages: 10
Volume: 6
ISBN (Print): 1853128309
State: Published - 2002
Event: Third International Conference on Data Mining, Data Mining III - Bologna, Italy
Duration: Sep 25 2002 to Sep 27 2002

Conference

Conference: Third International Conference on Data Mining, Data Mining III
Country: Italy
City: Bologna
Period: 09/25/02 to 09/27/02
