Heterogeneity aware random forest for drug sensitivity prediction

Raziur Rahman; Kevin Matlock; Souparno Ghosh; Ranadip Pal

doi:10.1038/s41598-017-11665-4

Research output: Peer-reviewed journal article

42 Scopus citations

Abstract

Samples collected in pharmacogenomics databases typically belong to various cancer types. When designing a drug sensitivity predictive model from such a database, a natural question is whether a model trained on diverse, inter-tumor heterogeneous samples performs similarly to a predictive model that accounts for the heterogeneity of the samples during training and prediction. We explore this hypothesis and observe that ensemble model predictions obtained when the cancer type is known outperform predictions made when that information is withheld, even when the sample sizes for the former are considerably smaller than the combined sample size. To incorporate heterogeneity into Random Forests, a commonly used ensemble-based predictive model, we propose Heterogeneity Aware Random Forests (HARF), which assign weights to the trees based on the category of the sample. We treat heterogeneity as a latent class allocation problem and present a covariate-free class allocation approach based on the distribution of leaf nodes of the model ensemble. Applications on the CCLE and GDSC databases show that HARF outperforms the traditional Random Forest when the average drug responses of the cancer types differ.
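The abstract describes HARF only at a high level. The pure-Python sketch below illustrates the two ideas it names: category-specific tree weights, and covariate-free category allocation from the leaf nodes a sample reaches. It is a minimal sketch under stated assumptions and not the authors' published procedure. The leaf co-occurrence vote, the probability-weighted mixture of category-specific averages, the per-category weight vectors (`tree_weights`), and the function names and toy categories are all hypothetical choices made for illustration; the article itself gives the actual HARF weighting and allocation rules.

```python
"""Illustrative sketch of category-aware tree weighting (NOT the published HARF algorithm)."""
from collections import Counter
from typing import Dict, Sequence


def category_probabilities(
    test_leaves: Sequence[int],
    train_leaves: Sequence[Sequence[int]],
    train_categories: Sequence[str],
) -> Dict[str, float]:
    """Estimate category membership of a test sample from leaf co-occurrence.

    test_leaves[t] is the leaf the test sample reaches in tree t, and
    train_leaves[i][t] is the leaf training sample i reaches in tree t.
    Assumption (hypothetical): every time a training sample shares a leaf
    with the test sample in some tree, it casts one vote for its category.
    Only leaf memberships are used, no covariates.
    """
    votes: Counter = Counter()
    for leaves, category in zip(train_leaves, train_categories):
        for t, leaf in enumerate(test_leaves):
            if leaves[t] == leaf:
                votes[category] += 1
    total = sum(votes.values())
    if total == 0:
        # No shared leaves at all: fall back to a uniform allocation.
        categories = sorted(set(train_categories))
        return {c: 1.0 / len(categories) for c in categories}
    return {c: n / total for c, n in votes.items()}


def weighted_prediction(
    tree_predictions: Sequence[float],
    tree_weights: Dict[str, Sequence[float]],
    probs: Dict[str, float],
) -> float:
    """Mix category-specific weighted averages of the per-tree predictions.

    tree_weights[c][t] is a hypothetical non-negative weight of tree t for
    category c; each category's weights are assumed to have a positive sum.
    """
    prediction = 0.0
    for category, p in probs.items():
        weights = tree_weights[category]
        weighted_sum = sum(w * y for w, y in zip(weights, tree_predictions))
        prediction += p * weighted_sum / sum(weights)
    return prediction


if __name__ == "__main__":
    # Toy data: 3 training samples, 2 trees, 2 hypothetical categories.
    probs = category_probabilities(
        test_leaves=[0, 1],
        train_leaves=[[0, 1], [0, 2], [3, 1]],
        train_categories=["lung", "lung", "breast"],
    )
    print(probs)  # {'lung': 0.75, 'breast': 0.25}
    weights = {"lung": [3.0, 1.0], "breast": [1.0, 3.0]}
    print(weighted_prediction([0.2, 0.6], weights, probs))  # about 0.35
```

The sketch mixes the category-specific averages by their estimated probabilities instead of committing to a single category. A hard assignment, which would use only the weights of the most probable category, is an equally plausible variant. Neither is claimed to match the paper.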

Original language	English
Article number	11347
Journal	Scientific Reports
Volume	7
Issue number	1
DOI	https://doi.org/10.1038/s41598-017-11665-4
Status	Published, Dec 1 2017

Cite this

@article{66bb36ec65004a4695f425ce55a1ce20,
  title = "Heterogeneity aware random forest for drug sensitivity prediction",
  abstract = "Samples collected in pharmacogenomics databases typically belong to various cancer types. For designing a drug sensitivity predictive model from such a database, a natural question arises whether a model trained on diverse inter-tumor heterogeneous samples will perform similar to a predictive model that takes into consideration the heterogeneity of the samples in model training and prediction. We explore this hypothesis and observe that ensemble model predictions obtained when cancer type is known out-perform predictions when that information is withheld even when the samples sizes for the former is considerably lower than the combined sample size. To incorporate the heterogeneity idea in the commonly used ensemble based predictive model of Random Forests, we propose Heterogeneity Aware Random Forests (HARF) that assigns weights to the trees based on the category of the sample. We treat heterogeneity as a latent class allocation problem and present a covariate free class allocation approach based on the distribution of leaf nodes of the model ensemble. Applications on CCLE and GDSC databases show that HARF outperforms traditional Random Forest when the average drug responses of cancer types are different.",
  author = "Raziur Rahman and Kevin Matlock and Souparno Ghosh and Ranadip Pal",
  note = "Publisher Copyright: {\textcopyright} 2017 The Author(s).",
  year = "2017",
  month = dec,
  day = "1",
  doi = "10.1038/s41598-017-11665-4",
  language = "English",
  volume = "7",
  journal = "Scientific Reports",
  issn = "2045-2322",
  publisher = "Springer Science and Business Media LLC",
  number = "1",
  pages = "11347",
}

TY  - JOUR
T1  - Heterogeneity aware random forest for drug sensitivity prediction
AU  - Rahman, Raziur
AU  - Matlock, Kevin
AU  - Ghosh, Souparno
AU  - Pal, Ranadip
PY  - 2017/12/1
Y1  - 2017/12/1
N2  - Samples collected in pharmacogenomics databases typically belong to various cancer types. For designing a drug sensitivity predictive model from such a database, a natural question arises whether a model trained on diverse inter-tumor heterogeneous samples will perform similar to a predictive model that takes into consideration the heterogeneity of the samples in model training and prediction. We explore this hypothesis and observe that ensemble model predictions obtained when cancer type is known out-perform predictions when that information is withheld even when the samples sizes for the former is considerably lower than the combined sample size. To incorporate the heterogeneity idea in the commonly used ensemble based predictive model of Random Forests, we propose Heterogeneity Aware Random Forests (HARF) that assigns weights to the trees based on the category of the sample. We treat heterogeneity as a latent class allocation problem and present a covariate free class allocation approach based on the distribution of leaf nodes of the model ensemble. Applications on CCLE and GDSC databases show that HARF outperforms traditional Random Forest when the average drug responses of cancer types are different.
AB  - Samples collected in pharmacogenomics databases typically belong to various cancer types. For designing a drug sensitivity predictive model from such a database, a natural question arises whether a model trained on diverse inter-tumor heterogeneous samples will perform similar to a predictive model that takes into consideration the heterogeneity of the samples in model training and prediction. We explore this hypothesis and observe that ensemble model predictions obtained when cancer type is known out-perform predictions when that information is withheld even when the samples sizes for the former is considerably lower than the combined sample size. To incorporate the heterogeneity idea in the commonly used ensemble based predictive model of Random Forests, we propose Heterogeneity Aware Random Forests (HARF) that assigns weights to the trees based on the category of the sample. We treat heterogeneity as a latent class allocation problem and present a covariate free class allocation approach based on the distribution of leaf nodes of the model ensemble. Applications on CCLE and GDSC databases show that HARF outperforms traditional Random Forest when the average drug responses of cancer types are different.
UR  - http://www.scopus.com/inward/record.url?scp=85029367267&partnerID=8YFLogxK
U2  - 10.1038/s41598-017-11665-4
DO  - 10.1038/s41598-017-11665-4
M3  - Article
C2  - 28900181
AN  - SCOPUS:85029367267
SN  - 2045-2322
VL  - 7
JO  - Scientific Reports
JF  - Scientific Reports
IS  - 1
M1  - 11347
ER  -
