In gene networks, patterns of gene co-expression may exist in only a subset of the sample. In studies of the relationship between genotypes and gene expression across multiple tissues, associations may be present in some tissues but not in others. Despite the importance of this problem in genomic applications, it is challenging to identify a relationship between two variables when the correlation exists in only a subset of the sample. The situation becomes even less tractable when there are two subsets in which the correlations run in opposite directions. By ranking subset relationships according to Kendall’s tau, a tau-path can be derived to facilitate the identification of correlated subsets, if such subsets exist. However, the current tau-path methodology considers only the situation in which there is association in a subsample; the more complex scenario of two subsets with associations in opposite directions has not been addressed. Further, existing algorithms for finding tau-paths may be suboptimal because of their greedy nature. In this paper, we extend the tau-path methodology to accommodate a sample drawn from a heterogeneous population composed of subpopulations exhibiting positive and negative associations. We also propose a cross entropy Monte Carlo procedure, CEMCtp, to obtain an optimal tau-path. The algorithm not only provides simultaneous detection of positive and negative correlations in the same sample, but also identifies the subsamples that provide evidence for the detected associations. An extensive simulation study shows the aptness of CEMCtp for detecting associations under various scenarios. Compared with two standard tests for detecting associations, CEMCtp is more powerful when complex subset associations are indeed present, while maintaining well-controlled type-I error rates.
We applied CEMCtp to the NCI-60 gene expression data to illustrate its utility for uncovering network relationships that were missed by standard methods.
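The difficulty posed by subsets with opposite directions of association can be illustrated with a small sketch. The code below is not the tau-path algorithm or the cross entropy Monte Carlo procedure described above; it only computes Kendall's tau (the tau-a form, written out by hand) on a hypothetical toy sample. The helper name `kendall_tau` and the data are assumptions made for illustration: one half of the sample is perfectly positively associated, the other half perfectly negatively associated, and the two largely cancel in the pooled sample.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs; tied pairs contribute 0."""
    def sign(v):
        return (v > 0) - (v < 0)

    n = len(x)
    score = sum(
        sign(x[i] - x[j]) * sign(y[i] - y[j])
        for i, j in combinations(range(n), 2)
    )
    return score / (n * (n - 1) / 2)


# Hypothetical toy data (an assumption, not taken from the paper).
# Subset 1: y increases with x. Subset 2: y decreases with x.
x_pos = [float(i) for i in range(10)]
y_pos = list(x_pos)
x_neg = [i + 0.5 for i in range(10)]
y_neg = [-v for v in x_neg]

tau_pos = kendall_tau(x_pos, y_pos)                  # positive subset alone
tau_neg = kendall_tau(x_neg, y_neg)                  # negative subset alone
tau_all = kendall_tau(x_pos + x_neg, y_pos + y_neg)  # pooled sample

print(f"positive subset: {tau_pos:+.3f}")
print(f"negative subset: {tau_neg:+.3f}")
print(f"pooled sample:   {tau_all:+.3f}")
```

In this toy case each subset shows a perfect association, yet the pooled statistic is close to zero, so a test based on the whole sample alone would have little chance of detecting either relationship.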
- Cross entropy Monte Carlo (CEMC)
- Gene networks
- Heterogeneous sample
- Subset associations