TY - GEN
T1 - Feature evaluation of the support vector machine for micro-RNA target site prediction in Arabidopsis thaliana
AU - Gontcharova, Viktoria
AU - Youn, Eun
AU - Richardson, Casey R.
AU - Morton, Chuck
AU - Samanta, Manoj
AU - Luo, Qingjun
AU - Rock, Chris
PY - 2008
Y1 - 2008
N2 - MicroRNAs (miRNAs) are small (21-24 nucleotides long) non-coding RNAs that contribute to post-transcriptional regulation and development by Watson-Crick pairing to a target gene and antagonizing expression by mechanisms related to RNA interference. The high sequence complementarity of miRNAs to plant target genes has been the mainstay of miRNA prediction algorithms, which are limited in their predictive power for other kingdoms where complementarity is less conserved. We computationally explored miRNA binding potentials in Arabidopsis thaliana target genes and in miRNA genes themselves as they relate to a novel phenomenon of sense and antisense transcript abundance, quantified by high-resolution (25-36 base pairs) whole-genome tiling microarrays, and to novel small interfering RNAs, quantified by deep sequencing of small RNA libraries, that map to the subject loci. A miRNA prediction pipeline was developed using a Support Vector Machine (SVM) based on two biologically related features: antisense/sense transcription topology and novel small interfering RNA abundance. These phenomena are hypothesized to be causally related to miRNA binding to the target gene transcript. A statistically significant transcriptome signal termed "ping-pong" was identified in miRNA target gene sense-antisense strand topology (downstream sense signal correlated with upstream antisense, relative to the miRNA binding site) and was used as a novel feature for miRNA target gene prediction. This feature, along with the abundance of unique small RNAs and a standard metric of binding site affinity (thermodynamic free energy), was used in an SVM to build a prediction model. The three features were incorporated, and the performance of the SVM was tested against the miRNA genes themselves. The SVM predicted the "ancient" (deeply conserved) class of validated miRNA genes with an accuracy of 92% and the available Arabidopsis-specific class of "new", rapidly evolving miRNAs with an accuracy of 75%.
Based on the accuracy, specificity, sensitivity and precision of the SVM prediction, the novel "ping-pong" expression feature, combined with small RNA abundance and traditional thermodynamic measures, may be able to identify new miRNA target sites and miRNA genes in Arabidopsis and other plant species, and potentially in other kingdoms, using deep genomic expression datasets.
AB - MicroRNAs (miRNAs) are small (21-24 nucleotides long) non-coding RNAs that contribute to post-transcriptional regulation and development by Watson-Crick pairing to a target gene and antagonizing expression by mechanisms related to RNA interference. The high sequence complementarity of miRNAs to plant target genes has been the mainstay of miRNA prediction algorithms, which are limited in their predictive power for other kingdoms where complementarity is less conserved. We computationally explored miRNA binding potentials in Arabidopsis thaliana target genes and in miRNA genes themselves as they relate to a novel phenomenon of sense and antisense transcript abundance, quantified by high-resolution (25-36 base pairs) whole-genome tiling microarrays, and to novel small interfering RNAs, quantified by deep sequencing of small RNA libraries, that map to the subject loci. A miRNA prediction pipeline was developed using a Support Vector Machine (SVM) based on two biologically related features: antisense/sense transcription topology and novel small interfering RNA abundance. These phenomena are hypothesized to be causally related to miRNA binding to the target gene transcript. A statistically significant transcriptome signal termed "ping-pong" was identified in miRNA target gene sense-antisense strand topology (downstream sense signal correlated with upstream antisense, relative to the miRNA binding site) and was used as a novel feature for miRNA target gene prediction. This feature, along with the abundance of unique small RNAs and a standard metric of binding site affinity (thermodynamic free energy), was used in an SVM to build a prediction model. The three features were incorporated, and the performance of the SVM was tested against the miRNA genes themselves. The SVM predicted the "ancient" (deeply conserved) class of validated miRNA genes with an accuracy of 92% and the available Arabidopsis-specific class of "new", rapidly evolving miRNAs with an accuracy of 75%.
Based on the accuracy, specificity, sensitivity and precision of the SVM prediction, the novel "ping-pong" expression feature, combined with small RNA abundance and traditional thermodynamic measures, may be able to identify new miRNA target sites and miRNA genes in Arabidopsis and other plant species, and potentially in other kingdoms, using deep genomic expression datasets.
KW - Antisense transcription
KW - Genomics
KW - RNA interference
KW - Transitivity
UR - http://www.scopus.com/inward/record.url?scp=84878147676&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84878147676
SN - 9781615677153
T3 - International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics 2008, BCBGC 2008
SP - 126
EP - 133
BT - International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics 2008, BCBGC 2008
T2 - 2008 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics, BCBGC 2008
Y2 - 7 July 2008 through 10 July 2008
ER -