Feature evaluation of the support vector machine for micro-RNA target site prediction in Arabidopsis thaliana

Viktoria Gontcharova, Eun Youn, Casey R. Richardson, Chuck Morton, Manoj Samanta, Qingjun Luo, Chris Rock

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

MicroRNAs (miRNAs) are small (21-24 nucleotides long) non-coding RNAs that contribute to post-transcriptional regulation and development by Watson-Crick pairing to a target gene and antagonizing its expression through mechanisms related to RNA interference. The high sequence complementarity of miRNAs to plant target genes has been the mainstay of miRNA prediction algorithms, which are therefore limited in their predictive power for other kingdoms, where complementarity is less conserved. We computationally explored miRNA binding potential in Arabidopsis thaliana target genes, and in the miRNA genes themselves, as it relates to a novel phenomenon of sense and antisense transcript abundance, quantified by high-resolution (25-36 base pairs) whole-genome tiling microarrays, and to the abundance of novel small interfering RNAs (quantified by deep sequencing of small RNA libraries) that map to the subject loci. A miRNA prediction pipeline was developed using a Support Vector Machine (SVM) based on two biologically related features: antisense/sense transcription topology and novel small interfering RNA abundance. These phenomena are hypothesized to be causally related to miRNA binding to the target gene transcript. A statistically significant transcriptome signal termed "ping-pong" was identified in miRNA target gene sense-antisense strand topology (downstream sense signal correlated with upstream antisense signal, relative to the miRNA binding site) and was used as a novel feature for miRNA target gene prediction. This feature, along with the abundance of unique small RNAs and a standard metric of binding site affinity (thermodynamic free energy), was used in an SVM to build a prediction model. The SVM incorporating all three features was tested against the miRNA genes themselves. It predicted the "ancient" (deeply conserved) class of validated miRNA genes with an accuracy of 92%, and the available Arabidopsis-specific class of "new", rapidly evolving miRNAs with an accuracy of 75%. Based on the accuracy, specificity, sensitivity, and precision of the SVM prediction, the novel "ping-pong" expression feature, combined with small RNA abundance and traditional thermodynamic measures, may be able to identify new miRNA target sites and miRNA genes in Arabidopsis and other plant species, and potentially in other kingdoms, based on deep genomic expression datasets.
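The four evaluation measures named in the abstract (accuracy, specificity, sensitivity, and precision) are standard quantities derived from a binary confusion matrix. The following is a minimal sketch of how they could be computed from predicted and true labels; it is not the authors' pipeline. The function names and the label convention (1 = predicted miRNA site, 0 = not) are assumptions made for illustration, and the example labels are invented toy values, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def confusion_counts(y_true, y_pred):
    """Tally confusion counts; assumes labels are 1 (site) or 0 (non-site)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(c):
    """Compute the four measures from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "precision": _ratio(c.tp, c.tp + c.fp),    # positive predictive value
    }


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
metrics = evaluate(counts)
```

Reporting all four measures rather than accuracy alone matters when the positive and negative classes are unbalanced, since a classifier can reach high accuracy while missing most true sites.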

Original language: English
Title of host publication: International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics 2008, BCBGC 2008
Pages: 126-133
Number of pages: 8
State: Published - 2008
Event: 2008 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics, BCBGC 2008 - Orlando, FL, United States
Duration: Jul 7 2008 to Jul 10 2008

Publication series

Name: International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics 2008, BCBGC 2008

Conference

Conference: 2008 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics, BCBGC 2008
Country: United States
City: Orlando, FL
Period: 07/7/08 to 07/10/08

Keywords

  • Antisense transcription
  • Genomics
  • RNA interference
  • Transitivity
