TY - JOUR
T1 - Analyzing genomic data using tensor-based orthogonal polynomials with application to synthetic RNAs
AU - Nafees, Saba
AU - Rice, Sean H.
AU - Wakeman, Catherine A.
N1 - Publisher Copyright: © The Author(s) 2020. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
PY - 2020/12/1
Y1 - 2020/12/1
N2 - An important goal in molecular biology is to quantify both the patterns across a genomic sequence and the relationship between phenotype and underlying sequence. We propose a multivariate tensor-based orthogonal polynomial approach to characterize nucleotides or amino acids in a given sequence and map corresponding phenotypes onto the sequence space. We applied this method to a previously published case of small transcription activating RNAs. Covariance patterns along the sequence showed strong correlations between nucleotides at the ends of the sequence. However, when the phenotype is projected onto the sequence space, this pattern does not emerge. In a second-order analysis quantifying the functional relationship between the phenotype and pairs of sites along the sequence, we identified sites with high regressions spread across the sequence, indicating potential intramolecular binding. In addition to quantifying interactions between different parts of a sequence, the method quantifies sequence-phenotype interactions at first- and higher-order levels. We discuss the strengths and constraints of the method and compare it to computational methods such as machine learning approaches. An accompanying command-line tool to compute these polynomials is provided. We show proof of concept of this approach and demonstrate its potential application to other biological systems.
UR - http://www.scopus.com/inward/record.url?scp=85123224806&partnerID=8YFLogxK
U2 - 10.1093/nargab/lqaa101
DO - 10.1093/nargab/lqaa101
M3 - Article
AN - SCOPUS:85123224806
VL - 2
JO - NAR Genomics and Bioinformatics
JF - NAR Genomics and Bioinformatics
SN - 2631-9268
IS - 4
M1 - lqaa101
ER -