A sparse latent regression approach for integrative analysis of glycomic and glycotranscriptomic data

Xuefu Wang, Sujun Li, Wenjing Peng, Yehia Mechref, Haixu Tang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Glycomics and glycotranscriptomics have emerged as two key high-throughput approaches to interrogating the glycome within specific cells, tissues or organisms under specific conditions. Because glycotranscriptomic analysis uses the same experimental protocol as whole-transcriptome sequencing (RNA-seq), which is commonly used in genomic research, glycotranscriptomic information can be conveniently extracted in silico for the many biological samples from which RNA-seq data have been collected and made publicly available through large-scale projects such as The Cancer Genome Atlas (TCGA) project. Glycomic data collection, however, is constrained by specialized analytical tools that are less accessible to biological researchers. In this paper, we present a Bayesian sparse latent regression (BSLR) model for predicting quantitative glycan abundances from glycotranscriptomic data. The model is built using matched glycomic and glycotranscriptomic data collected from the same set of samples as the training set, and is then used to study the common properties of the training samples and to predict these properties (e.g., the glycan abundances) in similar samples for which only glycotranscriptomic data are available. The BSLR model assumes that the glycomic and the glycotranscriptomic abundances are both modulated by a small number of independent latent variables, and thus can be constructed from a relatively small number of training samples. When tested on simulated data, our approach achieves satisfactory performance using only 10-20 training samples. We also tested our model on five cancer cell lines, and showed that the BSLR model can accurately predict the glycan abundances from the transcription levels of glycan synthetic genes. Furthermore, the predicted glycan abundances can distinguish the metastatic cell line specifically targeting brain from the remaining breast cancer cell lines as well as from a brain cancer cell line, with only slightly lower power than the glycan abundances observed in glycomic experiments, indicating that the BSLR prediction, made from glycotranscriptomic data alone, retains the variation in glycan abundances across different groups of samples.
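
The shared-latent-variable assumption described in the abstract can be illustrated with a small simulation. The sketch below is not the BSLR model: it replaces the Bayesian sparse priors and MCMC sampling with a simple point-estimate analogue (principal component regression), and every name, dimension, and noise level in it is an assumption chosen for illustration. It simulates gene and glycan abundances driven by the same two latent variables, trains on 20 samples, and predicts glycan abundances for held-out samples from the transcript data alone.

```python
import numpy as np

# Illustrative sizes only (assumptions, not values from the paper).
rng = np.random.default_rng(0)
n_train, n_test, n_genes, n_glycans, k = 20, 50, 30, 8, 2

def sparse_loadings(n_features):
    """Return a k x n_features loading matrix in which each feature
    loads on exactly one latent variable (a simple sparse structure)."""
    W = np.zeros((k, n_features))
    cols = np.arange(n_features)
    weights = rng.uniform(0.5, 1.5, n_features) * rng.choice([-1.0, 1.0], n_features)
    W[cols % k, cols] = weights
    return W

# The same latent variables Z drive both data types.
n = n_train + n_test
Z = rng.normal(size=(n, k))
X = Z @ sparse_loadings(n_genes) + 0.1 * rng.normal(size=(n, n_genes))      # transcript levels
Y = Z @ sparse_loadings(n_glycans) + 0.1 * rng.normal(size=(n, n_glycans))  # glycan abundances

X_tr, Y_tr = X[:n_train], Y[:n_train]
X_te, Y_te = X[n_train:], Y[n_train:]

# Estimate a k-dimensional latent space from the training transcripts (SVD),
# then regress the training glycan abundances on the latent scores.
x_mean, y_mean = X_tr.mean(axis=0), Y_tr.mean(axis=0)
_, _, Vt = np.linalg.svd(X_tr - x_mean, full_matrices=False)
V = Vt[:k].T                                   # n_genes x k projection
scores_tr = (X_tr - x_mean) @ V                # n_train x k latent scores
coef, *_ = np.linalg.lstsq(scores_tr, Y_tr - y_mean, rcond=None)

# Predict glycan abundances for samples with transcript data only.
Y_pred = (X_te - x_mean) @ V @ coef + y_mean

mse_model = np.mean((Y_te - Y_pred) ** 2)
mse_baseline = np.mean((Y_te - y_mean) ** 2)   # predict the training mean
print(f"model MSE: {mse_model:.4f}, mean-baseline MSE: {mse_baseline:.4f}")
```

Because both data types depend on only `k` latent variables, the regression has few effective parameters, which is why a training set of this size can suffice in such a simulation. A full Bayesian treatment, as in the paper, would additionally infer which loadings are zero and quantify uncertainty in the predictions.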

Original language: English
Title of host publication: ACM-BCB 2017 - Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
Publisher: Association for Computing Machinery, Inc
Pages: 273-278
Number of pages: 6
ISBN (Electronic): 9781450347228
DOIs: https://doi.org/10.1145/3107411.3107468
State: Published - Aug 20 2017
Event: 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2017 - Boston, United States
Duration: Aug 20 2017 to Aug 23 2017

Publication series

Name: ACM-BCB 2017 - Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

Conference

Conference: 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2017
Country: United States
City: Boston
Period: 08/20/17 to 08/23/17

Keywords

  • Bayesian model
  • Biomarker discovery
  • Glycomics
  • MCMC sampling
  • Sparse latent factor model
  • Transcriptomics

  • Cite this

    Wang, X., Li, S., Peng, W., Mechref, Y., & Tang, H. (2017). A sparse latent regression approach for integrative analysis of glycomic and glycotranscriptomic data. In ACM-BCB 2017 - Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 273-278). (ACM-BCB 2017 - Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics). Association for Computing Machinery, Inc. https://doi.org/10.1145/3107411.3107468