Multivariate classification of disease phenotypes of esophageal adenocarcinoma by pattern recognition analysis of MALDI-TOF mass spectra of serum N-linked glycans

Barry K. Lavine, Collin G. White, Lin DeNoyer, Yehia Mechref

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

The development of a novel two-step data analysis methodology to uncover signatures of potential cancer biomarkers in matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) mass spectra of large serum peptidomes is described. First, raw spectral data are processed using QceAlign, which exploits Bayesian and maximum entropy methods for peak identification and calibration. The raw spectral data are baseline corrected and normalized. Peak identification is based on a Bayesian second derivative of the baseline-corrected and normalized raw data, with peak S/N statistics provided by a maximum entropy smoothing function. A MALDI-TOF reference spectrum is created from the data, and each spectrum is slid by n data points to the right or left along the x axis of the reference spectrum. At each relative position n, the Shannon entropy of the sum of the two spectra is computed. Optimal alignment is associated with the shift that produces the minimum Shannon entropy. Second, a genetic algorithm (GA) for pattern recognition analysis is applied to the peak-matched data. The pattern recognition GA selects features that optimize the separation of the sample classes in a plot of the two or three largest principal components of the data. Because the largest principal components capture the bulk of the variance in the data, the spectral features chosen by the pattern recognition GA convey information primarily about the differences between classes in the data. In addition, the algorithm focuses on those classes and/or samples that are difficult to classify as it trains by boosting the sample and class weights. Samples that consistently classify correctly are not as heavily weighted as those samples that are difficult to classify. The pattern recognition GA integrates aspects of artificial intelligence and evolutionary computation to yield a “smart” one-pass procedure that performs feature selection, classification, and prediction in a single step.
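The minimum-entropy alignment step described in the abstract can be illustrated with a minimal sketch. This is not the QceAlign implementation. The function names (`shannon_entropy`, `shift_spectrum`, `best_shift`), the zero-padding at the edges, the normalization of the summed signal to unit area, and the use of the natural logarithm are all assumptions made for illustration.

```python
import numpy as np

def shannon_entropy(signal):
    """Shannon entropy (nats) of a non-negative signal normalized to unit sum."""
    total = signal.sum()
    if total <= 0:
        return 0.0
    p = signal / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

def shift_spectrum(spectrum, n):
    """Slide a spectrum n points right (n > 0) or left (n < 0), zero-padding the gap."""
    shifted = np.zeros_like(spectrum)
    if n > 0:
        shifted[n:] = spectrum[:-n]
    elif n < 0:
        shifted[:n] = spectrum[-n:]
    else:
        shifted[:] = spectrum
    return shifted

def best_shift(reference, spectrum, max_shift):
    """Return the shift in [-max_shift, max_shift] that minimizes the
    Shannon entropy of (reference + shifted spectrum)."""
    shifts = range(-max_shift, max_shift + 1)
    entropies = [shannon_entropy(reference + shift_spectrum(spectrum, n))
                 for n in shifts]
    return shifts[int(np.argmin(entropies))]

# Hypothetical example: a synthetic two-peak reference and a copy
# displaced three points to the right.
x = np.arange(200)
reference = np.exp(-(x - 50) ** 2 / 8.0) + 0.5 * np.exp(-(x - 120) ** 2 / 8.0)
misaligned = shift_spectrum(reference, 3)
correction = best_shift(reference, misaligned, max_shift=10)
aligned = shift_spectrum(misaligned, correction)
```

The intuition behind this sketch is that when peaks coincide, the summed signal is concentrated in fewer channels and its entropy is lower; when they are offset, the intensity spreads over more channels and the entropy rises. A real pipeline would presumably work on baseline-corrected, normalized spectra, as the abstract states.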

Original language: English
Pages (from-to): 83-88
Number of pages: 6
Journal: Microchemical Journal
Volume: 132
State: Published - May 1 2017

Keywords

  • Esophageal adenocarcinoma
  • Feature selection
  • Genetic algorithms
  • MALDI-TOF
  • Pattern recognition
  • Peak alignment
  • Serum glycans
