A statistical framework of data fusion for spatial prediction of categorical variables

Guofeng Cao, Eun Hye Yoo, Shaowen Wang

Research output: Contribution to journal › Article › peer-review

8 Scopus citations


With rapid advances in geospatial technologies, the amount of spatial data has been increasing exponentially over the past few decades. Usually collected by diverse source providers, the available spatial data tend to be fragmented by a large variety of data heterogeneities, which highlights the need for sound methods capable of efficiently fusing diverse and incompatible spatial information. Within the context of spatial prediction of categorical variables, this paper describes a statistical framework for integrating and drawing inferences from a collection of spatially correlated variables while accounting for data heterogeneities and complex spatial dependencies. In this framework, we discuss the spatial prediction of categorical variables in the paradigm of latent random fields, and represent each spatial variable via spatial covariance functions, which define two-point similarities or dependencies of spatially correlated variables. The representation of spatial covariance functions derived from different spatial variables is independent of heterogeneous characteristics and can be combined in a straightforward fashion. Therefore, it provides a unified and flexible representation of heterogeneous spatial variables in spatial analysis while accounting for complex spatial dependencies. We show that in the spatial prediction of categorical variables, the sought-after class occurrence probability at a target location can be formulated as a multinomial logistic function of spatial covariances of spatial variables between the target and sampled locations. The group least absolute shrinkage and selection operator (group lasso) is adopted for parameter estimation, which prevents the model from over-fitting and simultaneously selects an optimal subset of important information (variables). Synthetic and real case studies are provided to illustrate the introduced concepts and showcase the advantages of the proposed statistical framework.
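As a rough illustration of the formulation summarized in the abstract, the sketch below computes class occurrence probabilities at target locations as a multinomial logistic (softmax) function of covariances between target and sampled locations, together with a group lasso penalty that treats each spatial variable as one group. It is not the authors' implementation: the squared-exponential covariance, the one-group-per-variable layout, the penalty weighting, and all function and parameter names (`gaussian_covariance`, `class_probabilities`, `group_lasso_penalty`, `length_scale`, `lam`) are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_covariance(targets, samples, length_scale=1.0):
    """Squared-exponential covariance between target and sampled locations.

    The covariance model is an assumption; the abstract does not name one.
    targets: (n_targets, d) coordinates, samples: (n_samples, d) coordinates.
    Returns an (n_targets, n_samples) matrix.
    """
    diff = targets[:, None, :] - samples[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))


def class_probabilities(cov_blocks, weights, intercepts):
    """Multinomial logistic (softmax) over covariance features.

    cov_blocks: list of (n_targets, n_samples) arrays, one per spatial variable.
    weights:    list of (n_samples, n_classes) arrays, one group per variable.
    intercepts: (n_classes,) array.
    Returns an (n_targets, n_classes) array whose rows sum to one.
    """
    scores = intercepts + sum(K @ W for K, W in zip(cov_blocks, weights))
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    expo = np.exp(scores)
    return expo / expo.sum(axis=1, keepdims=True)


def group_lasso_penalty(weights, lam):
    """Group lasso penalty: one unsquared Frobenius norm per variable group.

    Scaling each group by sqrt(group size) is a common convention, assumed here.
    """
    return lam * sum(np.sqrt(W.size) * np.linalg.norm(W) for W in weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.uniform(size=(20, 2))   # sampled locations (synthetic)
    targets = rng.uniform(size=(5, 2))    # prediction locations (synthetic)
    n_classes = 3

    # Two hypothetical spatial variables, represented by different length scales.
    cov_blocks = [
        gaussian_covariance(targets, samples, length_scale=0.2),
        gaussian_covariance(targets, samples, length_scale=0.8),
    ]
    weights = [rng.normal(scale=0.1, size=(20, n_classes)) for _ in cov_blocks]
    intercepts = np.zeros(n_classes)

    probs = class_probabilities(cov_blocks, weights, intercepts)
    print(probs.shape, probs.sum(axis=1))
    print(group_lasso_penalty(weights, lam=0.05))
```

In a fitting procedure, the penalty would be added to the multinomial negative log-likelihood; because each group's norm is not squared, an entire variable's weight block can be driven to zero, which is how the selection of a subset of variables would arise.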

Original language: English
Pages (from-to): 1785-1799
Number of pages: 15
Journal: Stochastic Environmental Research and Risk Assessment
Issue number: 7
State: Published - Oct 2014


  • Categorical data
  • Data fusion
  • Geostatistics
  • Kernel methods

