TY - JOUR
T1 - A statistical framework of data fusion for spatial prediction of categorical variables
AU - Cao, Guofeng
AU - Yoo, Eun Hye
AU - Wang, Shaowen
N1 - Funding Information:
We gratefully acknowledge the funding provided by the National Science Foundation under grant number OCI-1047916 to support this research. We would like to thank Professors Bruce W. Hoagland and Todd D. Fagin from the University of Oklahoma for valuable discussions and the datasets they kindly provided. We would also like to thank the anonymous reviewers for their constructive comments and suggestions, and Professor Jeff Lee from Texas Tech University for his proofreading, which has profoundly improved the composition of this manuscript.
Publisher Copyright:
© 2013, Springer-Verlag Berlin Heidelberg.
PY - 2014/10
Y1 - 2014/10
AB - With rapid advances in geospatial technologies, the amount of spatial data has been increasing exponentially over the past few decades. Usually collected by diverse source providers, the available spatial data tend to be fragmented by a large variety of data heterogeneities, which highlights the need for sound methods capable of efficiently fusing diverse and incompatible spatial information. Within the context of spatial prediction of categorical variables, this paper describes a statistical framework for integrating and drawing inferences from a collection of spatially correlated variables while accounting for data heterogeneities and complex spatial dependencies. In this framework, we discuss the spatial prediction of categorical variables in the paradigm of latent random fields, and represent each spatial variable via spatial covariance functions, which define two-point similarities or dependencies of spatially correlated variables. The representation of spatial covariance functions derived from different spatial variables is independent of heterogeneous characteristics and can be combined in a straightforward fashion. It therefore provides a unified and flexible representation of heterogeneous spatial variables in spatial analysis while accounting for complex spatial dependencies. We show that in the spatial prediction of categorical variables, the sought-after class occurrence probability at a target location can be formulated as a multinomial logistic function of spatial covariances of spatial variables between the target and sampled locations. The group least absolute shrinkage and selection operator (group LASSO) is adopted for parameter estimation, which prevents the model from over-fitting and simultaneously selects an optimal subset of important information (variables). Synthetic and real case studies are provided to illustrate the introduced concepts and showcase the advantages of the proposed statistical framework.
KW - Categorical data
KW - Data fusion
KW - Geostatistics
KW - Kernel methods
KW - LASSO
UR - http://www.scopus.com/inward/record.url?scp=84920254929&partnerID=8YFLogxK
U2 - 10.1007/s00477-013-0842-7
DO - 10.1007/s00477-013-0842-7
M3 - Article
AN - SCOPUS:84920254929
SN - 1436-3240
VL - 28
SP - 1785
EP - 1799
JO - Stochastic Environmental Research and Risk Assessment
JF - Stochastic Environmental Research and Risk Assessment
IS - 7
ER -