TY - JOUR
T1 - Extending the state-space model to accommodate missing values in responses and covariates
AU - Naranjo, Arlene
AU - Trindade, A. Alexandre
AU - Casella, George
N1 - Funding Information:
Arlene Naranjo is Research Assistant Professor, Department of Biostatistics, University of Florida, Gainesville, FL 32610 (E-mail: anaranjo@cog.ufl.edu). A. Alexandre Trindade is Associate Professor, Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409 (E-mail: alex.trindade@ttu.edu). George Casella (who recently passed away) was Distinguished Professor, Department of Statistics, University of Florida, Gainesville, FL 32611. This research is based on Naranjo’s Ph.D. thesis. Also, it is supported by National Science Foundation grants DMS-0631632 and SES-0631588. The authors thank Dr. Susanna Lagorio, MD, Senior Researcher, National Centre for Epidemiology Surveillance and Health Promotion (CNESPS), National Institute of Health (Istituto Superiore di Sanità), Rome (Italy), for access to the Lagorio et al. (2006) data. We are indebted to Prof. Dr. Miguel Jerez, Universidad Complutense de Madrid (Spain), for helpful discussions and access to E4, a MATLAB toolbox for time series modeling, which permitted us to carry out model identification calculations. We also acknowledge the suggestions of two anonymous reviewers that led to vast improvements. Finally, we dedicate this work to the memory of our mentor and colleague, George Casella, whose passing leaves an immense void in the statistics community.
PY - 2013
Y1 - 2013
AB - This article proposes an extended state-space model for accommodating multivariate panel data. The novel aspect of this contribution is an adjustment to the classical model for multiple subjects that allows missingness in the covariates in addition to the responses. Missing covariate data are handled by a second state-space model nested inside the first to represent unobserved exogenous information. Relevant Kalman filter equations are derived, and explicit expressions are provided for both the E- and M-steps of an expectation-maximization (EM) algorithm, to obtain maximum (Gaussian) likelihood estimates of all model parameters. In the presence of missing data, the resulting EM algorithm becomes computationally intractable, but a simplification of the M-step leads to a new procedure that is shown to be an expectation/conditional maximization (ECM) algorithm under exogeneity of the covariates. Simulation studies reveal that the approach appears to be relatively robust to moderate percentages of missing data, even with fewer subjects and time points, and that estimates are generally consistent with the asymptotics. The methodology is applied to a dataset from a published panel study of elderly patients with impaired respiratory function. Forecasted values thus obtained may serve as an "early-warning" mechanism for identifying patients whose lung function is nearing critical levels. Supplementary materials for this article are available online.
KW - EM algorithm
KW - Kalman filter
KW - Longitudinal study
KW - Panel data
KW - Transition model
UR - http://www.scopus.com/inward/record.url?scp=84878256029&partnerID=8YFLogxK
U2 - 10.1080/01621459.2012.746066
DO - 10.1080/01621459.2012.746066
M3 - Article
AN - SCOPUS:84878256029
SN - 0162-1459
VL - 108
SP - 202
EP - 216
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 501
ER -