TY - JOUR
T1 - Big data, traditional data and the tradeoffs between prediction and causality in highway-safety analysis
AU - Mannering, Fred
AU - Bhat, Chandra R.
AU - Shankar, Venky
AU - Abdel-Aty, Mohamed
N1 - Publisher Copyright: © 2020 Elsevier Ltd
PY - 2020/3
Y1 - 2020/3
N2 - The analysis of highway accident data is largely dominated by traditional statistical methods (standard regression-based approaches), advanced statistical methods (such as models that account for unobserved heterogeneity), and data-driven methods (artificial intelligence, neural networks, machine learning, and so on). These methods have been applied mostly using data from observed crashes, but this can create a problem in uncovering causality since individuals who are inherently riskier than the population as a whole may be over-represented in the data. In addition, when and where individuals choose to drive could affect data analyses that use real-time data since the population of observed drivers could change over time. This issue, the nature of the data, and the implementation target of the analysis imply that analysts must often trade off the predictive capability of the resulting analysis against its ability to uncover the underlying causal nature of crash-contributing factors. The selection of the data-analysis method is often made without full consideration of this tradeoff, even though there are potentially important implications for the development of safety countermeasures and policies. This paper provides a discussion of the issues involved in this tradeoff with regard to specific methodological alternatives and presents researchers with a better understanding of the tradeoffs often being inherently made in their analysis.
KW - Accident likelihood
KW - Accident severity
KW - Endogeneity
KW - Highway safety
KW - Self-selectivity
UR - http://www.scopus.com/inward/record.url?scp=85078666924&partnerID=8YFLogxK
U2 - 10.1016/j.amar.2020.100113
DO - 10.1016/j.amar.2020.100113
M3 - Article
AN - SCOPUS:85078666924
SN - 2213-6657
VL - 25
JO - Analytic Methods in Accident Research
JF - Analytic Methods in Accident Research
M1 - 100113
ER -