Underreporting in traffic accident data, bias in parameters and the structure of injury severity models

Toshiyuki Yamamoto, Junpei Hashiji, Venkataraman N. Shankar

Research output: Contribution to journal › Article › peer-review

134 Scopus citations


Injury severities in traffic accidents are usually recorded on ordinal scales, and statistical models have been applied to investigate the effects of driver factors, vehicle characteristics, road geometrics and environmental conditions on injury severity. The unknown parameters in these models are generally estimated assuming random sampling from the population. Traffic accident data, however, suffer from underreporting, especially at lower injury severities. Traffic accident data can therefore be regarded as outcome-based samples with unknown population shares of the injury severities, in which accidents of higher severity are overrepresented. Such samples yield biased parameter estimates, which skew inferences on the effects of key safety variables such as safety belt usage. The pseudo-likelihood function for the case with unknown population shares, which is the same as the conditional maximum likelihood for the case with known population shares, is applied in this study to examine the effects of severity underreporting on the parameter estimates. Sequential binary probit models and ordered-response probit models of injury severity are developed and compared. Sequential binary probit models assume that the factors determining the severity change according to the level of the severity itself, while ordered-response probit models assume that the same factors correlate across all levels of severity. Estimation results suggest that the sequential binary probit models outperform the ordered-response probit models, and that the coefficient estimates for lap and shoulder belt use are biased if underreporting is not considered. Mean parameter bias due to underreporting can be significant. The findings show that underreporting on the outcome dimension may induce bias in inferences on a variety of factors. In particular, if underreporting is not accounted for, the marginal impacts of a variety of factors appear to be overestimated. The impacts of fixed objects and environmental conditions on injury severity are overestimated, as is the effect of separate lap and shoulder belt use. The estimated effect of combined lap and shoulder belt usage appears to be unaffected. The parameter bias is most pronounced when underreporting of possible-injury accidents, in addition to property-damage-only accidents, is taken into account.
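To make the two model structures and the effect of underreporting concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that the sequential model proceeds from the lowest severity level upward, with each stage giving the probability of stopping at that level, and that underreporting can be summarised by one reporting probability per severity level. The function names, cutpoints, index values and reporting rates are all hypothetical illustrations; none of them come from the paper.

```python
from math import erf, sqrt


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def ordered_probit_probs(xb, cutpoints):
    """Ordered-response probit: a single index xb shared by all levels.

    cutpoints must be increasing; returns len(cutpoints) + 1 probabilities.
    """
    cdf = [norm_cdf(c - xb) for c in cutpoints]
    bounds = [0.0] + cdf + [1.0]
    return [bounds[j + 1] - bounds[j] for j in range(len(bounds) - 1)]


def sequential_probit_probs(stage_indices):
    """Sequential binary probit: each stage has its own index.

    Assumption: norm_cdf(stage_indices[j]) is the probability that the
    severity stops at level j, given that level j was reached.
    """
    probs, reach = [], 1.0
    for v in stage_indices:
        stop = norm_cdf(v)
        probs.append(reach * stop)
        reach *= 1.0 - stop
    probs.append(reach)  # highest level: every earlier stage was passed
    return probs


def observed_probs(true_probs, report_rates):
    """Severity distribution among *reported* accidents.

    Assumption: an accident of level j enters the data with probability
    report_rates[j], independently of the explanatory variables.
    """
    weighted = [p * r for p, r in zip(true_probs, report_rates)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical values: four severity levels, lowest level reported least often.
true_shares = ordered_probit_probs(0.3, [-0.5, 0.4, 1.2])
seen_shares = observed_probs(true_shares, [0.25, 0.5, 0.9, 1.0])
```

Under these assumptions, `seen_shares` puts less weight on the lowest severity level and more on the highest than `true_shares` does, which is the overrepresentation of severe accidents the abstract describes. Fitting a model to the reported data as if it were a random sample ignores the reweighting in `observed_probs`, whereas a likelihood built on the reweighted probabilities, with the reporting rates treated as unknown, is in the spirit of the pseudo-likelihood approach mentioned above.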

Original language: English
Pages (from-to): 1320-1329
Number of pages: 10
Journal: Accident Analysis and Prevention
Issue number: 4
State: Published - Jul 2008


  • Injury severity
  • Ordered-response probit model
  • Sequential probit model
  • Underreporting

