Factor identification and prediction for teen driver crash severity using machine learning: A case study

Ciyun Lin, Dayong Wu, Hongchao Liu, Xueting Xia, Nischal Bhattarai

Research output: Contribution to journal › Article › peer-review

14 Scopus citations


Crashes among young and inexperienced drivers are a major safety problem in the United States, especially in areas with large rural road networks, such as West Texas. Rural roads present many unique safety concerns that have not been fully explored. This study presents a complete machine learning pipeline to find patterns in crashes involving teen drivers no older than 20 on rural roads in West Texas, identify factors that affect injury levels, and build four machine learning models to predict crash severity. The analysis indicates that the major causes of teen driver crashes in West Texas are failing to control speed or traveling at an unsafe speed when merging from rural roads onto highways or approaching intersections. Teen drivers also failed to yield on undivided roads with four or more lanes, leading to serious injuries. Road class, speed limit, and the first harmful event are the top three factors affecting crash severity. The predictive model based on Label Encoder and XGBoost appears to be the best option when considering both accuracy and computational cost. The results of this work should be useful for improving rural teen driver traffic safety in West Texas and in other rural areas with similar issues.
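
As a rough illustration of the label-encoding step named in the abstract (not the authors' code), the sketch below maps a categorical crash attribute to integer codes, the kind of numeric input a tree-based model such as XGBoost can consume. The category values and function names are hypothetical, and the sorted-order coding is an assumption chosen to mirror how scikit-learn's LabelEncoder assigns codes.

```python
def fit_label_encoder(values):
    """Assign an integer code to each distinct category, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def transform(values, mapping):
    """Replace each category with its integer code."""
    return [mapping[value] for value in values]


# Hypothetical road-class values; these are not taken from the study's data.
road_class = ["county road", "interstate", "farm to market", "county road"]

mapping = fit_label_encoder(road_class)
encoded = transform(road_class, mapping)

print(mapping)
print(encoded)
```

Integer codes impose an arbitrary ordering on the categories, which tree-based models generally tolerate better than linear models do.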

Original language: English
Article number: 1675
Journal: Applied Sciences (Switzerland)
Issue number: 5
State: Published - Mar 1 2020


  • Crash severity
  • Machine learning
  • Rural roads
  • Teen driver

