TY - JOUR
T1 - Crisis social media data labeled for storm-related information and toponym usage
AU - Grace, Rob
N1 - Publisher Copyright: © 2020 The Author(s). Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/6
Y1 - 2020/6
N2 - Social media provides citizens and officials with important sources of information during times of crisis. This data article makes available labeled, storm-related social media data collected over a six-hour period during a severe storm and F1 tornado that struck Central Pennsylvania on May 1st, 2017. Three datasets were collected from Twitter using location, keyword, and network filtering techniques, respectively. Only 2% of the 22,706 total tweets overlap among the datasets, providing researchers with a broader scope of information than normally available when collecting tweets using location (i.e., geotag-based) and keyword filtering alone or in combination during a crisis. Each data collection technique is described in detail, including network filtering, which collects data from networks of social media users associated with a geographic area. The datasets are manually labeled for information content and toponym usage. The 22,706 tweet IDs, dehydrated for privacy, are labeled for relevance (storm-related and off-topic) and 19 types of storm-related information organized into six categories: infrastructure damage, service disruption, personal experience, weather updates, weather forecasts, and weather warnings. Data are also labeled for toponym usage (with or without toponyms), location (local, remote, and generic toponyms), and granularity (hyperlocal, municipal, and regional toponyms). The comprehensively labeled datasets provide researchers with opportunities to analyze crisis-related information behaviors and volunteered location information behaviors during a hyperlocal crisis event, as well as develop and evaluate automated filtering, geolocation, and event detection techniques that can aid citizens and crisis responders.
KW - Crisis informatics
KW - Emergency management
KW - Information behavior
KW - Risk communication
KW - Twitter
KW - Volunteered geographic information
UR - http://www.scopus.com/inward/record.url?scp=85084092694&partnerID=8YFLogxK
U2 - 10.1016/j.dib.2020.105595
DO - 10.1016/j.dib.2020.105595
M3 - Article
AN - SCOPUS:85084092694
VL - 30
JO - Data in Brief
JF - Data in Brief
SN - 2352-3409
M1 - 105595
ER -