TY - JOUR
T1 - Crisis social media data labeled for storm-related information and toponym usage
AU - Grace, Rob
N1 - Funding Information:
This research was supported by the National Science Foundation under grant Nos. 541155 and 741370. The author declares that no known competing financial interests or personal relationships have, or could be perceived to have, influenced the work reported in this article.
Publisher Copyright:
© 2020 The Author(s)
PY - 2020/6
Y1 - 2020/6
AB - Social media provides citizens and officials with important sources of information during times of crisis. This data article makes available labeled, storm-related social media data collected over a six-hour period during a severe storm and F1 tornado that struck Central Pennsylvania on May 1st, 2017. Three datasets were collected from Twitter using location, keyword, and network filtering techniques, respectively. Only 2% of the 22,706 total tweets overlap among the datasets, providing researchers with a broader scope of information than normally available when collecting tweets using location (i.e., geotag-based) and keyword filtering alone or in combination during a crisis. Each data collection technique is described in detail, including network filtering, which collects data from networks of social media users associated with a geographic area. The datasets are manually labeled for information content and toponym usage. The 22,706 tweet IDs, dehydrated for privacy, are labeled for relevance (storm-related and off-topic) and 19 types of storm-related information organized into six categories: infrastructure damage, service disruption, personal experience, weather updates, weather forecasts, and weather warnings. Data are also labeled for toponym usage (with or without toponyms), location (local, remote, and generic toponyms), and granularity (hyperlocal, municipal, and regional toponyms). The comprehensively labeled datasets provide researchers with opportunities to analyze crisis-related information behaviors and volunteered location information behaviors during a hyperlocal crisis event, as well as develop and evaluate automated filtering, geolocation, and event detection techniques that can aid citizens and crisis responders.
KW - Crisis informatics
KW - Emergency management
KW - Information behavior
KW - Risk communication
KW - Twitter
KW - volunteered geographic information
UR - http://www.scopus.com/inward/record.url?scp=85084092694&partnerID=8YFLogxK
U2 - 10.1016/j.dib.2020.105595
DO - 10.1016/j.dib.2020.105595
M3 - Article
AN - SCOPUS:85084092694
SN - 2352-3409
VL - 30
JO - Data in Brief
JF - Data in Brief
M1 - 105595
ER -