TY - GEN
T1 - Spatiotemporal transformation of social media geostreams
T2 - 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013
AU - Hwang, Myung Hwa
AU - Wang, Shaowen
AU - Cao, Guofeng
AU - Padmanabhan, Anand
AU - Zhang, Zhenhua
PY - 2013
Y1 - 2013
AB - Georeferenced social media data streams (social media geostreams) offer promising opportunities to gain new insights into spatiotemporal aspects of human interactions in cyberspace and their relation to real-world activities. In particular, such opportunities are motivating public health researchers to improve the surveillance of disease epidemics by means of spatiotemporal analysis of social media geostreams. One essential requirement for such geostream-based disease surveillance is to establish scalable data infrastructures capable of real-time transformation of massive geostreams into spatiotemporally organized data to which analytical methods are readily applicable. To fulfill this requirement, this study develops a data pipeline solution in which multiple computational components are integrated to collect, process, and aggregate social media geostreams in near real time. As a test case, this solution focuses on one well-known social media geostream, the Twitter data stream, and one type of disease epidemic, the flu. The pipeline solution facilitates multiscale spatiotemporal analysis of flu risks by collecting geotagged tweets from the Twitter Streaming API, identifying flu-related tweets through keyword matching, aggregating tweets at multiple spatial granularities in near real time, and storing the tweets and the aggregate statistics in a distributed NoSQL database. Although developed for the surveillance of flu epidemics, the pipeline can serve as a general framework for building scalable data infrastructures that support real-time spatiotemporal analysis of social media geostreams in application domains beyond disease mapping and public health.
KW - data pipeline
KW - disease surveillance
KW - social media geostreams
KW - spatiotemporal analysis
UR - http://www.scopus.com/inward/record.url?scp=84894631570&partnerID=8YFLogxK
U2 - 10.1145/2534303.2534310
DO - 10.1145/2534303.2534310
M3 - Conference contribution
AN - SCOPUS:84894631570
SN - 9781450325325
T3 - Proceedings of the 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013
SP - 12
EP - 21
BT - Proceedings of the 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013
Y2 - 5 November 2013
ER -