A preliminary investigation with twitter to augment CVD exposome research

Daniel Medina Sada, Susan Mengel, Lisaann S. Gittner, Hafiz Khan, Mario A. Pitalua Rodriguez, Ravi Vadapalli

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This project analyzes the sentiment of tweets to look for a correspondence with health issues and to gain a new perspective on health data. Twitter is a large source of information that can augment health data for particular geographic locations. Here, tweet analysis is used to look for a relation between tweet sentiment and Cardiovascular Disease (CVD) in the counties along Interstate 20 (I-20) in Texas. Only geo-tagged tweets that map to the counties of interest are used in the main analysis. The sentiment of each tweet's text is classified as either positive or negative. Several classifiers are trained with the Natural Language Toolkit (NLTK) to determine the sentiment, and their results are compared to measure the confidence of the sentiment assigned. After all tweets are classified, the results are used to calculate three values for each county: the Positive-to-Negative ratio, the Positive-to-Population ratio, and the Negative-to-Population ratio. These values are then separated into quintiles and compared to the Cardiovascular Disease map of I-20 to determine whether a relationship may exist between CVD and the tweets. The preliminary results show a correspondence between a low CVD rate in a county and the Positive-to-Negative ratio of that same county.
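
The per-county calculation described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. The abstract does not say how the classifier results are compared, so majority voting is assumed here, and the rank-based quintile assignment is likewise an assumption. The function names (`vote`, `county_ratios`, `quintiles`) and the input values in the usage lines are hypothetical and are not data from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def vote(labels: List[str]) -> Tuple[str, float]:
    """Return the majority label and the share of classifiers that agree.

    Assumption: confidence is the fraction of classifiers voting for the
    winning label. Use an odd number of classifiers to avoid ties.
    """
    counts = Counter(labels)
    label, agreeing = counts.most_common(1)[0]
    return label, agreeing / len(labels)


def county_ratios(pos: int, neg: int, population: int) -> Dict[str, float]:
    """Compute the three per-county ratios named in the abstract."""
    return {
        # Assumption: a county with no negative tweets gets an infinite ratio.
        "pos_to_neg": pos / neg if neg else float("inf"),
        "pos_to_pop": pos / population,
        "neg_to_pop": neg / population,
    }


def quintiles(values: Dict[str, float]) -> Dict[str, int]:
    """Assign each county a rank-based quintile from 1 (lowest) to 5 (highest)."""
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    return {name: (rank * 5) // n + 1 for rank, name in enumerate(ordered)}


# Hypothetical usage: three classifiers label one tweet, then one county
# is summarized from made-up tweet counts and a made-up population.
label, confidence = vote(["pos", "pos", "neg"])
ratios = county_ratios(pos=30, neg=10, population=1000)
```

Applying `quintiles` to the `pos_to_neg` values of all counties gives a ranking that could then be set beside a CVD map in the way the abstract describes.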

Original language: English
Title of host publication: BDCAT 2017 - Proceedings of the 4th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
Publisher: Association for Computing Machinery, Inc
Pages: 169-178
Number of pages: 10
ISBN (Electronic): 9781450355490
DOIs: https://doi.org/10.1145/3148055.3148074
State: Published - Dec 5 2017
Event: 4th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, BDCAT 2017 - Austin, United States
Duration: Dec 5 2017 → Dec 8 2017

Publication series

Name: BDCAT 2017 - Proceedings of the 4th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies

Conference

Conference: 4th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, BDCAT 2017
Country: United States
City: Austin
Period: 12/5/17 → 12/8/17

Keywords

  • Data Mining, Text Analysis
  • Healthcare
  • Sentiment Analysis
  • Topical Analysis
  • Twitter

Cite this

Sada, D. M., Mengel, S., Gittner, L. S., Khan, H., Pitalua Rodriguez, M. A., & Vadapalli, R. (2017). A preliminary investigation with twitter to augment CVD exposome research. In BDCAT 2017 - Proceedings of the 4th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (pp. 169-178). (BDCAT 2017 - Proceedings of the 4th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies). Association for Computing Machinery, Inc. https://doi.org/10.1145/3148055.3148074