TY - GEN
T1 - A natural language normalization approach to enhance social media text reasoning
AU - Nguyen, Long Hoang
AU - Salopek, Andrew
AU - Zhao, Liang
AU - Jin, Fang
PY - 2017/7/1
Y1 - 2017/7/1
N2 - Social media has become a popular data source to track and analyze societal events. Targeted domains such as election, civil unrest, and spreading disease all require a natural language normalization tool capable of extracting information pertinent to these domains accurately. Due to the unstructured language, short-length messages, casual posting styles, and homonyms, it is technically difficult and labor-intensive to remove barriers that may lead to inaccurate analysis. Because the fact that typos or other symbolic representations of sentiment may lead to lower frequency of term appearance, language preprocessing becomes critical and necessary to improve social media text reasoning. We propose a novel unsupervised preprocessing approach to enhance text understanding quality and illustrate this approach using one specific domain, flu shot reasoning. The proposed approach relies on a database of synonyms and opposite words and an algorithm to transform negative sentences into its affirmative form. In this form, the features and opinions are reflected accurately via transforming parts of speech. For instance, features are presented as nouns and opinions are presented as verbs or adjectives. The algorithm also corrects words if they are not correctly written and normalizes them to increase its frequency of appearance. The effectiveness of our algorithm is evaluated on the tweets dataset to answer why people are reluctant to take flu shots.
AB - Social media has become a popular data source to track and analyze societal events. Targeted domains such as election, civil unrest, and spreading disease all require a natural language normalization tool capable of extracting information pertinent to these domains accurately. Due to the unstructured language, short-length messages, casual posting styles, and homonyms, it is technically difficult and labor-intensive to remove barriers that may lead to inaccurate analysis. Because the fact that typos or other symbolic representations of sentiment may lead to lower frequency of term appearance, language preprocessing becomes critical and necessary to improve social media text reasoning. We propose a novel unsupervised preprocessing approach to enhance text understanding quality and illustrate this approach using one specific domain, flu shot reasoning. The proposed approach relies on a database of synonyms and opposite words and an algorithm to transform negative sentences into its affirmative form. In this form, the features and opinions are reflected accurately via transforming parts of speech. For instance, features are presented as nouns and opinions are presented as verbs or adjectives. The algorithm also corrects words if they are not correctly written and normalizes them to increase its frequency of appearance. The effectiveness of our algorithm is evaluated on the tweets dataset to answer why people are reluctant to take flu shots.
KW - Information Retrieval
KW - Language Preprocessing
KW - Sentiment Analysis
KW - Social Media Reasoning
UR - http://www.scopus.com/inward/record.url?scp=85047830750&partnerID=8YFLogxK
U2 - 10.1109/BigData.2017.8258148
DO - 10.1109/BigData.2017.8258148
M3 - Conference contribution
T3 - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
SP - 2019
EP - 2026
BT - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
A2 - Nie, Jian-Yun
A2 - Obradovic, Zoran
A2 - Suzumura, Toyotaro
A2 - Ghosh, Rumi
A2 - Nambiar, Raghunath
A2 - Wang, Chonggang
A2 - Zang, Hui
A2 - Baeza-Yates, Ricardo
A2 - Baeza-Yates, Ricardo
A2 - Hu, Xiaohua
A2 - Kepner, Jeremy
A2 - Cuzzocrea, Alfredo
A2 - Tang, Jian
A2 - Toyoda, Masashi
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Big Data, Big Data 2017
Y2 - 11 December 2017 through 14 December 2017
ER -