Sent2Vec: A New Sentence Embedding Representation with Sentimental Semantic

Mahdi Naser Moghadasi, Yu Zhuang

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

13 Scopus citations

Abstract

Text classification is considered as one of the primary task in many Natural Language Processing (NLP) applications. In industrial applications of NLP, sentimental analysis is a task to understand how satisfied a user is after receiving a service or buying a product. The traditional approach is to convert a text into a format of numeric vector before feeding into machine learning algorithm. This representation of a word refers to word embedding. However the traditional embedding methods often model the syntactic context of words but ignore the sentiment information of text [1]. This can impact on the accuracy of a classification model to predict the correct sentimental score for a text. In this paper, we present Sent2Vec, an alternative embedding representation that includes the sentimental semantic of a sentence in its embedding vector. We utilized the unsupervised Smoothed Inverse Frequency (uSIF) sentence embedding method in the Sent2Vec neural network over a multi million samples dataset. The new sentence embedding presented, can be used as features in downstream (un)supervised tasks, which also leads to better or comparable results compared to sophisticated methods. Furthermore, with a simple logistic regression classifier, Sent2Vec reaches competitive performance to state-of-the-art results on several datasets when combined with GloVe(6B).

Original languageEnglish
Title of host publicationProceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
EditorsXintao Wu, Chris Jermaine, Li Xiong, Xiaohua Tony Hu, Olivera Kotevska, Siyuan Lu, Weijia Xu, Srinivas Aluru, Chengxiang Zhai, Eyhab Al-Masri, Zhiyuan Chen, Jeff Saltz
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages4672-4680
Number of pages9
ISBN (Electronic)9781728162515
DOIs
StatePublished - Dec 10 2020
Event8th IEEE International Conference on Big Data, Big Data 2020 - Virtual, Atlanta, United States
Duration: Dec 10 2020Dec 13 2020

Publication series

NameProceedings - 2020 IEEE International Conference on Big Data, Big Data 2020

Conference

Conference8th IEEE International Conference on Big Data, Big Data 2020
Country/TerritoryUnited States
CityVirtual, Atlanta
Period12/10/2012/13/20

Keywords

  • NLP
  • Sentimental Analysis
  • Word Embedding

Fingerprint

Dive into the research topics of 'Sent2Vec: A New Sentence Embedding Representation with Sentimental Semantic'. Together they form a unique fingerprint.

Cite this