TY - JOUR
T1 - Document-specific keyphrase candidate search and ranking
AU - Wang, Qingren
AU - Sheng, Victor S.
AU - Wu, Xindong
N1 - Funding Information:
This research has been supported by the National Key Research and Development Program of China 2016YFB1000901, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education, China IRT17R32, the National Natural Science Foundation of China 61728205, 91746209, 61673152 and 61503116, and the US National Science Foundation IIS-1115417 and IIS-1613950.
Publisher Copyright:
© 2017 Elsevier Ltd
PY - 2018/5/1
Y1 - 2018/5/1
N2 - This paper proposes KeyRank, an approach for extracting proper keyphrases from an English document. It first searches the document for all keyphrase candidates and then ranks them, selecting the top-N as the final keyphrases. Existing studies show that extracting a complete keyphrase candidate set that includes semantic relations in context, and evaluating the effectiveness of each candidate, are crucial to extracting high-quality keyphrases from documents. Based on the observation that words do not repeat within an effective English keyphrase, a novel keyphrase candidate search algorithm using sequential pattern mining with gap constraints (called KCSP) is proposed to extract keyphrase candidates for KeyRank. An effectiveness evaluation measure, pattern frequency with entropy (called PF-H), is then proposed for KeyRank to rank these candidates. Our experimental results show that KeyRank has better performance. Its first component, KCSP, is much more efficient than a closely related approach, SPMW, and its second component, PF-H, is an effective evaluation mechanism for ranking keyphrase candidates.
AB - This paper proposes KeyRank, an approach for extracting proper keyphrases from an English document. It first searches the document for all keyphrase candidates and then ranks them, selecting the top-N as the final keyphrases. Existing studies show that extracting a complete keyphrase candidate set that includes semantic relations in context, and evaluating the effectiveness of each candidate, are crucial to extracting high-quality keyphrases from documents. Based on the observation that words do not repeat within an effective English keyphrase, a novel keyphrase candidate search algorithm using sequential pattern mining with gap constraints (called KCSP) is proposed to extract keyphrase candidates for KeyRank. An effectiveness evaluation measure, pattern frequency with entropy (called PF-H), is then proposed for KeyRank to rank these candidates. Our experimental results show that KeyRank has better performance. Its first component, KCSP, is much more efficient than a closely related approach, SPMW, and its second component, PF-H, is an effective evaluation mechanism for ranking keyphrase candidates.
KW - Entropy
KW - Keyphrase candidate ranking
KW - Keyphrase candidate search
KW - Sequential pattern mining
UR - http://www.scopus.com/inward/record.url?scp=85038859820&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2017.12.031
DO - 10.1016/j.eswa.2017.12.031
M3 - Article
AN - SCOPUS:85038859820
VL - 97
SP - 163
EP - 176
JO - Expert Systems with Applications
JF - Expert Systems with Applications
SN - 0957-4174
ER -