Document-specific keyphrase candidate search and ranking

Qingren Wang, Victor S. Sheng, Xindong Wu

Research output: Contribution to journal › Article › peer-review

7 Scopus citations


This paper proposes KeyRank, an approach for extracting proper keyphrases from an English document. It first searches the document for all keyphrase candidates, and then ranks them to select the top-N as the final keyphrases. Existing studies show that two steps are crucial for extracting high-quality keyphrases from documents: extracting a complete keyphrase candidate set that includes semantic relations in context, and evaluating the effectiveness of each candidate. Based on the observation that words do not appear repeatedly in an effective English keyphrase, a novel keyphrase candidate search algorithm using sequential pattern mining with gap constraints (called KCSP) is proposed to extract keyphrase candidates for KeyRank. An effectiveness evaluation measure, pattern frequency with entropy (called PF-H), is then proposed for KeyRank to rank these keyphrase candidates. Our experimental results show that KeyRank achieves better performance: its first component, KCSP, is much more efficient than the closely related approach SPMW, and its second component, PF-H, is an effective evaluation mechanism for ranking keyphrase candidates.
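To make the idea of a gap-constrained candidate search concrete, the following is a minimal illustrative sketch and not the authors' KCSP or PF-H. It assumes whitespace tokenization, enumerates word sequences whose neighbouring words are at most `max_gap` tokens apart, skips any sequence that would repeat a word, and ranks candidates by plain sentence frequency as a stand-in for PF-H, whose definition the abstract does not give. The function names and the `max_len`, `max_gap`, and `top_n` parameters are hypothetical.

```python
from collections import Counter


def gap_candidates(tokens, max_len=3, max_gap=1):
    """Enumerate word sequences under a gap constraint (illustrative only).

    Consecutive words of a sequence may skip at most `max_gap` tokens,
    and no word may occur twice in the same sequence.
    """
    found = []

    def extend(seq, last):
        found.append(tuple(seq))
        if len(seq) == max_len:
            return
        # The next word may sit at most max_gap tokens beyond the neighbour.
        stop = min(len(tokens), last + max_gap + 2)
        for nxt in range(last + 1, stop):
            if tokens[nxt] not in seq:  # no repeated words in a candidate
                extend(seq + [tokens[nxt]], nxt)

    for start in range(len(tokens)):
        extend([tokens[start]], start)
    return found


def rank_by_frequency(sentences, top_n=3, **kwargs):
    """Rank multi-word candidates by the number of sentences containing them.

    Plain frequency is an assumption standing in for the paper's PF-H measure.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Count each candidate at most once per sentence.
        counts.update(set(gap_candidates(tokens, **kwargs)))
    multiword = {p: c for p, c in counts.items() if len(p) > 1}
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(multiword.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

For example, `rank_by_frequency(sentences, top_n=2, max_len=2, max_gap=0)` restricts the search to adjacent word pairs, while a larger `max_gap` also admits candidates whose words are separated by intervening tokens.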

Original language: English
Pages (from-to): 163-176
Number of pages: 14
Journal: Expert Systems with Applications
State: Published - May 1 2018


  • Entropy
  • Keyphrase candidate ranking
  • Keyphrase candidate search
  • Sequential pattern mining

