YAKE! Collection-Independent Automatic Keyword Extractor

Title: YAKE! Collection-Independent Automatic Keyword Extractor
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Campos R, Mangaravite V, Pasquali A, Jorge AM, Nunes C, Jatowt A
Editors: Pasi G, Piwowarski B, Azzopardi L, Hanbury A
Conference Name: Advances in Information Retrieval
Publisher: Springer International Publishing
Conference Location: Cham
ISBN Number: 978-3-319-76941-7
Abstract

In this paper, we present YAKE!, a novel feature-based system for multi-lingual keyword extraction from single documents, which supports texts of different sizes, domains or languages. Unlike most systems, YAKE! does not rely on dictionaries or thesauri, nor is it trained against any corpora. Instead, we follow an unsupervised approach that builds upon features extracted from the text itself, which makes it applicable to documents written in many different languages without the need for external knowledge. This can be beneficial for a large number of tasks and situations where access to training corpora is either limited or restricted. In this demo, we offer an easy-to-use, interactive session in which users from both academia and industry can try our system, either by using a sample document or by entering their own text. As an add-on, we compare our extracted keywords against the output produced by the IBM Natural Language Understanding (IBM NLU) and RAKE systems. The YAKE! demo is available at http://bit.ly/YakeDemoECIR2018. A Python implementation of YAKE! is also available in the PyPI repository (https://pypi.python.org/pypi/yake/).
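
To make the idea of unsupervised, feature-based extraction from a single document more concrete, the following is a minimal sketch in plain Python. It is not YAKE!'s scoring method: the three features (relative frequency, position of first occurrence, capitalization), their weights, the tiny English stopword list, and the function name `extract_keywords` are all assumptions chosen for illustration. The sketch only shows that candidates can be ranked from statistics of the text itself, with no dictionary, thesaurus, or training corpus.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; a real system would ship
# a fuller list for each supported language).
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "and", "or", "is", "are",
    "to", "for", "from", "it", "that", "this", "with", "as", "by",
}


def extract_keywords(text, top=5):
    """Rank single-word candidates using only statistics of `text`.

    Returns a list of (word, score) pairs, highest score first.
    The features and weights are hypothetical, for illustration only.
    """
    # Runs of Unicode letters, so the tokenizer is not tied to English.
    tokens = re.findall(r"[^\W\d_]+", text)
    if not tokens:
        return []

    lowered = [t.lower() for t in tokens]
    freq = Counter(lowered)
    first_pos = {}
    capitalized = Counter()
    for i, (tok, low) in enumerate(zip(tokens, lowered)):
        first_pos.setdefault(low, i)
        # Count capitalized occurrences, skipping the very first token,
        # which is capitalized only because it starts the text.
        if i > 0 and tok[0].isupper():
            capitalized[low] += 1

    max_freq = max(freq.values())
    scores = {}
    for word, count in freq.items():
        if word in STOPWORDS or len(word) < 3:
            continue
        tf = count / max_freq                           # relative frequency
        position = 1.0 - first_pos[word] / len(tokens)  # earlier is higher
        casing = capitalized[word] / count              # share capitalized
        scores[word] = tf + 0.5 * position + 0.5 * casing

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]


sample = (
    "Keyword extraction finds the keyword terms of a document. "
    "Keyword extraction needs no training corpus."
)
result = extract_keywords(sample, top=3)
print(result)
```

In this sketch a higher score means a stronger candidate, and only single words are considered; extending it to multi-word candidates would need an additional n-gram generation step.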