Extracting Unambiguous Keywords from Microposts Using Web and Query Logs Data

Davi Reis

Felipe Goldstein

Frederico Quintao

Making sense of Microposts (at WWW 2012)

Abstract

In recent years, a new type of content has become ubiquitous on the web: small, noisy text snippets created by users of social networks such as Twitter and Facebook. The full interpretation of these microposts by machines poses tremendous challenges, since they rely strongly on context. In this paper we propose a task that is much simpler than full interpretation of microposts: we aim to build classification systems that detect keywords which unambiguously refer to a single dominant concept, even when taken out of context. For example, in the context of this task, apple would be classified as ambiguous whereas microsoft would not. The contribution of this work is twofold. First, we formalize this novel classification task, which can be directly applied to extracting information from microposts. Second, we show how high-precision classifiers for this problem can be built from Web data and search engine logs, combining traditional information retrieval metrics, such as inverse document frequency, with new ones derived from search query logs. Finally, we propose and evaluate relevant applications for these classifiers, which achieved precision ≥ 72% and recall ≥ 56% on unambiguous keyword extraction from microposts. We also compare these results with closely related systems, none of which outperformed those numbers.
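As a rough illustration of the one traditional metric the abstract names, inverse document frequency, the sketch below computes idf(t) = log(N / df(t)) over a toy corpus. The function name, the whitespace tokenization, and the three example documents are assumptions made for illustration only; the abstract does not specify how the paper computes its features, and the query-log-derived signals it combines them with are not shown here.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return {term: idf}, with idf(t) = log(N / df(t)).

    N is the number of documents and df(t) is the number of documents
    containing term t. Tokenization is a plain lowercase whitespace
    split, which is an assumption of this sketch.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term at most once per document.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical toy corpus, not data from the paper.
toy_corpus = [
    "apple pie recipe",
    "apple releases new phone",
    "microsoft releases new update",
]
idf = inverse_document_frequency(toy_corpus)
```

In this toy corpus the term that appears in fewer documents receives the higher score. IDF by itself measures rarity rather than ambiguity, which is consistent with the abstract's point that such metrics are combined with signals from query logs.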

Research Areas

Data Mining and Modeling
