Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

24
Enhanced Reading Experience by Augmenting Text Using Linked Open Data Ikuya Yamada 1,2,3 Tomotaka Ito 1,2 Shinnosuke Usami 1,2 Tomoya Toyoda 1,2 Shinsuke Takagi 1,2 Hideaki Takeda 3 Yoshiyasu Takefuji 2 1 Studio Ousia 2 Keio University 3 National Institute of Informatics

description

Presented in Semantic Web Challenge @ International Semantic Web Conference 2014 http://challenge.semanticweb.org/2014/finalists.html

Transcript of Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

Page 1: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

Enhanced Reading Experience by Augmenting Text Using Linked Open Data

Ikuya Yamada1,2,3 Tomotaka Ito1,2 Shinnosuke Usami1,2 Tomoya Toyoda1,2

Shinsuke Takagi1,2 Hideaki Takeda3 Yoshiyasu Takefuji2

1Studio Ousia 2Keio University 3National Institute of Informatics

Page 2: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Background

‣ We frequently encounter unfamiliar entities while reading text

‣ Tedious steps are required to retrieve detailed information about an entity✦ Select, copy, and paste text into the search box...✦ Especially problematic on recent touchscreen devices

2

Kyary Pamyu Pamyu is a Japanese model and singer. Her public image is associated with Japan's kawaisa culture

centered in the Harajuku, Tokyo.

Page 3: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Linkify

‣ Linkify automatically converts entity names into links

3

Kyary Pamyu Pamyu is a Japanese model and singer. Her public image is associated with Japan's kawaisa culture

centered in the Harajuku, Tokyo.

Page 4: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Linkify

‣ Linkify automatically converts entity names into links

‣ When a user selects a link, a popup window containing the summarized information of the entity is displayed

4

Kyary Pamyu Pamyu is a Japanese model and singer. Her public image is associated with Japan's kawaisa culture

centered in the Harajuku, Tokyo.

Page 5: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Outline‣ Converting Entity Names into Links

✦ Entity Linking

✦ Evaluating Helpfulness of Entities

- Machine-learning features

- Dataset

- Results

‣ Rendering the Widget

‣ Demonstration

5

Page 6: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Converting Entity Names into Links

1. DOM Iterator sends the page text to the server when the web browser renders a web page

2. The server-side system extracts entity names from the text and sends back to the browser

3. Link converter converts the extracted entity names into links

6

Page 7: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Converting Entity Names into Links

1. Entity Linker extracts entity names and disambiguates them to entries in Wikipedia

2. Entity Helpfulness Evaluator detects whether the detected entities are likely helpful to users

7

Page 8: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Entity Linking

‣ Entity Linking: The task of linking entity names to entries in a knowledge base (KB) (e.g., Wikipedia)

‣ Recently the task has received considerable attention✦ Many research papers (2006-) [Cucerzan 2007, Milne et al. 2008, etc.]✦ Open Source Implementations: Wikipedia Miner, DBpedia Spotlight,

TAGME, AIDA, Apache Stanbol, Illinois Wikifier, etc.

8

Kyary Pamyu Pamyu is a Japanese model and singer. Her public image is associated with Japan's kawaisa culture

centered in the Harajuku, Tokyo.

Harajukuwikipedia/Harajuku wikipedia/Kawaii

KawaiiKyary Pamyu Pamyuwikipedia/Kyary_Pamyu_Pamyu

Our implementation of Wikipedia Miner is used

Page 9: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Evaluating Helpfulness of Entities

‣ We use supervised machine-learning to assign binary labels to entities detected by an entity linking system [1].

‣ A binary classification task to detect whether an entity is helpful to readers

9

Kyary Pamyu Pamyu is a Japanese model and singer. Her public image is associated with Japan's kawaisa culture

centered in the Harajuku, Tokyo.

TRUE FALSE/

[1] Yamada et al. Evaluating the Helpfulness of Linked Entities to Readers. In ACM Hypertext 2014.

Page 10: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

List of machine-learning features

Machine-learning Features for Evaluating Helpfulness of Entities

‣ Link probability

‣ Entity

‣ Entity class

‣ Topical coherence

‣ Textual

‣ Mention occurrence

10

Features are classified into six categories

Page 11: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Link Probability Features

‣ It reflects Wikipedia contributors’ judgment of whether or not the mention is helpful to readers

‣ Three derivative features:✦ LINK_PROB: The link probability of the corresponding mention

✦ LINK_PROB_AVG: The average link probability of all mentions that link to the entity

✦ LINK_PROB_MAX: The maximum link probability in all mentions that link to the entity

11

Ca(m): A set of entities that contain mention as an anchorCt(m): A set of entities that contain mention

Link probability is the probability thata mention is used as an anchor text in Wikipedia

Page 12: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Link Probability Features

12

Her public image is associated with Japan's kawaisa

culture centered in the Harajuku, Tokyo

Takeshita Street is a street lined with

fashion boutiques, cafes in Harajuku in

Tokyo, Japan.

Department Store and Museum is a department store

located in the Harajuku...

Takeshita Street Kyary Pamyu Pamyu Laforet

Link Plain text

LINK_PROB(Harajuku) = 2/3

Page 13: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Link Probability Features

‣ It reflects Wikipedia contributors’ judgment of whether or not the mention is helpful to readers

‣ Three derivative features:✦ LINK_PROB: The link probability of the corresponding mention

✦ LINK_PROB_AVG: The average link probability of all mentions that link to the entity

✦ LINK_PROB_MAX: The maximum link probability of all mentions that link to the entity

13

Ca(m): A set of entities that contain mention as an anchorCt(m): A set of entities that contain mention

Link probability is the probability thata mention is used as an anchor text in Wikipedia

Page 14: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Entity Class Features

14

Entity class features are derived from the classes of an entity in DBpedia, Freebase, and Schema.org

Kyary Pamyu Pamyuwikipedia/Kyary_Pamyu_Pamyu

DBpedia:/ontology/PersonDBpedia:/ontology/ArtistFreebase:/people/personFreebase:/music/artist

Harajukuwikipedia/Harajuku

Each class is used as a feature

DBpedia:/ontology/PlaceFreebase:/location/locationFreebase:/location/neighborhood…

Page 15: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Topical Coherence Feature

‣ Measured by how a target entity is related to other detected entities

‣ Relatedness between two entities is calculated based on the Wikipedia in-link-based similarity

‣ The average relatedness between the entity and other entities in the document [Milne 2008] is added as a feature

15

Topical coherence feature represents how an entity isrelated to the topics of the document

e: Entity; KB: Entities in KB; c(e): Entities having a link to e

e: Entity; E: Set of entities in the document

Page 16: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Topical Coherence Feature

16

Kyary Pamyu Pamyu is a Japanese model and singer. Her public image is associated with Japan's kawaisa culture

centered in the Harajuku, Tokyo.

0.2

0.4

REL(Kyary Pamyu Pamyu, kawaisa culture) = 0.4REL(Kyary Pamyu Pamyu, Harajuku) = 0.2

COHERENCE(Kyary Pamyu Pamyu, E) = (0.4 + 0.2) / 2 = 0.3

Page 17: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Dataset for Evaluating Helpfulness of Entities

‣ Based on the IITB EL dataset✦ Contains annotated entities on news articles for entity

linking tasks✦ Popular in entity linking studies

‣ We added an annotation to each entity in the IITB EL dataset that represents whether it is helpful to readers

‣ Amazon Mechanical Turk was used to hire annotators online

‣ A total number of 33,567 annotations were obtained

‣ The dataset is publicly available for further studieshttps://github.com/studio-ousia/el-helpfulness-dataset/

17

The dataset for our task was developedusing crowd-sourcing service

Page 18: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA 18

Annotation screen displayed to the annotators

Page 19: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

The Performance of Evaluating Helpfulness of Entities

‣ Our method performs significantly better than existing methods by 5.7-12% F1

19

Precision Recall F1

LinkProb 0.6980 0.7761 0.7349

TAGME 0.7936 0.7782 0.7857

Milne&Witten 0.8079 0.7904 0.7989

Our Method 0.8697 0.8419 0.8554

Page 20: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Rendering a Widget

‣ The widget content is retrieved from Linked Open Data

‣ The content is dynamically customized using entity classes✦ Map and Images for Hollywood✦ YouTube and Twitter for Lady Gaga (singer)

‣ DBpedia Ontology Classes are used as the entity classes

20

Page 21: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

STUDIO OUSIA

Linkify

21

Linkify is implemented as add-ons of desktop & mobile web browsers

Google Chrome(Desktop)

Mozilla Firefox(Desktop)

Dolphin Browser(Android)

Page 22: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

DEMONSTRATION

Page 23: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

TechCrunch: Linkify’s SDK Wants to Make Mobile Searches Less Cumbersome

The entity linking system worksin production service

Semanticweb.com: Linkify is Working to Make Mobile Searches Easier

Page 24: Linkify: Enhanced Reading Experience by Augmenting Text Using Linked Open Data

THANK YOU!