807 - TEXT ANALYTICS Massimo Poesio Lecture 7: Wikipedia for Text Analytics.


WIKIPEDIA

• Wikipedia is a free, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation
• Wikipedia's articles have been written collaboratively by volunteers around the world
• Almost all of its articles can be edited by anyone who can access the Wikipedia website

The free encyclopedia that anyone can edit

---- http://en.wikipedia.org/wiki/Wikipedia

WIKIPEDIA

• Wikipedia is
  1. domain independent – it has a large coverage
  2. up to date – to process current information
  3. multilingual – to process information in many languages

• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
  – Other languages
  – Other wiki pages
  – To the web
• Redirects
• Disambiguates

WIKIPEDIA FOR TEXT ANALYTICS

• Wikipedia has proven an extremely useful resource for text analytics, being used for
  – Text classification / clustering
  – Enriching documents through ‘Wikification’
  – NER
  – Relation extraction
  – …

Wikipedia as Thesaurus for text classification / clustering

• Unlike other standard ontologies, such as WordNet and MeSH, Wikipedia itself is not a structured thesaurus
• However, it is more…
  – Comprehensive: it contains 12 million articles (2.8 million in the English Wikipedia)
  – Accurate: a study by Giles (2005) found that Wikipedia can compete with Encyclopædia Britannica in accuracy
  – Up to date: current and emerging concepts are absorbed in a timely manner

Giles, J. 2005. Internet encyclopaedias go head to head. Nature 438, 900–901.

Wikipedia as Thesaurus

• Moreover, Wikipedia has a well-formed structure
  – Each article describes only a single concept
  – The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
    [Figure: the Wikipedia article that describes the concept Artificial intelligence]
  – Equivalent concepts are grouped together by redirect links
    [Figure: AI is redirected to its equivalent concept Artificial Intelligence]
  – It contains a hierarchical categorization system, in which each article belongs to at least one category
    [Figure: the concept Artificial Intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society]
  – Polysemous concepts are disambiguated by Disambiguation Pages
    [Figure: the different meanings that Artificial intelligence may refer to are listed in its disambiguation page]

WIKIPEDIA FOR TEXT CATEGORIZATION / CLUSTERING

• Objective: use information in Wikipedia to improve the performance of text classifiers / clustering systems
• A number of possibilities:
  – Use similarity between documents and Wikipedia pages on a given topic as a feature for text classification
  – Use WIKIFICATION to enrich documents
  – Use the Wikipedia category system as category repertoire

Using Wikipedia Categories for text classification


WIKIPEDIA FOR TEXT CLASSIFICATION

• Automatic identification of the topic/category of a text (e.g. computer science, psychology)
  – Books
  – Learning objects

“The United States was involved in the Cold War”

  United States                               0.3793
  Cold War                                    0.3111
  Vietnam War                                 0.0023
  World War I                                 0.0023
  Communism                                   0.0027
  Ronald Reagan                               0.0027
  Mikhail Gorbachev                           0.0023
  Cat: Wars Involving the United States       0.00779
  Cat: Global Conflicts                       0.00779

USING WIKIPEDIA FOR TEXT CLASSIFICATION

• Either directly use Wikipedia categories, or map one's categories to Wikipedia categories

• Use the documents associated with those categories as training documents

TEXT WIKIFICATION

Wikification = adding links to Wikipedia pages to documents

• Text
• Wikipedia

WIKIFICATION

Giotto was called to work in Padua and also in Rimini

(May 2012, Truc-Vien T. Nguyen)

Wikification pipeline

[Figure: the wikification pipeline. Stages shown: Raw (hyper)text → Decomposition → Clean Text → Keyword Extraction (Candidate Extraction, Candidate Ranking) → Text with selected keywords → Word Sense Disambiguation (Extract Sense Definitions from Sense Inventory; Knowledge-based: Lesk-like Definition Overlap; Data Driven: Naive Bayes trained on Wikipedia; Voting) → Annotated Text → Recomposition → (Hyper)text with linked keywords]

Keyword Extraction

• Finding important words/phrases in raw text
• Two-stage process
  – Candidate extraction
    • Typical methods: n-grams, noun phrases
  – Candidate ranking
    • Rank the candidates by importance
    • Typical methods:
      – Unsupervised: information theoretic
      – Supervised: machine learning using positional and linguistic features

Keyword Extraction using Wikipedia

1. Candidate extraction
  • Semi-controlled vocabulary
    – Wikipedia article titles and anchor texts (surface forms)
      • E.g. “USA”, “US” = “United States of America”
    – More than 2,000,000 terms/phrases
    – Vocabulary is broad (e.g. “the”, “a” are included)

2. Candidate ranking
  • tf.idf
    – Wikipedia articles as document collection
  • Chi-squared independence of phrase and text
    – The degree to which it appeared more times than expected by chance
  • Keyphraseness

\[ P(\text{keyword} \mid W) \approx \frac{\text{count}(D_{\text{key}})}{\text{count}(D_W)} \]

where count(D_key) is the number of documents in which the term W is selected as a keyword (i.e. linked), and count(D_W) is the number of documents in which W appears.
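The candidate extraction step can be illustrated with a small sketch. It is not the lecture's implementation: the tokenizer, the `max_n` limit and the toy `SURFACE_FORMS` vocabulary are assumptions made for illustration, standing in for the Wikipedia titles and anchor texts described above.

```python
import re


def ngrams(tokens, max_n=3):
    """Yield every n-gram of length 1..max_n as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def extract_candidates(text, surface_forms, max_n=3):
    """Return the n-grams of `text` found in the surface-form vocabulary."""
    tokens = re.findall(r"\w+", text)  # naive tokenizer (assumption)
    seen, candidates = set(), []
    for gram in ngrams(tokens, max_n):
        if gram in surface_forms and gram not in seen:
            seen.add(gram)
            candidates.append(gram)
    return candidates


# Toy vocabulary (hypothetical); a real one would hold millions of entries.
SURFACE_FORMS = {"United States", "Cold War", "the", "USA"}

candidates = extract_candidates(
    "The United States was involved in the Cold War", SURFACE_FORMS
)
print(candidates)
```

Because the vocabulary is broad, function words such as "the" are returned as candidates too; it is the ranking step that has to filter them out.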

Our own Approach (cfr. Milne & Witten 2008, 2012; Ratinov et al. 2011)

• Use a Wikipedia dump to compute two statistics
  • KEYPHRASENESS: prior probability that a term is used to refer to a Wikipedia article
  • COMMONNESS: probability that a phrase is used to refer to a specific Wikipedia article
• Two versions of the system
  • UNSUPERVISED: use statistics only
  • SUPERVISED: use distant learning to create training data

KEYPHRASENESS

• The probability that a term t is a link to a Wikipedia article (cfr. Milne & Witten's prior link probability)
• Examples
  • The term Georgia
    – is found as a link in 22,631 Wikipedia articles
    – appears in 75,000 Wikipedia articles
    – keyphraseness = 22631/75000 = 0.3017466
  • Cfr. the term “the”: keyphraseness = 0.0006

\[ \text{Keyphraseness}(t) = \frac{\text{count}([\_ \mid t])}{\text{count}(t)} \]
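As a worked instance of the keyphraseness ratio, the sketch below recomputes the Georgia figure from the two counts quoted on the slide. The function name and the zero-count guard are assumptions added for illustration.

```python
def keyphraseness(articles_with_link, articles_with_term):
    """Fraction of the articles containing a term in which it is a link."""
    if articles_with_term == 0:
        return 0.0  # guard for unseen terms (assumption, not from the slide)
    return articles_with_link / articles_with_term


# Counts quoted for the term "Georgia"
georgia = keyphraseness(22631, 75000)
print(round(georgia, 4))
```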

COMMONNESS

• The probability that a term t is a link to a SPECIFIC Wikipedia article a
• For example, the surface form Georgia was found to be linked to
  – a1 = University_of_Georgia 166 times
  – Republic_of_Georgia 18 times
  – Georgia_(United_States) 5 times
  commonness(t, a1) = 166/(166+18+5) = 0.8783

\[ \text{Commonness}(t, a) = \frac{\text{count}([a \mid t])}{\text{count}(t)} \]

In the example, the denominator is the total number of times t occurs as a link (166 + 18 + 5 = 189).
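The Georgia example can be reproduced with a few lines of code. This is a minimal sketch that normalizes by the total number of linked occurrences, as the worked example does; the function name and data layout are assumptions.

```python
from collections import Counter


def commonness(link_counts):
    """Map each target article to its share of the term's linked occurrences."""
    total = sum(link_counts.values())
    if total == 0:
        return {}
    return {article: n / total for article, n in link_counts.items()}


# Link counts quoted for the surface form "Georgia"
georgia_links = Counter({
    "University_of_Georgia": 166,
    "Republic_of_Georgia": 18,
    "Georgia_(United_States)": 5,
})

scores = commonness(georgia_links)
print(round(scores["University_of_Georgia"], 4))
```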

Extracting dictionaries and statistics from a Wikipedia dump

• Parsing
• In three phases
  • Identify articles of relevance
  • Extract (among other things)
    • Set of SURFACE FORMS (terms that are used to link to Wikipedia articles)
    • Set of LINKS [article|surface_form]
      • [[Pedanius Dioscorides|Dioscorides]]
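A rough sketch of how surface forms and links might be pulled out of article wikitext is shown below. It is a simplification under stated assumptions: the regular expression only handles plain `[[article]]` and `[[article|surface_form]]` links (no nested templates or namespace filtering), and the sample sentence is made up around the link quoted above.

```python
import re
from collections import Counter, defaultdict

# Matches [[target]] and [[target|surface form]] (simplified wikitext)
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]")


def extract_links(wikitext):
    """Return (target_article, surface_form) pairs found in the text."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        # Without a pipe, the surface form is the article title itself
        surface = (match.group(2) or match.group(1)).strip()
        links.append((target, surface))
    return links


def surface_form_table(pages):
    """Count, for each surface form, how often it links to each target."""
    table = defaultdict(Counter)
    for text in pages:
        for target, surface in extract_links(text):
            table[surface][target] += 1
    return table


sample = "The physician [[Pedanius Dioscorides|Dioscorides]] wrote about [[botany]]."
links = extract_links(sample)
table = surface_form_table([sample, "[[Pedanius Dioscorides|Dioscorides]]"])
print(links)
```

The resulting table holds exactly the per-surface-form target frequencies that keyphraseness and commonness are computed from.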

The Wikipedia Dump from July 2011

– 11,459,639 pages in total
– 12,525,583 links
  • specifying surface word, target, frequency
– ranked by frequency
  • for example, the mention Georgia is linked to
    – University_of_Georgia 166 times
    – Republic_of_Georgia 18 times
    – Georgia_(United_States) 5 times

Some statistics (all Wikidumps from July 2011)

Page Type        English      Italian     Polish
Redirected       4,465,652    323,591     134,148
List_of          138,581      836         5,021
Disambiguation   176,721      6,193       4,553
Relevant         4,361,020    917,354     920,486
Total            11,459,639   1,654,258   1,200,313

Surface forms, titles, articles

Dictionary       English      Italian     Polish
Titles           4,361,020    917,354     920,486
Surface forms    8,829,624    2,484,045   2,482,104
Files            745,724      72,126      n/a
Links            10,871,741   2,917,235   2,937,981

Files in Polish are arranged in a repository different from English/Italian.

Some definitions and figures
• surface form: the occurrence of a mention inside an article
• target article: the target Wiki article a surface form is linked to

The Unsupervised Approach

• Use keyphraseness to identify candidate terms
  • Retain terms whose keyphraseness is above a certain threshold (currently 0.01)
• Use commonness to rank
  • Retain top 10
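The two steps of the unsupervised approach can be sketched as follows. This is an illustrative reading, not the system's code: the function name and dictionary layout are assumptions, the default threshold of 0.01 is our reading of the slide's value, and "top 10" is taken to mean the ten highest-ranked target articles per term.

```python
def unsupervised_wikify(terms, keyphraseness, commonness,
                        threshold=0.01, top_k=10):
    """Filter terms by keyphraseness, then rank their targets by commonness."""
    links = {}
    for term in terms:
        if keyphraseness.get(term, 0.0) <= threshold:
            continue  # rarely used as a link, e.g. "the"
        ranked = sorted(commonness.get(term, {}).items(),
                        key=lambda pair: pair[1], reverse=True)
        links[term] = ranked[:top_k]
    return links


# Statistics taken from the Georgia / "the" examples earlier in the lecture
KEYPHRASENESS = {"Georgia": 22631 / 75000, "the": 0.0006}
COMMONNESS = {
    "Georgia": {
        "Georgia_(United_States)": 5 / 189,
        "University_of_Georgia": 166 / 189,
        "Republic_of_Georgia": 18 / 189,
    }
}

result = unsupervised_wikify(["the", "Georgia"], KEYPHRASENESS, COMMONNESS)
print(result)
```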

The Supervised Approach

• Features: in addition to commonness, use measures of SIMILARITY between the text containing the term and the candidate Wikipedia page
• RELATEDNESS: a measure of similarity between the LINKS (cfr. Milne & Witten's NORMALIZED LINK DISTANCE)

\[ \text{Relatedness}(a_1, a_2) = \frac{\log(\max(|A_1|, |A_2|)) - \log(|A_1 \cap A_2|)}{\log(|W|) - \log(\min(|A_1|, |A_2|))} \]

where, following Milne & Witten, A_1 and A_2 are the sets of articles that link to a_1 and a_2, and W is the set of all Wikipedia articles.
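The relatedness formula can be evaluated directly from sets of incoming links. The sketch below is a hedged illustration: the function name, the toy link sets, and the choice to return infinity when the two articles share no incoming links (where the logarithm is undefined) are all assumptions. Note that, as written, the quantity is 0 for identical link sets and grows as the overlap shrinks, so smaller values mean more closely related articles.

```python
import math


def link_distance(in_links_a1, in_links_a2, total_articles):
    """Evaluate the relatedness formula from two sets of incoming links.

    Assumes total_articles is larger than either link set.
    """
    a1, a2 = set(in_links_a1), set(in_links_a2)
    overlap = len(a1 & a2)
    if overlap == 0:
        return math.inf  # log(0) is undefined; treated as maximally distant
    larger = max(len(a1), len(a2))
    smaller = min(len(a1), len(a2))
    return ((math.log(larger) - math.log(overlap))
            / (math.log(total_articles) - math.log(smaller)))


# Toy data (hypothetical article ids)
d = link_distance({1, 2, 3, 4}, {3, 4, 5}, total_articles=100)
same = link_distance({1, 2}, {1, 2}, total_articles=100)
print(d, same)
```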

Training a supervised wikifier

• Using WIKIPEDIA ITSELF as source of training materials (see next)

Results on standard datasets

APPROACH               AQUAINT   WIKIPEDIA
Our approach           85.66     84.37
Milne & Witten 2008    83.61     80.31
Ratinov et al. 2011    84.52     90.20

Wikifying queries: the Bridgeman datasets

• BAL data sets
  – 1049 Query set
    • 1 annotator, up to 3 manual annotations
    • 1 automatic annotation
  – 100 Query set
    • 3 annotators, each up to 3 manual annotations

Results on Bridgeman 1000 Y3

CORRECT CANDIDATE IS   RESULTS
First candidate        64.77
Among first 2          71.59
First 3                75.42
First 4                77.18
First 5                78.32

Accuracy up by 17 points (36%)

Results for the GALATEAS languages and Arabic

LANGUAGE   WIKIPEDIA SIZE   RESULTS (on Wikipedia subset)
English    4M articles      84.37
Italian    1M               79.64
French     1.4M             76-77
German     1.6M             72-73
Dutch      1.6M             70-71
Polish     900K             60.81
Arabic     200K             80.78

The GALATEAS D2W web services

• Available as open source
• Deployed within LinguaGrid
• API based on the Morphosyntactic Annotation Framework (MAF), an ISO standard
• Tested on 15M queries, achieves throughput of 600 characters per second
• Integrated with LangLog tool

Use of the service in LangLog

(See Domoina's demo)

Other applications

• The UK Data Archive

WIKIPEDIA FOR NER

[The FCC] took [three specific actions] regarding [AT&T]. By a 4-0 vote, it allowed AT&T to continue offering special discount packages to big customers, called Tariff 12, rejecting appeals by AT&T competitors that the discounts were illegal. …

http://en.wikipedia.org/wiki/FCC

The Federal Communications Commission (FCC) is an independent United States government agency, created, directed, and empowered by Congressional statute (see 47 U.S.C. § 151 and 47 U.S.C. § 154).

Number of glucocorticoid receptors in lymphocytes and their sensitivity to hormone action

• Wikipedia has been used in NER systems
  – As a source of features for normal NER
  – To automatically create training materials (DISTANT LEARNING)
  – To go beyond NE tagging towards proper ENTITY DISAMBIGUATION

Distant learning

• Automatically extract examples
  • Positive examples: from a mention and the Wikipedia page it links to
  • Negative examples: from similar mentions with other links
• Use positive and negative examples to train a model

The Supervised Approach: Using Wikipedia links to generate training data

• Example
  – Giotto was called to work in Padua and also in Rimini (sentence taken from Wikipedia text, with links available)
  – Giotto_di_Bondone (painter), Giotto_Griffiths (Welsh rugby player), Giotto_Bizzarrini (automobile engineer)
• Dataset
  – +1 Giotto was called to work -- Giotto_di_Bondone
  – -1 Giotto was called to work -- Giotto_Griffiths
  – -1 Giotto was called to work -- Giotto_Bizzarrini

http://en.wikipedia.org/wiki/Giotto_di_Bondone
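The Giotto dataset above can be generated mechanically once the linked target and the candidate articles for a mention are known. The following is a minimal sketch; the function name and the tuple layout (label, context, candidate) are assumptions for illustration.

```python
def make_training_examples(context, linked_target, candidates):
    """Label the linked article +1 and every other candidate -1."""
    return [
        (1 if candidate == linked_target else -1, context, candidate)
        for candidate in candidates
    ]


examples = make_training_examples(
    context="Giotto was called to work",
    linked_target="Giotto_di_Bondone",
    candidates=["Giotto_di_Bondone", "Giotto_Griffiths", "Giotto_Bizzarrini"],
)
for label, text, article in examples:
    print(f"{label:+d} {text} -- {article}")
```

Since the labels come from links that Wikipedia editors already added, no manual annotation is needed to build the training set.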

MORE ADVANCED USES OF WIKIPEDIA

• As a source of ONTOLOGICAL KNOWLEDGE
• DBPEDIA

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

• Taxonomic information: category structure
• Attributes: infobox, text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Start with the category tree
• Induce a subsumption hierarchy

INFOBOXES

• Collaborative content
• Semi-structured data

  {{Infobox Writer
  | bgcolour = silver
  | name = Edgar Allan Poe
  | image = Edgar_Allan_Poe_2.jpg
  | caption = This [[daguerreotype]] of Poe was taken in 1848.
  | birth_date = {{birth date|1809|1|19|mf=y}}
  | birth_place = [[Boston, Massachusetts]], [[United States|US]]
  | death_date = {{death date and age|1849|10|07|1809|01|19}}
  | death_place = [[Baltimore, Maryland]], [[United States|US]]
  | occupation = Poet, short story writer, editor, literary critic
  | movement = [[Romanticism]], [[Dark romanticism]]
  | genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
  | magnum_opus = The Raven
  | spouse = [[Virginia Eliza Clemm Poe]]
  }}

DBPEDIA

DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web

The DBpedia Dataset

• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

REPRESENTING EXTRACTED INFORMATION

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits for processing DBpedia data in your preferred programming language.

Extracting Infobox Data (RDF Representation)

http://en.wikipedia.org/wiki/Calgary

http://dbpedia.org/resource/Calgary
  dbpedia:native_name       “Calgary”
  dbpedia:altitude          “1048”
  dbpedia:population_city   “988193”
  dbpedia:population_metro  “1079310”
  mayor_name                dbpedia:Dave_Bronconnier
  governing_body            dbpedia:Calgary_City_Council
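The mapping from infobox fields to triples can be sketched in a few lines. This is not the DBpedia extraction framework: the function name, the input dictionary (a hypothetical wikitext infobox for Calgary, reconstructed from the values shown above) and the rule "a value that is a single wiki link becomes a resource, anything else a literal" are assumptions for illustration.

```python
import re

# A value consisting of exactly one [[link]] or [[link|label]]
SINGLE_LINK = re.compile(r"^\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]$")


def infobox_to_triples(page_title, infobox):
    """Turn infobox fields into (subject, predicate, object) triples."""
    subject = "dbpedia:" + page_title.replace(" ", "_")
    triples = []
    for prop, raw in infobox.items():
        value = raw.strip()
        match = SINGLE_LINK.match(value)
        if match:
            # Linked values point at another resource
            obj = "dbpedia:" + match.group(1).strip().replace(" ", "_")
        else:
            # Everything else is kept as a string literal
            obj = '"' + value + '"'
        triples.append((subject, "dbpedia:" + prop, obj))
    return triples


# Hypothetical infobox source for the Calgary page
calgary = {
    "native_name": "Calgary",
    "altitude": "1048",
    "mayor_name": "[[Dave Bronconnier]]",
    "governing_body": "[[Calgary City Council]]",
}

triples = infobox_to_triples("Calgary", calgary)
for triple in triples:
    print(triple)
```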

SPARQL

• SPARQL is a query language for RDF
  • RDF is a directed, labeled graph data format for representing information in the Web
  • This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware

The DBpedia SPARQL Endpoint

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
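To make one of these questions concrete, the sketch below writes "all tennis players from Moscow" as a SPARQL query and builds the request URL for the endpoint. Several things here are assumptions rather than facts from the lecture: the class `dbo:TennisPlayer` and property `dbo:birthPlace` are taken to exist in the DBpedia ontology, "from Moscow" is read as "born in Moscow", and only the standard `query` parameter of the SPARQL protocol is used. The code builds the URL but does not send the request.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:TennisPlayer, dbo:birthPlace, dbr:Moscow
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?player WHERE {
  ?player a dbo:TennisPlayer ;
          dbo:birthPlace dbr:Moscow .
}
LIMIT 20
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request URL for a SPARQL endpoint."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(QUERY)
print(url)
```

The resulting URL could be fetched with any HTTP client; the result format and any rate limits depend on the server configuration.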

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis

WEB COLLABORATION PROJECTS

www.phrasedetectives.com

• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – CyCORP
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University


OPEN MIND COMMONSENSE

CONCEPT NET

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)
  IsA (IsA "apple" "fruit")
  PartOf (PartOf "CPU" "computer")
  PropertyOf (PropertyOf "coffee" "wet")
  MadeOf (MadeOf "bread" "flour")
  DefinedAs (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)
  PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope")
  SubeventOf (SubeventOf "play sport" "score goal")
  FirstSubeventOf (FirstSubeventOf "start fire" "light match")
  LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)
  CapableOf (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)
  LocationOf (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
  EffectOf (EffectOf "view video" "entertainment")
  DesirousEffectOf (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  DesireOf (DesireOf "person" "not be depressed")
  MotivationOf (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)
  IsUsedFor (UsedFor "fireplace" "burn wood")
  CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")

ASSOCIATION K-LINES (1.25 million assertions)
  SuperThematicKLine (SuperThematicKLine "western civilization" "civilization")
  ThematicKLine (ThematicKLine "wedding dress" "veil")
  ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web

ESP THE GAME

• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
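The matching rule can be sketched as a simple comparison of the two players' guesses. This is an illustrative simplification, not the game's actual code: the function names are hypothetical, guesses are compared after lowercasing and trimming (an assumption), and the real-time interleaving of the two players' input is ignored.

```python
def normalize(label):
    """Lowercase and trim a typed guess (normalization is an assumption)."""
    return label.strip().lower()


def agreed_label(guesses_a, guesses_b):
    """Return the first guess of player A that player B also typed, else None."""
    typed_by_b = {normalize(guess) for guess in guesses_b}
    for guess in guesses_a:
        if normalize(guess) in typed_by_b:
            return normalize(guess)
    return None


match = agreed_label(["dog", "grass", "ball"], ["puppy", "Ball", "park"])
no_match = agreed_label(["cat"], ["dog"])
print(match, no_match)
```

A string that both players typed independently is likely to be a sensible description of the image, which is why the agreed string can be kept as a label.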

THE TASK

SCORING BY MATCHING

SOME STATISTICS

• In the 4 months between August 9th, 2003 and December 10th, 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): the user must identify the closest antecedent of a markable, if it is anaphoric
  – Detectives Conference (Validation): the user must agree/disagree with a coreference relation entered by another user


PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

• Mihalcea, R. and Csomai, A. Wikify! Linking documents to encyclopedic knowledge. Proceedings of CIKM'07, Lisbon, Portugal.
• Nguyen, V. and Poesio, M. 2012. Entity disambiguation and linking over queries using encyclopedic knowledge. Proceedings of the 6th Workshop on Analytics for Noisy Unstructured Text Data.
• Lungley, D., Trevisan, M., Nguyen, V., Althobaiti, M. and Poesio, M. 2013. GALATEAS D2W: A multi-lingual disambiguation to Wikipedia web service. Proc. of ENRICH.
• Nastase, V. and Strube, M. 2012. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence.
• von Ahn, L. and Dabbish, L. 2008. Designing games with a purpose. Communications of the ACM 51(8), 58-67.
• Poesio, Chamberlain, Kruschwitz, Robaldo and Ducceschi. 2013. Phrase Detectives: Utilizing collective intelligence for Internet-scale language resource creation. ACM Transactions on Interactive Intelligent Systems.


WIKIPEDIA

Slide 3

Slide 4


Wikipedia as Thesaurus for text classification clustering


Slide 8

Slide 9

Slide 10

Slide 11


Slide 13




WIKIPEDIA FOR TEXT CLASSIFICATION


TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS

QUALITY OF THE LABELS


Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

WIKIPEDIA

bullWikipedia is a free multilingual encyclopedia project supported by the non-profit Wikimedia FoundationbullWikipedias articles have been written collaboratively by volunteers around the worldbullAlmost all of its articles can be edited by anyone who can access the Wikipedia website

The free encyclopedia that anyone can edit

----httpenwikipediaorgwikiWikipeida

WIKIPEDIA

bull Wikipedia is




bullTitle

bullAbstract

bullInfoboxes

bullGeo-coordinates

bullCategories

bullImages

bullLinks

bullOther languages


bullTo the web

bullRedirects

bullDisambiguates








absorbed timely









redirected links





















17




United States03793

Cold War03111

Vietnam War00023

World War I00023

Communism00027

Ronald Reagan00027







TEXT WIKIFICATION


bull Text

WIKIFICATION

bull Wikipedia




Candidate

Extraction

Candidate

Ranking



Knowledge- based


Overlap

Data Driven

Naive Bayes


Voting

Tex

t w

ith

sel

ecte

d k

eyw

ord

s

Dec

om

po

siti

on

Raw

(h

yper

)tex

t

Cle

an T

ext

Rec

om

posi

tion

(Hyp

er)t

ext

wit

h

linked

key

wo

rds

Annotated Text


Keyword Extraction





features










expected by chance

bull Keyphraseness

)(

)()|(

W

key

Dcount

DcountWkeywordP








KEYPHRASENESS






euro


count(t)

COMMONNESS




commonness(t a1) = 166(166+18+5) = 08783


euro


count(t)















Redirected 4465652 323591 134148

List_of 138581 836 5021


Relevant 4361020 917354 920486

Total 11459639 1654258 1200313



Titles 4361020 917354 920486

Surface forms 8829624 2484045 2482104

Files 745724 72126 na

Links 10871741 2917235 2937981






bull Retain top 10




euro









Ratinov et al 2011 8452 9020








Among first 2 7159

First 3 7542

First 4 7718

First 5 7832





Italian 1M 7964

French 14M 76-77

German 16M 72-73

Dutch 16M 70-71

Polish 900K 6081

Arabic 200K 8078







Other applications


WIKIPEDIA FOR NER


WIKIPEDIA FOR NER



WIKIPEDIA FOR NER


WIKIPEDIA

WIKIPEDIA FOR NER



DISAMBIGUATION

Distant learning




















INFOBOXES






Web

DBPEDIA

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

WIKIPEDIA

bull Wikipedia is




bullTitle

bullAbstract

bullInfoboxes

bullGeo-coordinates

bullCategories

bullImages

bullLinks

bullOther languages


bullTo the web

bullRedirects

bullDisambiguates








absorbed timely









redirected links





















17




United States03793

Cold War03111

Vietnam War00023

World War I00023

Communism00027

Ronald Reagan00027







TEXT WIKIFICATION


bull Text

WIKIFICATION

bull Wikipedia




Candidate

Extraction

Candidate

Ranking



Knowledge- based


Overlap

Data Driven

Naive Bayes


Voting

Tex

t w

ith

sel

ecte

d k

eyw

ord

s

Dec

om

po

siti

on

Raw

(h

yper

)tex

t

Cle

an T

ext

Rec

om

posi

tion

(Hyp

er)t

ext

wit

h

linked

key

wo

rds

Annotated Text


Keyword Extraction





features










expected by chance

bull Keyphraseness

)(

)()|(

W

key

Dcount

DcountWkeywordP








KEYPHRASENESS






euro


count(t)

COMMONNESS




commonness(t a1) = 166(166+18+5) = 08783


euro


count(t)















Redirected 4465652 323591 134148

List_of 138581 836 5021


Relevant 4361020 917354 920486

Total 11459639 1654258 1200313



Titles 4361020 917354 920486

Surface forms 8829624 2484045 2482104

Files 745724 72126 na

Links 10871741 2917235 2937981






bull Retain top 10




euro









Ratinov et al 2011 8452 9020








Among first 2 7159

First 3 7542

First 4 7718

First 5 7832





Italian 1M 7964

French 14M 76-77

German 16M 72-73

Dutch 16M 70-71

Polish 900K 6081

Arabic 200K 8078







Other applications


WIKIPEDIA FOR NER


WIKIPEDIA FOR NER



WIKIPEDIA FOR NER


WIKIPEDIA

WIKIPEDIA FOR NER



DISAMBIGUATION

Distant learning




















INFOBOXES






Web

DBPEDIA

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

bullTitle

bullAbstract

bullInfoboxes

bullGeo-coordinates

bullCategories

bullImages

bullLinks

bullOther languages


bullTo the web

bullRedirects

bullDisambiguates








absorbed timely









redirected links





















17




United States03793

Cold War03111

Vietnam War00023

World War I00023

Communism00027

Ronald Reagan00027







TEXT WIKIFICATION


bull Text

WIKIFICATION

bull Wikipedia




Candidate

Extraction

Candidate

Ranking



Knowledge- based


Overlap

Data Driven

Naive Bayes


Voting

Tex

t w

ith

sel

ecte

d k

eyw

ord

s

Dec

om

po

siti

on

Raw

(h

yper

)tex

t

Cle

an T

ext

Rec

om

posi

tion

(Hyp

er)t

ext

wit

h

linked

key

wo

rds

Annotated Text


Keyword Extraction





features










expected by chance

bull Keyphraseness

)(

)()|(

W

key

Dcount

DcountWkeywordP








KEYPHRASENESS






euro


count(t)

COMMONNESS




commonness(t a1) = 166(166+18+5) = 08783


euro


count(t)















Redirected 4465652 323591 134148

List_of 138581 836 5021


Relevant 4361020 917354 920486

Total 11459639 1654258 1200313



Titles 4361020 917354 920486

Surface forms 8829624 2484045 2482104

Files 745724 72126 na

Links 10871741 2917235 2937981






bull Retain top 10




euro









Ratinov et al 2011 8452 9020








Among first 2 7159

First 3 7542

First 4 7718

First 5 7832





Italian 1M 7964

French 14M 76-77

German 16M 72-73

Dutch 16M 70-71

Polish 900K 6081

Arabic 200K 8078







Other applications


WIKIPEDIA FOR NER


WIKIPEDIA FOR NER



WIKIPEDIA FOR NER


WIKIPEDIA

WIKIPEDIA FOR NER



DISAMBIGUATION

Distant learning




















INFOBOXES






Web

DBPEDIA

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84








absorbed timely









redirected links





















17




United States03793

Cold War03111

Vietnam War00023

World War I00023

Communism00027

Ronald Reagan00027







TEXT WIKIFICATION


bull Text

WIKIFICATION

bull Wikipedia




Candidate

Extraction

Candidate

Ranking



Knowledge- based


Overlap

Data Driven

Naive Bayes


Voting

Tex

t w

ith

sel

ecte

d k

eyw

ord

s

Dec

om

po

siti

on

Raw

(h

yper

)tex

t

Cle

an T

ext

Rec

om

posi

tion

(Hyp

er)t

ext

wit

h

linked

key

wo

rds

Annotated Text


Keyword Extraction





features










expected by chance

bull Keyphraseness

)(

)()|(

W

key

Dcount

DcountWkeywordP








KEYPHRASENESS






euro


count(t)

COMMONNESS




commonness(t a1) = 166(166+18+5) = 08783


euro


count(t)















Redirected 4465652 323591 134148

List_of 138581 836 5021


Relevant 4361020 917354 920486

Total 11459639 1654258 1200313



Titles 4361020 917354 920486

Surface forms 8829624 2484045 2482104

Files 745724 72126 na

Links 10871741 2917235 2937981






bull Retain top 10




euro









Ratinov et al 2011 8452 9020








Among first 2 7159

First 3 7542

First 4 7718

First 5 7832





Italian 1M 7964

French 14M 76-77

German 16M 72-73

Dutch 16M 70-71

Polish 900K 6081

Arabic 200K 8078







Other applications


WIKIPEDIA FOR NER


WIKIPEDIA FOR NER



WIKIPEDIA FOR NER


WIKIPEDIA

WIKIPEDIA FOR NER



DISAMBIGUATION

Distant learning




















INFOBOXES






Web

DBPEDIA

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS












redirected links

























United States   0.3793
Cold War        0.3111
Vietnam War     0.0023
World War I     0.0023
Communism       0.0027
Ronald Reagan   0.0027







TEXT WIKIFICATION


• Text

WIKIFICATION

• Wikipedia




[Figure labels: Candidate Extraction; Candidate Ranking; Knowledge-based; Overlap; Data Driven; Naive Bayes; Voting]
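The "Candidate Extraction" and "Candidate Ranking" labels suggest a two-step keyword selection stage. The following is a minimal Python sketch of one way such a stage could work, not the lecture's implementation. The vocabulary, the scores, the n-gram limit `max_len` and the cut-off `keep` are illustrative assumptions.

```python
def extract_candidates(tokens, vocabulary, max_len=3):
    """Collect every n-gram (up to max_len tokens) found in the vocabulary."""
    found = []
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in vocabulary:
                found.append(phrase)
    return found


def rank_candidates(candidates, scores, keep=2):
    """Sort unique candidates by descending score and retain the top `keep`."""
    unique = sorted(set(candidates), key=lambda c: (-scores.get(c, 0.0), c))
    return unique[:keep]


# Hypothetical vocabulary of known phrases and made-up ranking scores
# (for example, keyphraseness values).
vocabulary = {"cold war", "war", "united states"}
scores = {"cold war": 0.67, "war": 0.05, "united states": 0.40}

tokens = "The United States entered the Cold War".split()
cands = extract_candidates(tokens, vocabulary)
top = rank_candidates(cands, scores)
print(cands)
print(top)
```

The retained candidates would then go to a disambiguation step, which chooses the target article for each selected keyword.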

WIKIPEDIA FOR NER


WIKIPEDIA FOR NER



WIKIPEDIA FOR NER


WIKIPEDIA

WIKIPEDIA FOR NER



DISAMBIGUATION

Distant learning




















INFOBOXES






Web

DBPEDIA

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

WIKIPEDIA FOR NER



WIKIPEDIA FOR NER


WIKIPEDIA

WIKIPEDIA FOR NER



DISAMBIGUATION

Distant learning




















INFOBOXES






Web

DBPEDIA

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

WIKIPEDIA FOR NER


WIKIPEDIA

WIKIPEDIA FOR NER



DISAMBIGUATION

Distant learning




















INFOBOXES






Web

DBPEDIA

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

WIKIPEDIA

WIKIPEDIA FOR NER



DISAMBIGUATION

Distant learning




















INFOBOXES






Web

DBPEDIA

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

WIKIPEDIA FOR NER



DISAMBIGUATION

Distant learning




















INFOBOXES






Web

DBPEDIA

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

Distant learning




















INFOBOXES






Web

DBPEDIA

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84
















INFOBOXES






Web

DBPEDIA

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84



Web

DBPEDIA

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

1048607 1600000 concepts

1048607 including

1048698 58000 persons

1048698 70000 places


1048698 12000 films







The DBpedia Dataset









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84









mayor_name


governing_body



SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

SPARQL














































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84











































CONCEPT NET






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84






EXAMPLES OF GWAP



ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

ESP




ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

ESP the game






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84






THE TASK

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

SCORING BY MATCHING

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

SOME STATISTICS









THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84







THE TASK

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

RESULTS

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

PHRASE DETECTIVES


bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

bull 2 tasks





NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

NAME THE CULPRIT

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

READINGS





READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84

READINGS




WIKIPEDIA

Slide 3

Slide 4




Slide 8

Slide 9

Slide 10

Slide 11


Slide 13






TEXT WIKIFICATION

WIKIFICATION


Keyword Extraction


Slide 24

Slide 25

KEYPHRASENESS

COMMONNESS

Slide 28




Slide 32

Slide 33








Other applications

WIKIPEDIA FOR NER

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47






Slide 53

INFOBOXES

Slide 56

Slide 57

Slide 58

SPARQL

Slide 60

Slide 61

Slide 62

Slide 63


Slide 65

CONCEPT NET



EXAMPLES OF GWAP

ESP

ESP the game

ESP THE GAME

THE TASK

SCORING BY MATCHING

SOME STATISTICS



Slide 78

RESULTS

PHRASE DETECTIVES

Slide 81

NAME THE CULPRIT

READINGS

Slide 84
