INTRODUCTION TO ARTIFICIAL INTELLIGENCE Massimo Poesio LECTURE 10: Knowledge and The Social Web.


INTRODUCTION TO ARTIFICIAL INTELLIGENCE

Massimo Poesio

LECTURE 10 Knowledge and The Social Web

‘CYC convinced the AI community that creating a commonsense knowledge base by hand is impossible’ (Massimo, Lecture 1)

That may depend on how many people you put on to it

THE SOCIAL WEB

• Increasingly, the Web is becoming not just a way to facilitate information exchange or commercial transactions, but also a tool to facilitate socialization (Facebook, LinkedIn, etc.)

• It is also a place where information can be collectively created

SOCIAL CREATION OF KNOWLEDGE

WIKIPEDIA

• Wikipedia is a free, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation

• Wikipedia’s articles have been written collaboratively by volunteers around the world

• Almost all of its articles can be edited by anyone who can access the Wikipedia website

“The free encyclopedia that anyone can edit”

Source: http://en.wikipedia.org/wiki/Wikipedia

WIKIPEDIA

• Wikipedia is:

1. domain independent – it has a large coverage

2. up-to-date – to process current information

3. multilingual – to process information in many languages

• Title

• Abstract

• Infoboxes

• Geo-coordinates

• Categories

• Images

• Links
  – Other languages
  – Other wiki pages
  – To the web

• Redirects

• Disambiguation pages

Encyclopedic knowledge in coreference resolution

[The FCC] took [three specific actions] regarding [AT&T]. By a 4-0 vote, it allowed AT&T to continue offering special discount packages to big customers, called Tariff 12, rejecting appeals by AT&T competitors that the discounts were illegal. …

[The agency] said that because MCI’s offer had expired, AT&T couldn’t continue to offer its discount plan.

Why Wikipedia may help address the encyclopedic knowledge problem

http://en.wikipedia.org/wiki/FCC

The Federal Communications Commission (FCC) is an independent United States government agency, created, directed, and empowered by Congressional statute (see 47 U.S.C. § 151 and 47 U.S.C. § 154).

Another interesting scenario

A fresh mandate for [Mr Ahmadinejad] would, say his critics, consecrate the “revolution within a revolution” he has been trying to effect since his surprise electoral triumph in 2005. Best known to outsiders for his bellicose grandstanding, [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative, Mr Khamenei, greatly to expand the powers of the presidency.

Source: “It could make a big difference”, The Economist, Mar 19th 2009

Why Wikipedia may help address the encyclopedic knowledge problem

Wikipedia as Ontology

• Unlike other standard ontologies, such as WordNet and MeSH, Wikipedia itself is not a structured thesaurus

• However, it is more…
  – Comprehensive: it contains 12 million articles (2.8 million in the English Wikipedia)
  – Accurate: a study by Giles (2005) found that Wikipedia can compete with Encyclopædia Britannica in accuracy
  – Up to date: current and emerging concepts are absorbed in a timely manner

Giles, J. (2005). Internet encyclopaedias go head to head. Nature 438, 900–901.

Wikipedia as Ontology

• Moreover, Wikipedia has a well-formed structure
  – Each article describes only a single concept
  – The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
  – Equivalent concepts are grouped together by redirect links
  – It contains a hierarchical categorization system in which each article belongs to at least one category
  – Polysemous concepts are disambiguated by Disambiguation Pages

Wikipedia article that describes the concept “Artificial intelligence”

“AI” is redirected to its equivalent concept “Artificial Intelligence”

The concept Artificial Intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society

The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
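The structural features listed above (redirects, categories, disambiguation pages) can be pictured as three lookup tables. The following is only a minimal sketch over toy data: the table names, the `resolve` and `categories_of` helpers, and the "Mercury" entries are hypothetical illustrations, not part of any Wikipedia API.

```python
# Toy data only: hypothetical stand-ins for Wikipedia's redirect,
# category, and disambiguation structures.
REDIRECTS = {"AI": "Artificial intelligence"}
CATEGORIES = {
    "Artificial intelligence": [
        "Artificial intelligence",
        "Cybernetics",
        "Formal sciences",
        "Technology in society",
    ],
}
# Illustrative disambiguation entry (an assumption, not from the slides).
DISAMBIGUATION = {
    "Mercury": ["Mercury (planet)", "Mercury (element)", "Mercury (mythology)"],
}


def resolve(title):
    """Return the list of candidate concepts for a surface title."""
    # Follow a redirect first, so equivalent names map to one concept.
    title = REDIRECTS.get(title, title)
    # A polysemous title yields several candidates; otherwise exactly one.
    return DISAMBIGUATION.get(title, [title])


def categories_of(title):
    """Return the categories of the resolved concept, or [] if ambiguous."""
    candidates = resolve(title)
    if len(candidates) != 1:
        return []
    return CATEGORIES.get(candidates[0], [])
```

For instance, `categories_of("AI")` follows the redirect and returns the four categories named on the slide, while an ambiguous title returns no categories until a sense is chosen.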

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

• Taxonomic information: category structure

• Attributes: infobox, text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Start with the category tree

• Induce a subsumption hierarchy

INFOBOXES

• Collaborative content

• Semi-structured data

{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
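Because an infobox is a list of `| key = value` lines, it can be read as semi-structured data with very little code. The sketch below is a rough illustration, assuming one field per line and a shortened sample of the Poe infobox; `parse_infobox` and `LINK` are hypothetical names, and nested templates are simply kept as raw text rather than expanded.

```python
import re

# Shortened sample in MediaWiki infobox syntax (one field per line is assumed).
SAMPLE = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""

# Matches [[Target]] and [[Target|label]], capturing only the link target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def parse_infobox(wikitext):
    """Map each '| key = value' line to a (raw value, link targets) pair."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        # Skip the template header, the closing braces, and malformed lines.
        if not line.startswith("|") or "=" not in line:
            continue
        key, value = line[1:].split("=", 1)
        value = value.strip()
        fields[key.strip()] = (value, LINK.findall(value))
    return fields
```

Each link target is itself the title of another article, which is what makes infobox values usable as attribute-value pairs between concepts.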

DBpedia.org is an effort to:

• extract structured information from Wikipedia

• make this information available on the Web under an open license

• interlink the DBpedia dataset with other datasets on the Web

DBPEDIA

• 1,600,000 concepts

• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films

• described by 91 million triples

• using 8,141 different properties

• 557,000 links to pictures

• 1,300,000 links to external web pages

• 207,000 Wikipedia categories

• 75,000 YAGO categories

The DBpedia Dataset

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At “Developers Guide to Semantic Web Toolkits” you can find a development toolkit in your preferred programming language to process DBpedia data.

REPRESENTING EXTRACTED INFORMATION

http://en.wikipedia.org/wiki/Calgary

http://dbpedia.org/resource/Calgary

dbpedia:native_name “Calgary”

dbpedia:altitude “1048”

dbpedia:population_city “988193”

dbpedia:population_metro “1079310”

mayor_name: dbpedia:Dave_Bronconnier

governing_body: dbpedia:Calgary_City_Council

Extracting Infobox Data (RDF Representation)
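The RDF representation above boils down to a set of (subject, predicate, object) triples. As a minimal sketch, assuming the `dbpedia:` prefix notation and treating all values as plain strings, the Calgary facts can be stored and queried with a wildcard pattern; `match` is a hypothetical helper, not part of any RDF library.

```python
SUBJECT = "dbpedia:Calgary"

# (subject, predicate, object) triples mirroring the values on the slide.
# The "dbpedia:" prefix on every predicate is an assumption of this sketch.
TRIPLES = [
    (SUBJECT, "dbpedia:native_name", "Calgary"),
    (SUBJECT, "dbpedia:altitude", "1048"),
    (SUBJECT, "dbpedia:population_city", "988193"),
    (SUBJECT, "dbpedia:population_metro", "1079310"),
    (SUBJECT, "dbpedia:mayor_name", "dbpedia:Dave_Bronconnier"),
    (SUBJECT, "dbpedia:governing_body", "dbpedia:Calgary_City_Council"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
```

Calling `match(TRIPLES, p="dbpedia:altitude")` returns the single altitude triple; this kind of pattern matching is the basic operation that a query language over triples generalizes.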

SPARQL

• SPARQL is a query language for RDF

• RDF is a directed, labeled graph data format for representing information in the Web

• The W3C SPARQL specification defines the syntax and semantics of the SPARQL query language for RDF

• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware

• http://dbpedia.org/sparql

• hosted on an OpenLink Virtuoso server

• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint
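To make one of the example questions concrete, here is a hedged sketch of how “all films by Quentin Tarantino” might be phrased in SPARQL and encoded as a GET request to the endpoint. The identifiers `dbo:director` and `dbr:Quentin_Tarantino` are assumptions about the dataset's vocabulary and should be checked against the live data; the code only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on the slide

# Assumption: dbo:director and dbr:Quentin_Tarantino are the relevant
# DBpedia identifiers; verify them against the dataset before relying on this.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
""".strip()


def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query in the 'query' parameter of an HTTP GET URL."""
    return endpoint + "?" + urlencode({"query": query})
```

The triple pattern `?film dbo:director dbr:Quentin_Tarantino` leaves the subject as a variable, so every resource that has that director is returned as a binding for `?film`.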

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …

• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

www.phrasedetectives.com

• Open Mind Common Sense – Singh

• Crater mapping (results) – Kanefsky

• Learner, Learner2, 1001 Paraphrases – Chklovski

• FACTory – Cycorp

• Hot or Not – 8 Days

• ESP, Phetch, Verbosity, Peekaboom – von Ahn

• Galaxy Zoo – Oxford University

WEB COLLABORATION PROJECTS


OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense knowledge

WHAT’S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)
  – IsA: (IsA “apple” “fruit”)
  – PartOf: (PartOf “CPU” “computer”)
  – PropertyOf: (PropertyOf “coffee” “wet”)
  – MadeOf: (MadeOf “bread” “flour”)
  – DefinedAs: (DefinedAs “meat” “flesh of animal”)

EVENTS (38,000 assertions)
  – PrerequisiteEventOf: (PrerequisiteEventOf “read letter” “open envelope”)
  – SubeventOf: (SubeventOf “play sport” “score goal”)
  – FirstSubeventOf: (FirstSubeventOf “start fire” “light match”)
  – LastSubeventOf: (LastSubeventOf “attend classical concert” “applaud”)

AGENTS (104,000 assertions)
  – CapableOf: (CapableOf “dentist” “pull tooth”)

SPATIAL (36,000 assertions)
  – LocationOf: (LocationOf “army” “in war”)

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
  – EffectOf: (EffectOf “view video” “entertainment”)
  – DesirousEffectOf: (DesirousEffectOf “sweat” “take shower”)

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  – DesireOf: (DesireOf “person” “not be depressed”)
  – MotivationOf: (MotivationOf “play game” “compete”)

FUNCTIONAL (115,000 assertions)
  – UsedFor: (UsedFor “fireplace” “burn wood”)
  – CapableOfReceivingAction: (CapableOfReceivingAction “drink” “serve”)

ASSOCIATION K-LINES (1.25 million assertions)
  – SuperThematicKLine: (SuperThematicKLine “western civilization” “civilization”)
  – ThematicKLine: (ThematicKLine “wedding dress” “veil”)
  – ConceptuallyRelatedTo: (ConceptuallyRelatedTo “bad breath” “mint”)
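Each assertion in the relation types listed above has the shape (relation, concept, concept), so a collection of them forms a directed, labeled graph. The sketch below is one simple way to index a few of the slide's examples; `build_network` and `related` are hypothetical helper names, not the ConceptNet API.

```python
from collections import defaultdict

# A few assertions copied from the examples above, as (relation, arg1, arg2).
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("MadeOf", "bread", "flour"),
    ("CapableOf", "dentist", "pull tooth"),
    ("UsedFor", "fireplace", "burn wood"),
    ("EffectOf", "view video", "entertainment"),
]


def build_network(assertions):
    """Index assertions as a labeled graph: node -> [(relation, node)]."""
    graph = defaultdict(list)
    for relation, source, target in assertions:
        graph[source].append((relation, target))
    return graph


def related(graph, concept, relation=None):
    """Return nodes linked from a concept, optionally filtered by relation."""
    return [
        target
        for rel, target in graph.get(concept, [])
        if relation is None or rel == relation
    ]
```

Note that the nodes are short natural-language phrases (“pull tooth”, “burn wood”) rather than formal symbols, which is what lets assertions typed by volunteers be stored more or less directly.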

OPEN MIND COMMONSENSE: ADDING KNOWLEDGE

OMCS: ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
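The lime example suggests the flavour of such heuristics. The slides do not spell out the actual rules, so the following is only an illustrative sketch with a single hypothetical pattern (“A X is a [modifiers] Y”); it should not be read as ConceptNet's real extraction procedure.

```python
import re

# Hypothetical pattern: "A/An X is a/an [modifiers] Y", where Y is the last
# word of the sentence and any words before it are treated as a property.
PATTERN = re.compile(r"^an?\s+(\w+)\s+is\s+an?\s+(?:(.+)\s+)?(\w+)$", re.IGNORECASE)


def extract(sentence):
    """Return (relation, concept, value) facts found in one sentence."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if m is None:
        return []
    concept, modifiers, head = m.group(1).lower(), m.group(2), m.group(3).lower()
    facts = [("isa", concept, head)]
    if modifiers:
        # Join multi-word modifiers with underscores, as on the slide.
        facts.append(("property_of", concept, "_".join(modifiers.lower().split())))
    return facts
```

Applied to “A lime is a very sour fruit”, the pattern yields the two facts shown on the slide; sentences that do not fit the pattern produce nothing, which is the usual trade-off of pattern-based extraction.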

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune

• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and his group (2003, 2004)

• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision

• The goal: label the majority of the images on the Web

ESP the game

ESP: THE GAME

• Two partners are picked at random from the large number of players online

• They are not told who their partner is and can’t communicate with them

• They are both shown the same image

• The goal: guess how their partner will describe the image, and type that description
  – Hence the ESP game

• If any of the strings typed by one player matches the string typed by the other player, they score points
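The matching rule in the last bullet can be sketched in a few lines. This is a simplified illustration: it compares the two players' complete guess lists after the fact and ignores timing, and the lower-casing in `normalize` is an assumption rather than a documented rule of the game.

```python
def normalize(guess):
    """Lower-case and trim a typed string so trivial variants still match."""
    return guess.strip().lower()


def first_agreement(guesses_a, guesses_b):
    """Return the first string typed by player A that B also typed, else None."""
    seen_b = {normalize(g) for g in guesses_b}
    for guess in guesses_a:
        if normalize(guess) in seen_b:
            return normalize(guess)
    return None
```

Because neither player sees the other's guesses, the only way to match is to type what the image obviously shows, which is why agreement serves as evidence that the label is accurate.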

THE TASK

SCORING BY MATCHING

THE CHALLENGE: SCORES

• One of the motivating factors is to try to score as many points as possible

• Hourly, daily, weekly, and monthly scores are shown

SCORES

THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2½ minutes

• The thermometer on the side indicates how many images they have agreed on

• If they agree on 15 images they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed

• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS, COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)

• An image is “done” when its list of taboo words is so extensive that most players pass on it
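The taboo and good-label rules can be combined into a small per-image record. The sketch below follows the bullets literally and is only one possible reading: `ImageRecord` and its methods are hypothetical, every distinct string a player types counts as “produced” by that player, and the “done” criterion (most players passing) is not modeled.

```python
from collections import Counter


class ImageRecord:
    """Track labels for one image across successive rounds (illustrative)."""

    def __init__(self, good_threshold):
        self.good_threshold = good_threshold  # the parameter N from the slide
        self.produced = Counter()             # label -> players who typed it
        self.taboo = set()                    # labels agreed on in earlier rounds

    def play_round(self, guesses_a, guesses_b):
        """Record one round; return the agreed label, or None if there is none."""
        # Taboo words are not allowed, so they are dropped before matching.
        allowed_a = [g for g in guesses_a if g not in self.taboo]
        allowed_b = [g for g in guesses_b if g not in self.taboo]
        for guesses in (allowed_a, allowed_b):
            self.produced.update(set(guesses))
        set_b = set(allowed_b)
        agreed = next((g for g in allowed_a if g in set_b), None)
        if agreed is not None:
            # An agreed word becomes taboo for later players of this image.
            self.taboo.add(agreed)
        return agreed

    def good_labels(self):
        """Labels produced by more than N players."""
        return sorted(l for l, n in self.produced.items() if n > self.good_threshold)
```

With `good_threshold=1`, a first round in which both players type “dog” makes “dog” both good and taboo, so the next pair shown the same image is pushed toward more specific labels.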

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture

• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases

• Selecting images

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once

• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1,000 with more than 5 labels
  – 83% of game labels also produced by participants

• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge

• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word

• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected

• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _

• _ is used for _

• _ is typically near/in/on _

• _ is the opposite of _ / _ is related to _
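Since every hint is produced by filling in a fixed template, each hint can be turned into a typed fact about the secret word. The following is a minimal sketch under stated assumptions: the relation names on the right are hypothetical labels, only the “near” variant of the third template is shown, and the first blank is assumed to stand for the secret word.

```python
# Template text follows the slide; the relation names are hypothetical
# labels chosen for this sketch.
TEMPLATES = {
    "{} is a kind of {}": "IsA",
    "{} is used for {}": "UsedFor",
    "{} is typically near {}": "Near",
    "{} is the opposite of {}": "OppositeOf",
    "{} is related to {}": "RelatedTo",
}


def hint_to_fact(template, secret_word, filler):
    """Turn a template filled in by the Describer into a (relation, word, filler) fact."""
    return (TEMPLATES[template], secret_word, filler)


def render(template, filler):
    """Show the hint to the Guesser with the secret word left blank."""
    return template.format("_", filler)
```

The Guesser would see, for example, `_ is used for drinking`, while the system records the corresponding fact about the secret word once the hint has been given.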

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player

• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY

• Quality
  – Six raters were asked whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks

– Find The Culprit (Annotation): the user must identify the closest antecedent of a markable if it is anaphoric

– Detectives Conference (Validation): the user must agree/disagree with a coreference relation entered by another user

PHRASE DETECTIVES: THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.

• C. Havasi, J. Pustejovsky, R. Speer, and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58–67.

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi (2013). Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.


`CYC convinced the AI community that creating a commonsense knowledge

base by hand is impossiblersquo(Massimo Lecture 1)

That may depend on how many people you put on to it

THE SOCIAL WEB

bull Increasingly the Web is becoming not just a way to facilitate information exchange or commercial transactions but also a tool to facilitate socialization (Facebook LinkedIn etc)

bull Also where information can be collectively created

SOCIAL CREATION OF KNOWLEDGE

WIKIPEDIA

bullWikipedia is a free multilingual encyclopedia project supported by the non-profit Wikimedia FoundationbullWikipedias articles have been written collaboratively by volunteers around the worldbullAlmost all of its articles can be edited by anyone who can access the Wikipedia website

The free encyclopedia that anyone can edit

----httpenwikipediaorgwikiWikipeida

WIKIPEDIA

bull Wikipedia is

1 domain independentndash it has a large coverage

2 up-to-datendash to process current information

3 multilingualndash to process information in many languages

bullTitle

bullAbstract

bullInfoboxes

bullGeo-coordinates

bullCategories

bullImages

bullLinks

bullOther languages

bullOther wiki pages

bullTo the web

bullRedirects

bullDisambiguates

Encyclopedic knowledge in coreference resolution

[The FCC] took [three specific actions] regarding [ATampT] By a 4-0 vote it allowed ATampT to continue offering special discount packages to big customers called Tariff 12 rejecting appeals by ATampT competitors that the discounts were illegal hellip

[The agency] said that because MCIs offer had expired ATampT couldnt continue to offer its discount plan

Why Wikipedia may help addressing the encyclopedic knowledge problem

httpenwikipediaorgwikiFCC

The Federal Communications Commission (FCC) is an independent United States government agency created directed and empowered by Congressional statute (see 47 USC sect 151 and 47 USC sect 154)

Another interesting scenario

A fresh mandate for [Mr Ahmadinejad] would say his critics consecrate the ldquorevolution within a revolutionrdquo he has been trying to effect since his surprise electoral triumph in 2005 Best known to outsiders for his bellicose grandstanding [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative Mr Khamenei greatly to expand the powers of the presidency

Source It could make a big difference The Economist Mar 19th 2009

Why Wikipedia may help addressing the encyclopedic knowledge problem

Wikipedia as Ontology

bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus

bull However it is morehellipndash Comprehensive it contains 12 million articles (28

million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can

compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are

absorbed timely

Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901

Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurus

Wikipedia Article that describes the Concept Artificial intelligence

Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by

redirected links

AI is redirected to its equivalent concept Artificial Intelligence

Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by

redirected linksndash It contains a hierarchical categorization system

in which each article belongs to at least one category

The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society

Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by

redirected linksndash It contains a hierarchical categorization system in

which each article belongs to at least one category ndash Polysemous concepts are disambiguated by

Disambiguation Pages

The different meanings that Artificial intelligence may refer to are listed in its disambiguation page

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

bull Taxonomic information category structurebull Attributes infobox text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

bull Start with the category tree

Deriving a taxonomy from Wikipedia (AAAI 2007)

bull Induce a subsumption hierarchy

INFOBOXES

bull Collaborative content

bull Semi-structured data

Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]

DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an

open licensebull interlink the DBpedia dataset with other datasets on the

Web

DBPEDIA

10486071600000 concepts

1048607including

1048698 58000 persons

1048698 70000 places

1048698 35000 music albums

1048698 12000 films

1048607described by 91 million triples

1048607using 8141 different properties

1048607557000 links to pictures

10486071300000 links external web pages

1048607207000 Wikipedia categories

104860775000 YAGO categories

The DBpedia Dataset

The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data

REPRESENTING EXTRACTED INFORMATION

httpenwikipediaorgwikiCalgary

httpdbpediaorgresourceCalgary

dbpedianative_name Calgaryrdquo

dbpediaaltitude ldquo1048rdquo

dbpediapopulation_city ldquo988193rdquo

dbpediapopulation_metro ldquo1079310rdquo

mayor_name

dbpediaDave_Bronconnier

governing_body

dbpediaCalgary_City_Council

Extracting Infobox Data (RDF Representation)

SPARQL

bull SPARQL is a query language for RDF

bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF

bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware

1048607httpdbpediaorgsparql

1048607hosted on a OpenLink Virtuoso server

1048607can answer SPARQL queries like

1048698 Give me all Sitcoms that are set in NYC

1048698 All tennis players from Moscow

1048698 All films by Quentin Tarentino

1048698 All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint

bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and

Language Laboratory hellipbull This has been taken advantage of in AI

ndash Open Mind Commonsense (Singh) (collecting facts)

ndash Semantic Wikis

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

wwwphrasedetectivescom

bull Open Mind Common Sense ndash Singh

bull Crater mapping (results) ndash Kanefsky

bull Learner Learner2 1001 Paraphrases ndash Chklovski

bull FACTory ndash CyCORP

bull Hot or Not ndash 8 Days

bull ESP Phetch Verbosity Peekaboom ndash von Ahn

bull Galaxy Zoo ndash Oxford University

WEB COLLABORATION PROJECTS

wwwphrasedetectivescom

OPEN MIND COMMONSENSE

bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense

WHATrsquoS IN OPEN MIND COMMONSENSE CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52000 assertions)

IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)

EVENTS (38000 assertions)

PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)

AGENTS (104000 assertions)

CapableOf (CapableOf dentist pull tooth)

SPATIAL (36000 assertions)

LocationOf (LocationOf army in war)

TEMPORAL time amp sequence

CAUSAL (17000 assertions)

EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)

AFFECTIONAL (mood feeling emotions) (34000 assertions)

DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)

FUNCTIONAL (115000 assertions)

IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)

ASSOCIATION K-LINES (125 million assertions)

SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic

network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO

CONCEPTNETA lime is a very sour fruit

isa(limefruit)

property_of(limevery_sour)

GAMES WITH A PURPOSE

bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

bull The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune

bull Other gamesndash Peekaboomndash Phetch

ESP

bull The first GWAP developed by von Ahn and their group (2003 2004)

bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision

bull The goal label the majority of the images on the Web

ESP the game

ESP THE GAMEbull Two partners are picked at random from the

large number of players onlinebull They are not told who their partner is and canrsquot

communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe

the image and type that descriptionndash Hence the ESP game

bull If any of the strings typed by one player matches the string typed by the other player they score points

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

bull One of the motivating factors is to try to score as many points as possible

bull Hourly daily weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

bull Partners try to agree on as many images as they can during 2 frac12 minutes

bull The termometer on the side indicates how many images they have agreed on

bull If they agree on 15 images they score bonus points

TABOO WORDS

bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed

bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)

bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning, and at quiet times, there won't always be players to pair with
  – In these cases a player is paired against a recorded 'hand' of a previous game with the same picture

• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases

• Selecting images
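Pre-recorded game play can be sketched as replaying a stored 'hand' against the live player. This assumes each recorded guess was stored with the time at which it was typed; the slide only says that a recorded hand is used, so the timing detail and all names below are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedGuess:
    seconds: float  # time into the round at which the guess was typed
    label: str


def partner_guesses_so_far(hand, elapsed):
    """Labels the recorded partner would have typed after `elapsed` seconds."""
    return [g.label for g in hand if g.seconds <= elapsed]


def matches_recording(hand, live_guess, elapsed):
    """True if the live player's guess agrees with a guess already replayed."""
    replayed = {label.lower() for label in partner_guesses_so_far(hand, elapsed)}
    return live_guess.strip().lower() in replayed
```

Replaying guesses on their original schedule keeps the simulated partner from agreeing faster than the recorded one actually did.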

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once

• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants

• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful
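The comparison between game labels and participant labels amounts to an overlap rate. The sketch below shows the arithmetic on made-up labels; it is not the evaluation code behind the figures above, and the function name is hypothetical.

```python
def overlap_rate(game_labels, participant_labels):
    """Fraction of the game's labels that participants also produced."""
    game = {label.lower() for label in game_labels}
    if not game:
        return 0.0
    participants = {label.lower() for label in participant_labels}
    return len(game & participants) / len(game)


# Toy data, invented for illustration: 5 of the 6 game labels are shared.
game = ["dog", "ball", "grass", "park", "sky", "tree"]
people = ["dog", "ball", "grass", "park", "sky", "leash"]
rate = overlap_rate(game, people)
```

With five of six game labels shared, the rate is 5/6, that is, roughly 83%.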

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge

• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word

• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations/properties of interest are collected

• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _

• _ is used for _

• _ is typically near/in/on _

• _ is the opposite of _ / _ is related to _
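A sketch of how a filled-in template could be turned into a relational fact. The template keys and the relation names (chosen to echo the ConceptNet-style relations from earlier in the lecture) are assumptions; the slides list only the sentence patterns.

```python
# Template key -> (sentence pattern, relation name). Relation names are hypothetical.
TEMPLATES = {
    "kind_of": ("{word} is a kind of {hint}", "IsA"),
    "used_for": ("{word} is used for {hint}", "UsedFor"),
    "near": ("{word} is typically near/in/on {hint}", "LocationOf"),
    "opposite": ("{word} is the opposite of {hint}", "OppositeOf"),
    "related": ("{word} is related to {hint}", "RelatedTo"),
}


def hint_to_fact(template_key, secret_word, hint):
    """Combine the Describer's hint with the secret word.

    Returns ((relation, secret_word, hint), readable sentence).
    """
    pattern, relation = TEMPLATES[template_key]
    sentence = pattern.format(word=secret_word, hint=hint)
    return (relation, secret_word, hint), sentence
```

For instance, a Describer who fills the first template with "computer" for the secret word "laptop" yields the triple `("IsA", "laptop", "computer")` and the sentence "laptop is a kind of computer".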

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player

• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY

• Quality
  – Six raters were asked whether 200 facts collected using Verbosity are 'true'
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user
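One way the two tasks could be combined into a single judgement per markable is sketched below. The scoring rule (one point per annotation, plus one per agreement, minus one per disagreement) and all names are assumptions made for illustration; the slides do not describe how annotations and validations are aggregated.

```python
from collections import defaultdict

NON_ANAPHORIC = "<non-anaphoric>"  # hypothetical marker for 'no antecedent'


def best_interpretation(annotations, validations):
    """Pick the highest-scoring antecedent for one markable.

    annotations: antecedents chosen by annotators (Find The Culprit).
    validations: (antecedent, agrees) pairs from validators (Detectives Conference).
    """
    scores = defaultdict(int)
    for antecedent in annotations:
        scores[antecedent] += 1
    for antecedent, agrees in validations:
        scores[antecedent] += 1 if agrees else -1
    if not scores:
        raise ValueError("no annotations or validations for this markable")
    # Ties go to the interpretation that was entered first.
    return max(scores.items(), key=lambda item: item[1])
```

For the markable "The agency" in the FCC example from earlier in the lecture, annotations `["The FCC", "The FCC", "AT&T"]` with one agreement on "The FCC" and one disagreement on "AT&T" select "The FCC" with a score of 3.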

www.phrasedetectives.com

PHRASE DETECTIVES: THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012

• C. Havasi, J. Pustejovsky, R. Speer, and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems

SOCIAL CREATION OF KNOWLEDGE

WIKIPEDIA

bullWikipedia is a free multilingual encyclopedia project supported by the non-profit Wikimedia FoundationbullWikipedias articles have been written collaboratively by volunteers around the worldbullAlmost all of its articles can be edited by anyone who can access the Wikipedia website

The free encyclopedia that anyone can edit

----httpenwikipediaorgwikiWikipeida

WIKIPEDIA

bull Wikipedia is

1 domain independentndash it has a large coverage

2 up-to-datendash to process current information

3 multilingualndash to process information in many languages

bullTitle

bullAbstract

bullInfoboxes

bullGeo-coordinates

bullCategories

bullImages

bullLinks

bullOther languages

bullOther wiki pages

bullTo the web

bullRedirects

bullDisambiguates

Encyclopedic knowledge in coreference resolution

[The FCC] took [three specific actions] regarding [ATampT] By a 4-0 vote it allowed ATampT to continue offering special discount packages to big customers called Tariff 12 rejecting appeals by ATampT competitors that the discounts were illegal hellip

[The agency] said that because MCIs offer had expired ATampT couldnt continue to offer its discount plan

Why Wikipedia may help addressing the encyclopedic knowledge problem

httpenwikipediaorgwikiFCC

The Federal Communications Commission (FCC) is an independent United States government agency created directed and empowered by Congressional statute (see 47 USC sect 151 and 47 USC sect 154)

Another interesting scenario

A fresh mandate for [Mr Ahmadinejad] would say his critics consecrate the ldquorevolution within a revolutionrdquo he has been trying to effect since his surprise electoral triumph in 2005 Best known to outsiders for his bellicose grandstanding [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative Mr Khamenei greatly to expand the powers of the presidency

Source It could make a big difference The Economist Mar 19th 2009

Why Wikipedia may help addressing the encyclopedic knowledge problem

Wikipedia as Ontology

bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus

bull However it is morehellipndash Comprehensive it contains 12 million articles (28

million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can

compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are

absorbed timely

Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901

Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurus

Wikipedia Article that describes the Concept Artificial intelligence

Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by

redirected links

AI is redirected to its equivalent concept Artificial Intelligence

Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by

redirected linksndash It contains a hierarchical categorization system

in which each article belongs to at least one category

The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society

Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by

redirected linksndash It contains a hierarchical categorization system in

which each article belongs to at least one category ndash Polysemous concepts are disambiguated by

Disambiguation Pages

The different meanings that Artificial intelligence may refer to are listed in its disambiguation page

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

bull Taxonomic information category structurebull Attributes infobox text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

bull Start with the category tree

Deriving a taxonomy from Wikipedia (AAAI 2007)

bull Induce a subsumption hierarchy

INFOBOXES

bull Collaborative content

bull Semi-structured data

Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]

DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an

open licensebull interlink the DBpedia dataset with other datasets on the

Web

DBPEDIA

10486071600000 concepts

1048607including

1048698 58000 persons

1048698 70000 places

1048698 35000 music albums

1048698 12000 films

1048607described by 91 million triples

1048607using 8141 different properties

1048607557000 links to pictures

10486071300000 links external web pages

1048607207000 Wikipedia categories

104860775000 YAGO categories

The DBpedia Dataset

The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data

REPRESENTING EXTRACTED INFORMATION

httpenwikipediaorgwikiCalgary

httpdbpediaorgresourceCalgary

dbpedianative_name Calgaryrdquo

dbpediaaltitude ldquo1048rdquo

dbpediapopulation_city ldquo988193rdquo

dbpediapopulation_metro ldquo1079310rdquo

mayor_name

dbpediaDave_Bronconnier

governing_body

dbpediaCalgary_City_Council

Extracting Infobox Data (RDF Representation)

SPARQL

bull SPARQL is a query language for RDF

bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF

bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware

1048607httpdbpediaorgsparql

1048607hosted on a OpenLink Virtuoso server

1048607can answer SPARQL queries like

1048698 Give me all Sitcoms that are set in NYC

1048698 All tennis players from Moscow

1048698 All films by Quentin Tarentino

1048698 All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint

bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and

Language Laboratory hellipbull This has been taken advantage of in AI

ndash Open Mind Commonsense (Singh) (collecting facts)

ndash Semantic Wikis

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

wwwphrasedetectivescom

• Open Mind Common Sense – Singh

• Crater mapping (results) – Kanefsky

• Learner, Learner2, 1001 Paraphrases – Chklovski

• FACTory – Cycorp

• Hot or Not – 8 Days

• ESP, Phetch, Verbosity, Peekaboom – von Ahn

• Galaxy Zoo – Oxford University

WEB COLLABORATION PROJECTS

OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge

WHAT'S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh, 2004)

THINGS (52,000 assertions)
 – IsA: (IsA "apple" "fruit")
 – PartOf: (PartOf "CPU" "computer")
 – PropertyOf: (PropertyOf "coffee" "wet")
 – MadeOf: (MadeOf "bread" "flour")
 – DefinedAs: (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)
 – PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
 – SubeventOf: (SubeventOf "play sport" "score goal")
 – FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
 – LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)
 – CapableOf: (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)
 – LocationOf: (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
 – EffectOf: (EffectOf "view video" "entertainment")
 – DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
 – DesireOf: (DesireOf "person" "not be depressed")
 – MotivationOf: (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)
 – UsedFor: (UsedFor "fireplace" "burn wood")
 – CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")

ASSOCIATION K-LINES (1.25 million assertions)
 – SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
 – ThematicKLine: (ThematicKLine "wedding dress" "veil")
 – ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al., 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
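To make "simple heuristics" concrete, here is a toy pattern-based extractor that turns the lime sentence into the two assertions shown. It is a sketch under stated assumptions, not the procedure used to build ConceptNet: the regular expression, the function name `extract`, and the restriction to sentences of the form "A X is a [modifier] Y" are all hypothetical.

```python
import re

# Hypothetical pattern: "A/An/The X is a/an [modifier ...] Y".
PATTERN = re.compile(
    r"^(?:a|an|the)\s+(\w+)\s+is\s+(?:a|an)\s+(?:(.+?)\s+)?(\w+)$",
    re.IGNORECASE,
)


def extract(sentence):
    """Return ConceptNet-style assertions for one simple sentence."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []  # sentence does not fit the toy pattern
    concept, modifier, category = match.groups()
    assertions = [f"isa({concept.lower()}, {category.lower()})"]
    if modifier:
        # Join a multi-word modifier with underscores, e.g. "very sour".
        prop = "_".join(modifier.lower().split())
        assertions.append(f"property_of({concept.lower()}, {prop})")
    return assertions


print(extract("A lime is a very sour fruit"))
```

A single pattern like this covers only one sentence shape; a realistic extractor would need one pattern per template used to collect the sentences.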

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
 – ESP
 – Verbosity
 – TagATune

• Other games
 – Peekaboom
 – Phetch

ESP

• The first GWAP, developed by von Ahn and his group (2003, 2004)

• The problem: obtain accurate descriptions of images, to be used
 – To train image search engines
 – To develop machine learning approaches to vision

• The goal: label the majority of the images on the Web

ESP the game

ESP THE GAME

• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
 – Hence the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
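The matching rule can be sketched in a few lines. This is an illustration under simplifying assumptions, not the game's implementation: guesses are compared after trimming and lowercasing, the order of typing within a round is ignored, and the function name `first_match` is hypothetical.

```python
def first_match(guesses_a, guesses_b):
    """Return the first string typed by player A that player B also typed."""
    typed_by_b = {guess.strip().lower() for guess in guesses_b}
    for guess in guesses_a:
        normalized = guess.strip().lower()
        if normalized in typed_by_b:
            return normalized  # the partners agree on this label
    return None  # no agreement on this image


print(first_match(["dog", "grass", "ball"], ["puppy", "Ball", "grass"]))
```

Because neither player sees the other's guesses, the only way to score is to type what a stranger would also type, which pushes both toward labels that genuinely describe the image.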

THE TASK

SCORING BY MATCHING

THE CHALLENGE: SCORES

• One of the motivating factors is to try to score as many points as possible

• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2½ minutes

• The thermometer on the side indicates how many images they have agreed on

• If they agree on 15 images, they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed

• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

• A label is considered "good" when more than N players produce it (with N a parameter of the game)

• An image is "done" when its list of taboo words is so extensive that most players pass on it
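The taboo rule and the "good label" threshold can be sketched as three small functions. This is a hedged illustration only: the function names, the use of sets, and the way player counts are stored are assumptions made for the example, and the value of N below is arbitrary.

```python
def allowed(guess, taboo_words):
    """A guess is rejected if it is already a taboo word for the image."""
    return guess.strip().lower() not in taboo_words


def add_agreed_label(taboo_words, label):
    """Once players agree on a label, it becomes taboo for later players."""
    return taboo_words | {label.strip().lower()}


def good_labels(player_counts, n):
    """Labels produced by more than `n` players (n is a game parameter)."""
    return {label for label, count in player_counts.items() if count > n}


taboo = add_agreed_label(set(), "Dog")
print(allowed("dog", taboo), allowed("leash", taboo))
print(good_labels({"dog": 6, "leash": 2, "park": 4}, n=3))
```

Each agreed label shrinks the space of easy answers for the next pair, so later players are steered toward more specific descriptions until most of them pass.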

IMPLEMENTATION

• Pre-recorded game play
 – Especially at the beginning and at quiet times, there won't always be players to pair with
 – In these cases a player is paired against a recorded 'hand' of a previous game with the same picture

• Cheating
 – Players could cheat in a number of ways, including agreeing on labels or playing against themselves
 – A number of mechanisms are in place against those cases

• Selecting images

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
 – 13,630 players
 – 1.2 million labels for 293,760 images
 – 80% of players played more than once

• By 2008
 – 200,000 players
 – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors
 – Playing with a partner
 – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
 – choose 10 labels among those produced and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment
 – 15 participants, 20 images among the 1000 with more than 5 labels
 – 83% of game labels also produced by participants

• Manual assessment of labels ('would you use these labels to describe this image?')
 – 15 participants, 20 images
 – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge

• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
 – Players have to guess a word
 – One of the players gives hints concerning the word

• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected

• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
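Filling a template amounts to putting the secret word in the first blank and the Describer's hint in the second. The sketch below assumes exactly two blanks per template and uses a hypothetical function name, `fill`; the "near/in/on" template is left out because the slide does not say how its alternatives are presented.

```python
# Templates copied from the slide; "_" marks a blank.
TEMPLATES = [
    "_ is a kind of _",
    "_ is used for _",
    "_ is the opposite of _",
    "_ is related to _",
]


def fill(template, secret_word, hint):
    """Place the secret word in the first blank and the hint in the second."""
    parts = template.split("_")
    if len(parts) != 3:
        raise ValueError("expected a template with exactly two blanks")
    return parts[0] + secret_word + parts[1] + hint + parts[2]


print(fill(TEMPLATES[0], "milk", "liquid"))
print(fill(TEMPLATES[1], "milk", "drinking"))
```

Since every hint arrives inside a known template, each filled sentence already identifies the relation it expresses, so no free-text parsing is needed to turn it into a fact.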

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player

• The asymmetry of the game causes a problem not encountered in the ESP game
 – Describer: can just repeat the behavior of a previous describer
 – Guesser: not so easy

RESULTS

• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY

• Quality
 – Ask six raters whether 200 facts collected using Verbosity are 'true'
 – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
 – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
 – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user

www.phrasedetectives.com

PHRASE DETECTIVES: THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.

• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi (2013). Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.




WIKIPEDIA

bull Wikipedia is

1 domain independentndash it has a large coverage

2 up-to-datendash to process current information

3 multilingualndash to process information in many languages

bullTitle

bullAbstract

bullInfoboxes

bullGeo-coordinates

bullCategories

bullImages

bullLinks

bullOther languages

bullOther wiki pages

bullTo the web

bullRedirects

bullDisambiguates

Encyclopedic knowledge in coreference resolution

[The FCC] took [three specific actions] regarding [ATampT] By a 4-0 vote it allowed ATampT to continue offering special discount packages to big customers called Tariff 12 rejecting appeals by ATampT competitors that the discounts were illegal hellip

[The agency] said that because MCIs offer had expired ATampT couldnt continue to offer its discount plan

Why Wikipedia may help addressing the encyclopedic knowledge problem

httpenwikipediaorgwikiFCC

The Federal Communications Commission (FCC) is an independent United States government agency created directed and empowered by Congressional statute (see 47 USC sect 151 and 47 USC sect 154)

Another interesting scenario

A fresh mandate for [Mr Ahmadinejad] would say his critics consecrate the ldquorevolution within a revolutionrdquo he has been trying to effect since his surprise electoral triumph in 2005 Best known to outsiders for his bellicose grandstanding [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative Mr Khamenei greatly to expand the powers of the presidency

Source It could make a big difference The Economist Mar 19th 2009

Why Wikipedia may help addressing the encyclopedic knowledge problem

Wikipedia as Ontology

bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus

bull However it is morehellipndash Comprehensive it contains 12 million articles (28

million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can

compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are

absorbed timely

Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901

Wikipedia as Ontology

• Moreover, Wikipedia has a well-formed structure
  – Each article describes only a single concept (e.g. the Wikipedia article that describes the concept "Artificial intelligence")
  – The title of the article is a short, well-formed phrase, like a term in a traditional thesaurus
  – Equivalent concepts are grouped together by redirect links (e.g. "AI" is redirected to its equivalent concept "Artificial intelligence")
  – It contains a hierarchical categorization system in which each article belongs to at least one category (e.g. the concept "Artificial intelligence" belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society)
  – Polysemous concepts are disambiguated by disambiguation pages (e.g. the different meanings that "Artificial intelligence" may refer to are listed in its disambiguation page)
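The structural features listed for Wikipedia (redirects, categories, disambiguation pages) can be pictured as three lookup tables. The following is only a minimal sketch with hand-written toy data: the table names, the functions `resolve` and `categories_of`, and the "Mercury" disambiguation entry are illustrative assumptions, not part of any Wikipedia API.

```python
# Toy stand-ins for three Wikipedia structures. All data is illustrative.
REDIRECTS = {"AI": "Artificial intelligence"}

CATEGORIES = {
    "Artificial intelligence": [
        "Artificial intelligence",
        "Cybernetics",
        "Formal sciences",
        "Technology in society",
    ],
}

# Hypothetical example of a polysemous title and its possible meanings.
DISAMBIGUATION = {
    "Mercury": ["Mercury (planet)", "Mercury (element)", "Mercury (mythology)"],
}


def resolve(term):
    """Map a surface term to the list of concepts (article titles) it may denote."""
    title = REDIRECTS.get(term, term)  # follow a redirect if there is one
    if title in DISAMBIGUATION:        # polysemous: return every listed meaning
        return list(DISAMBIGUATION[title])
    return [title]


def categories_of(term):
    """Return the categories of each concept a term may denote."""
    return {concept: CATEGORIES.get(concept, []) for concept in resolve(term)}


if __name__ == "__main__":
    print(resolve("AI"))
    print(categories_of("AI"))
    print(resolve("Mercury"))
```

A real system would fill these tables from a Wikipedia dump; the point here is only that redirects give synonymy, categories give a (rough) taxonomy, and disambiguation pages give the senses of an ambiguous term.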

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

• Taxonomic information: category structure
• Attributes: infobox, text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Start with the category tree
• Induce a subsumption hierarchy

INFOBOXES

bull Collaborative content

bull Semi-structured data

{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
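Because an infobox is semi-structured, its fields can be pulled out with very little machinery. The sketch below assumes the usual wikitext layout (one `| key = value` field per line); the shortened sample text and the function names `parse_infobox` and `strip_links` are illustrative, and real extractors such as DBpedia's handle many more cases (nested templates, multi-line values).

```python
import re

# Shortened, illustrative infobox in wikitext form (one field per line).
SAMPLE = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| magnum_opus = The Raven
}}"""

# [[Target]] or [[Target|label]]; the visible text is captured in group 1.
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def parse_infobox(wikitext):
    """Return a dict of attribute -> raw value for each '| key = value' line."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue
        # Split on the first '=' only, so values such as 'mf=y' stay intact.
        key, _, value = line[1:].partition("=")
        fields[key.strip()] = value.strip()
    return fields


def strip_links(value):
    """Replace wiki links by their visible text."""
    return LINK.sub(r"\1", value)


if __name__ == "__main__":
    info = parse_infobox(SAMPLE)
    print(info["name"])
    print(strip_links(info["birth_place"]))
```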

DBPEDIA

DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web

The DBpedia Dataset

• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

REPRESENTING EXTRACTED INFORMATION

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At "Developers Guide to Semantic Web Toolkits" you can find a development toolkit in your preferred programming language to process DBpedia data.

Extracting Infobox Data (RDF Representation)

http://en.wikipedia.org/wiki/Calgary

http://dbpedia.org/resource/Calgary

  dbpedia:native_name       "Calgary"
  dbpedia:altitude          "1048"
  dbpedia:population_city   "988193"
  dbpedia:population_metro  "1079310"
  mayor_name                dbpedia:Dave_Bronconnier
  governing_body            dbpedia:Calgary_City_Council
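The Calgary example amounts to a set of subject–property–value triples. As a rough sketch of that data model (plain Python tuples rather than a real RDF library; the helper name `match` is an assumption made for illustration), the triples and a simple pattern lookup might look like this:

```python
# The Calgary facts from the slide, written as (subject, property, value) triples.
# Plain strings are used throughout; a real RDF store would use full URIs.
TRIPLES = [
    ("dbpedia:Calgary", "dbpedia:native_name", "Calgary"),
    ("dbpedia:Calgary", "dbpedia:altitude", "1048"),
    ("dbpedia:Calgary", "dbpedia:population_city", "988193"),
    ("dbpedia:Calgary", "dbpedia:population_metro", "1079310"),
    ("dbpedia:Calgary", "mayor_name", "dbpedia:Dave_Bronconnier"),
    ("dbpedia:Calgary", "governing_body", "dbpedia:Calgary_City_Council"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # "What is the altitude of Calgary?"
    print(match(TRIPLES, s="dbpedia:Calgary", p="dbpedia:altitude"))
    # "Which resources have Dave Bronconnier as mayor?"
    print(match(TRIPLES, p="mayor_name", o="dbpedia:Dave_Bronconnier"))
```

Matching a pattern with wildcards against triples is, in miniature, what a SPARQL query does over an RDF graph.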

SPARQL

• SPARQL is a query language for RDF
  – RDF is a directed, labeled graph data format for representing information in the Web
  – The W3C SPARQL specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware

The DBpedia SPARQL Endpoint

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
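As an illustration of the kind of question listed above, here is a hedged sketch of how "all films by Quentin Tarantino" might be phrased in SPARQL and turned into a request URL for the endpoint. The vocabulary used (`dbo:director`, `dbr:Quentin_Tarantino`) is an assumption about the present-day DBpedia ontology and may differ from the property names in use when these slides were written; the helper `request_url` is hypothetical and the code only builds the URL, it does not contact the server.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director and dbr:Quentin_Tarantino.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
""".strip()


def request_url(query, endpoint=ENDPOINT):
    """Build a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(request_url(QUERY))
```

The resulting URL can be fetched with any HTTP client; the format of the response depends on the content negotiation supported by the server.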

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis

www.phrasedetectives.com

WEB COLLABORATION PROJECTS

• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University

OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge

WHAT'S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)
  – IsA: (IsA "apple" "fruit")
  – PartOf: (PartOf "CPU" "computer")
  – PropertyOf: (PropertyOf "coffee" "wet")
  – MadeOf: (MadeOf "bread" "flour")
  – DefinedAs: (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)
  – PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  – SubeventOf: (SubeventOf "play sport" "score goal")
  – FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  – LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)
  – CapableOf: (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)
  – LocationOf: (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
  – EffectOf: (EffectOf "view video" "entertainment")
  – DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  – DesireOf: (DesireOf "person" "not be depressed")
  – MotivationOf: (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)
  – UsedFor: (UsedFor "fireplace" "burn wood")
  – CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")

ASSOCIATION: K-LINES (1.25 million assertions)
  – SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  – ThematicKLine: (ThematicKLine "wedding dress" "veil")
  – ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
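Assertions of this form are naturally stored as a labeled graph: concepts are nodes and relation types are edge labels. The sketch below is not the ConceptNet implementation, only a minimal illustration using a handful of the example assertions above; `build_network` and `related` are hypothetical helper names.

```python
from collections import defaultdict

# A few of the example assertions, as (relation, concept1, concept2).
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("CapableOf", "dentist", "pull tooth"),
    ("UsedFor", "fireplace", "burn wood"),
    ("EffectOf", "view video", "entertainment"),
    ("ThematicKLine", "wedding dress", "veil"),
]


def build_network(assertions):
    """Index assertions by their first concept: concept -> [(relation, concept2)]."""
    network = defaultdict(list)
    for relation, first, second in assertions:
        network[first].append((relation, second))
    return network


def related(network, concept, relation=None):
    """Concepts linked from `concept`, optionally restricted to one relation type."""
    return [
        target for rel, target in network.get(concept, [])
        if relation is None or rel == relation
    ]


if __name__ == "__main__":
    net = build_network(ASSERTIONS)
    print(related(net, "apple", "IsA"))
    print(related(net, "dentist"))
```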

OPEN MIND COMMONSENSE: ADDING KNOWLEDGE

OMCS: ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

"A lime is a very sour fruit"

  isa(lime, fruit)

  property_of(lime, very_sour)
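To give a feel for what a "simple heuristic" of this kind could look like, here is a toy pattern that handles sentences of the shape "A X is a (modifiers) Y". It is an assumption made for illustration, not the actual ConceptNet extraction rule set; the pattern and the function name `extract` are hypothetical.

```python
import re

# Matches e.g. "A lime is a very sour fruit": subject, optional modifiers, head noun.
PATTERN = re.compile(
    r"^(?:a|an|the)\s+(\w+)\s+is\s+(?:a|an)\s+((?:\w+\s+)*)(\w+)\.?$",
    re.IGNORECASE,
)


def extract(sentence):
    """Return a list of (relation, concept, value) facts, or [] if no match."""
    m = PATTERN.match(sentence.strip())
    if not m:
        return []
    subject = m.group(1).lower()
    modifiers = [w.lower() for w in m.group(2).split()]
    head = m.group(3).lower()
    facts = [("isa", subject, head)]
    if modifiers:
        # The words between the article and the head noun become a property.
        facts.append(("property_of", subject, "_".join(modifiers)))
    return facts


if __name__ == "__main__":
    print(extract("A lime is a very sour fruit"))
```

Sentences that do not fit the pattern are simply skipped, which is the usual trade-off with shallow heuristics: high precision on the matched shapes, no coverage elsewhere.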

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web

ESP: THE GAME

• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches a string typed by the other player, they score points

THE TASK

SCORING BY MATCHING

THE CHALLENGE: SCORES

• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS, COMPLETING AN IMAGE

• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
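The matching, taboo and "good label" rules can be summarised in a few lines of code. This is a simplified sketch under stated assumptions, not von Ahn's implementation: labels are compared case-insensitively, a round ends at the first match, and the function names `first_match` and `good_labels` are hypothetical.

```python
def first_match(typed_a, typed_b, taboo=()):
    """Return the first string typed by player A that player B also typed.

    Taboo words never count as a match. Returns None if the players
    do not agree on anything (for example because they passed).
    """
    banned = {w.lower() for w in taboo}
    typed_by_b = {w.lower() for w in typed_b} - banned
    for word in typed_a:
        word = word.lower()
        if word not in banned and word in typed_by_b:
            return word
    return None


def good_labels(times_produced, n):
    """Labels produced by more than n players (n is the game parameter N)."""
    return sorted(label for label, count in times_produced.items() if count > n)


if __name__ == "__main__":
    a = ["dog", "grass", "ball"]
    b = ["puppy", "ball", "grass"]
    taboo = []
    # Round 1: no taboo words yet for this image.
    agreed = first_match(a, b, taboo)
    print(agreed)
    # The agreed word becomes taboo for later pairs shown the same image,
    # which pushes them towards new, more specific labels.
    taboo.append(agreed)
    print(first_match(a, b, taboo))
```

The taboo list is what makes labels for one image diversify over time: each agreement removes the most obvious remaining word.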

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won't always be players to pair with
  – In these cases a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
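One way to see why templates are useful: each filled-in template maps directly onto a typed fact about the secret word. The sketch below assumes hypothetical relation names (`IsA`, `UsedFor`, and so on) chosen for illustration; the slides do not say how Verbosity names its relations internally.

```python
# Template text per relation. The relation names are illustrative assumptions.
TEMPLATES = {
    "IsA": "{word} is a kind of {hint}",
    "UsedFor": "{word} is used for {hint}",
    "LocatedNear": "{word} is typically near/in/on {hint}",
    "OppositeOf": "{word} is the opposite of {hint}",
    "RelatedTo": "{word} is related to {hint}",
}


def make_hint(relation, hint):
    """What the Guesser sees: the template with the secret word left blank."""
    return TEMPLATES[relation].format(word="_", hint=hint)


def to_fact(relation, secret_word, hint):
    """What the system stores once the hint is accepted."""
    return (relation, secret_word, hint)


if __name__ == "__main__":
    secret = "milk"
    print(make_hint("IsA", "drink"))
    print(to_fact("IsA", secret, "drink"))
```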

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

PHRASE DETECTIVES: THE TASKS

• 2 tasks
  – Find The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user

www.phrasedetectives.com

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58–67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems





Encyclopedic knowledge in coreference resolution

[The FCC] took [three specific actions] regarding [AT&T]. By a 4-0 vote, it allowed AT&T to continue offering special discount packages to big customers, called Tariff 12, rejecting appeals by AT&T competitors that the discounts were illegal. …

[The agency] said that because MCI's offer had expired AT&T couldn't continue to offer its discount plan.

Why Wikipedia may help address the encyclopedic knowledge problem

http://en.wikipedia.org/wiki/FCC

The Federal Communications Commission (FCC) is an independent United States government agency, created, directed, and empowered by Congressional statute (see 47 U.S.C. § 151 and 47 U.S.C. § 154).

Another interesting scenario

A fresh mandate for [Mr Ahmadinejad] would, say his critics, consecrate the “revolution within a revolution” he has been trying to effect since his surprise electoral triumph in 2005. Best known to outsiders for his bellicose grandstanding, [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative, Mr Khamenei, greatly to expand the powers of the presidency.

Source: “It could make a big difference”, The Economist, Mar 19th 2009

Why Wikipedia may help address the encyclopedic knowledge problem

Wikipedia as Ontology

• Unlike other standard ontologies, such as WordNet and MeSH, Wikipedia itself is not a structured thesaurus
• However, it is more…
  – Comprehensive: it contains 12 million articles (2.8 million in the English Wikipedia)
  – Accurate: a study by Giles (2005) found that Wikipedia can compete with Encyclopædia Britannica in accuracy
  – Up to date: current and emerging concepts are absorbed in a timely fashion

Giles, J. 2005. Internet encyclopaedias go head to head. Nature 438: 900–901

Wikipedia as Ontology

• Moreover, Wikipedia has a well-formed structure
  – Each article describes only a single concept
  – The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
    (example: the Wikipedia article that describes the concept "Artificial intelligence")
  – Equivalent concepts are grouped together by redirect links
    (example: "AI" is redirected to its equivalent concept, "Artificial Intelligence")
  – It contains a hierarchical categorization system, in which each article belongs to at least one category
    (example: the concept "Artificial Intelligence" belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society)
  – Polysemous concepts are disambiguated by disambiguation pages
    (example: the different meanings that "Artificial intelligence" may refer to are listed in its disambiguation page)
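The structural properties listed for Wikipedia (one concept per article, redirects, categories) can be mimicked with a couple of dictionaries. This is only a toy sketch: the names `REDIRECTS`, `CATEGORIES`, `resolve` and `categories_of` are hypothetical, and the only data is the "AI" example from the slides. Disambiguation pages are left out because the slides do not list the alternative meanings.

```python
# Toy model of Wikipedia's structure; every name here is hypothetical.
REDIRECTS = {"AI": "Artificial intelligence"}  # equivalent concept -> canonical title

CATEGORIES = {  # each article belongs to at least one category
    "Artificial intelligence": [
        "Artificial intelligence",
        "Cybernetics",
        "Formal sciences",
        "Technology in society",
    ],
}


def resolve(title):
    """Follow a redirect (if any) to the canonical article title."""
    return REDIRECTS.get(title, title)


def categories_of(title):
    """Return the categories of the article that `title` resolves to."""
    return CATEGORIES.get(resolve(title), [])


print(resolve("AI"))
print(categories_of("AI"))
```

Resolving redirects before any lookup is what lets "AI" and "Artificial intelligence" be treated as the same concept.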

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

• Taxonomic information: category structure
• Attributes: infobox, text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Start with the category tree
• Induce a subsumption hierarchy

INFOBOXES

• Collaborative content
• Semi-structured data

{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
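Because infoboxes are semi-structured, attribute-value pairs can be pulled out with very little machinery. The following is a minimal sketch, assuming a cleaned-up excerpt of the Poe infobox in which every field sits on its own line in `| key = value` form. Real infoboxes nest templates and need a proper wikitext parser, and `parse_infobox` is a hypothetical helper, not part of any library.

```python
import re

# Cleaned-up excerpt of the infobox shown on the slide.
INFOBOX = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| magnum_opus = The Raven
}}"""

FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.*)$")
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")  # [[target|label]] -> label


def parse_infobox(wikitext):
    """Map each `| key = value` line to a dict entry, unwrapping [[links]]."""
    fields = {}
    for line in wikitext.splitlines():
        match = FIELD.match(line.strip())
        if match:
            key, value = match.groups()
            fields[key] = LINK.sub(r"\1", value).strip()
    return fields


poe = parse_infobox(INFOBOX)
print(poe["name"])
print(poe["birth_place"])
```

Each key/value pair obtained this way is a candidate attribute of the concept described by the article.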

DBPEDIA

DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web

The DBpedia Dataset

• 1,600,000 concepts, including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

REPRESENTING EXTRACTED INFORMATION

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. In the Developers Guide to Semantic Web Toolkits you can find a development toolkit in your preferred programming language to process DBpedia data.

Extracting Infobox Data (RDF Representation)

http://en.wikipedia.org/wiki/Calgary

http://dbpedia.org/resource/Calgary

dbpedia:native_name "Calgary"
dbpedia:altitude "1048"
dbpedia:population_city "988193"
dbpedia:population_metro "1079310"
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
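The Calgary example amounts to a set of subject-predicate-object triples. A minimal sketch of that representation follows, with a wildcard matcher standing in for a real RDF store. The `dbpedia:` prefix on `mayor_name` and `governing_body` is an assumption (the slide shows them without a prefix), and `match` is a hypothetical helper.

```python
# Subject-predicate-object triples from the Calgary example (prefixes abbreviated).
CALGARY = "dbpedia:Calgary"
TRIPLES = [
    (CALGARY, "dbpedia:native_name", "Calgary"),
    (CALGARY, "dbpedia:altitude", "1048"),
    (CALGARY, "dbpedia:population_city", "988193"),
    (CALGARY, "dbpedia:population_metro", "1079310"),
    # Assumed prefix on the next two predicates.
    (CALGARY, "dbpedia:mayor_name", "dbpedia:Dave_Bronconnier"),
    (CALGARY, "dbpedia:governing_body", "dbpedia:Calgary_City_Council"),
]


def match(pattern, triples=TRIPLES):
    """Return the triples matching (s, p, o); None in the pattern is a wildcard."""
    return [
        triple for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


print(match((None, "dbpedia:altitude", None)))
print(len(match((CALGARY, None, None))))
```

Literal values (the altitude) and links to other resources (the mayor) sit side by side in the same graph, which is what makes the data queryable as a network.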

SPARQL

• SPARQL is a query language for RDF
  – RDF is a directed, labeled graph data format for representing information in the Web
  – This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware

The DBpedia SPARQL Endpoint

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
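As an illustration of what such a question might look like in SPARQL, the sketch below encodes "All films by Quentin Tarantino" as a query and builds the corresponding request URL for the endpoint. The vocabulary (`dbo:director`, `dbr:Quentin_Tarantino`) is an assumption about how DBpedia models this fact and should be checked against the live data; `build_request_url` is a hypothetical helper, and no request is actually sent.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on the slide

# Assumed vocabulary: dbo:director and dbr:Quentin_Tarantino.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as an HTTP GET URL (nothing is sent here)."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(QUERY)
print(url)
```

The query is a graph pattern with one variable: every resource that can fill `?film` so that the triple exists in the dataset is returned as an answer.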

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis

www.phrasedetectives.com

WEB COLLABORATION PROJECTS

• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University

OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge

WHAT'S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)
  – IsA: (IsA "apple" "fruit")
  – PartOf: (PartOf "CPU" "computer")
  – PropertyOf: (PropertyOf "coffee" "wet")
  – MadeOf: (MadeOf "bread" "flour")
  – DefinedAs: (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)
  – PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  – SubeventOf: (SubeventOf "play sport" "score goal")
  – FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  – LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)
  – CapableOf: (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)
  – LocationOf: (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
  – EffectOf: (EffectOf "view video" "entertainment")
  – DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  – DesireOf: (DesireOf "person" "not be depressed")
  – MotivationOf: (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)
  – UsedFor: (UsedFor "fireplace" "burn wood")
  – CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")

ASSOCIATION: K-LINES (1.25 million assertions)
  – SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  – ThematicKLine: (ThematicKLine "wedding dress" "veil")
  – ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
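The lime example suggests the flavour of such heuristics. Below is a minimal sketch using one hypothetical regular expression for sentences of the form "A X is a (modifiers) Y". It is not the extraction procedure ConceptNet actually uses; the pattern and the function `extract` are assumptions made for illustration.

```python
import re

# Hypothetical pattern: "A <noun> is a <modifier words> <noun>".
IS_A = re.compile(r"^An? (\w+) is an? ((?:\w+ )*)(\w+)$", re.IGNORECASE)


def extract(sentence):
    """Turn one templated sentence into ConceptNet-style assertions."""
    match = IS_A.match(sentence.strip().rstrip("."))
    if not match:
        return []
    subject, modifiers, head = match.groups()
    assertions = [("isa", subject.lower(), head.lower())]
    if modifiers:
        # The words before the head noun are treated as a single property.
        prop = "_".join(modifiers.lower().split())
        assertions.append(("property_of", subject.lower(), prop))
    return assertions


print(extract("A lime is a very sour fruit"))
```

The last word of the predicate is taken as the category and everything before it as a property, which is enough for the slide's example but would misfire on many real sentences.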

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web

ESP the game

ESP: THE GAME

• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
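The matching rule can be sketched in a few lines. This is an illustration only, not the game's actual code: `first_match` is a hypothetical function, and the normalisation (trimming and lower-casing the strings before comparing) is an assumption.

```python
def first_match(guesses_a, guesses_b):
    """Return the first label typed by player A that player B also typed, or None."""
    typed_by_b = {guess.strip().lower() for guess in guesses_b}
    for guess in guesses_a:
        label = guess.strip().lower()
        if label in typed_by_b:
            return label
    return None


print(first_match(["car", "Boy", "hat"], ["kid", "boy", "man"]))
print(first_match(["tree"], ["sky"]))
```

Since the partners cannot communicate, a match is most likely on a word that genuinely describes the image, which is why agreed strings are useful as labels.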

THE TASK

SCORING BY MATCHING

THE CHALLENGE: SCORES

• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images, they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and are not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS, COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
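The taboo, "good" and "done" rules can be written down as three small predicates. This is a sketch under stated assumptions, not the game's implementation: all three function names are hypothetical, and "most players pass" is read here as a strict majority of the times the image was shown.

```python
def allowed(guess, taboo_words):
    """A guess is rejected if it is on the image's taboo list."""
    return guess.strip().lower() not in {word.lower() for word in taboo_words}


def is_good(times_produced, n):
    """A label is "good" once more than N players have produced it."""
    return times_produced > n


def is_done(passes, plays):
    """Assumption: "most players pass" means a strict majority of plays."""
    return plays > 0 and passes / plays > 0.5


print(allowed("Dog", ["dog", "grass"]))
print(is_good(times_produced=3, n=2))
print(is_done(passes=7, plays=10))
```

Raising N trades quantity for confidence: fewer labels qualify, but each has been produced independently by more players.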

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning, and at quiet times, there won't always be players to pair with
  – In these cases, a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced, and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or, the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
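Filling a template both produces the hint shown to the Guesser and fixes which relation the resulting fact expresses. The sketch below illustrates this with the templates listed above. The relation names (`kind_of`, `used_for`, and so on) and the functions `make_hint` and `to_fact` are hypothetical, and "fireplace" is just an example secret word.

```python
# Template strings from the slide; the dictionary keys are hypothetical relation names.
TEMPLATES = {
    "kind_of": "{word} is a kind of {hint}",
    "used_for": "{word} is used for {hint}",
    "location": "{word} is typically near/in/on {hint}",
    "opposite_of": "{word} is the opposite of {hint}",
    "related_to": "{word} is related to {hint}",
}


def make_hint(relation, hint, secret="_"):
    """Fill a template; the secret word stays masked as '_' for the Guesser."""
    return TEMPLATES[relation].format(word=secret, hint=hint)


def to_fact(relation, secret_word, hint):
    """Once the word has been guessed, the filled template is a candidate fact."""
    return (relation, secret_word, hint)


print(make_hint("used_for", "burning wood"))
print(to_fact("used_for", "fireplace", "burning wood"))
```

Because the Describer can only fill in blanks, every hint arrives already tagged with its relation, so no free-text parsing is needed afterwards.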

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

PHRASE DETECTIVES: THE TASKS

• 2 tasks
  – Find The Culprit (Annotation): the user must identify the closest antecedent of a markable, if it is anaphoric
  – Detectives Conference (Validation): the user must agree/disagree with a coreference relation entered by another user

www.phrasedetectives.com

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012

• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems



  • INTRODUCTION TO ARTIFICIAL INTELLIGENCE
  • ‘CYC convinced the AI community that creating a commonsense knowledge base by hand is impossible’ (Massimo, Lecture 1)
  • THE SOCIAL WEB
  • SOCIAL CREATION OF KNOWLEDGE
  • WIKIPEDIA
  • Slide 7
  • Slide 8
  • Encyclopedic knowledge in coreference resolution
  • Why Wikipedia may help addressing the encyclopedic knowledge problem
  • Another interesting scenario
  • Slide 13
  • Wikipedia as Ontology
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • The concept Artificial Intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society
  • Slide 21
  • The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
  • SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
  • Wikipedia category network
  • Deriving a taxonomy from Wikipedia (AAAI 2007)
  • Slide 26
  • INFOBOXES
  • Slide 29
  • Slide 30
  • Slide 31
  • SPARQL
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • OPEN MIND COMMONSENSE
  • WHAT’S IN OPEN MIND COMMONSENSE: CAR
  • Slide 39
  • OPEN MIND COMMONSENSE ADDING KNOWLEDGE
  • OMCS ADDING KNOWLEDGE 2
  • OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
  • Slide 43
  • FROM OPENMIND COMMONSENSE TO CONCEPT NET
  • Slide 45
  • CONCEPT NET
  • FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
  • GAMES WITH A PURPOSE
  • GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
  • EXAMPLES OF GWAP
  • ESP
  • ESP the game
  • ESP THE GAME
  • THE TASK
  • SCORING BY MATCHING
  • THE CHALLENGE SCORES
  • SCORES
  • THE CHALLENGE TIMING
  • TABOO WORDS
  • Slide 61
  • PASSING
  • GOOD LABELS COMPLETING AN IMAGE
  • IMPLEMENTATION
  • SOME STATISTICS
  • ANALYSIS
  • QUALITY OF THE LABELS
  • GOOGLE IMAGE LABELLER
  • Slide 69
  • RESULTS
  • VERBOSITY
  • THE GAME
  • Slide 73
  • TEMPLATES IN VERBOSITY
  • GUESSING ATTRIBUTES
  • PRODUCING A DESCRIPTION
  • TEMPLATES
  • EMULATION
  • Slide 79
  • PHRASE DETECTIVES
  • Slide 81
  • NAME THE CULPRIT
  • READINGS

Another interesting scenario

A fresh mandate for [Mr Ahmadinejad] would, say his critics, consecrate the “revolution within a revolution” he has been trying to effect since his surprise electoral triumph in 2005. Best known to outsiders for his bellicose grandstanding, [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative, Mr Khamenei, greatly to expand the powers of the presidency.

Source: “It could make a big difference”, The Economist, Mar 19th 2009

Why Wikipedia may help addressing the encyclopedic knowledge problem

Wikipedia as Ontology

• Unlike other standard ontologies, such as WordNet and MeSH, Wikipedia itself is not a structured thesaurus
• However, it is more…
  – Comprehensive: it contains 12 million articles (2.8 million in the English Wikipedia)
  – Accurate: a study by Giles (2005) found that Wikipedia can compete with Encyclopædia Britannica in accuracy
  – Up to date: current and emerging concepts are absorbed in a timely manner

Giles, J. 2005. Internet encyclopaedias go head to head. Nature 438: 900–901

Wikipedia as Ontology

• Moreover, Wikipedia has a well-formed structure
  – Each article describes only a single concept
    (example: the Wikipedia article that describes the concept “Artificial intelligence”)
  – The title of the article is a short, well-formed phrase, like a term in a traditional thesaurus
  – Equivalent concepts are grouped together by redirect links
    (example: “AI” is redirected to its equivalent concept “Artificial intelligence”)
  – It contains a hierarchical categorization system in which each article belongs to at least one category
    (example: the concept “Artificial intelligence” belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society)
  – Polysemous concepts are disambiguated by disambiguation pages
    (example: the different meanings that “Artificial intelligence” may refer to are listed in its disambiguation page)
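The structural properties listed above (one concept per article, redirects for equivalent names, categories, disambiguation pages) can be sketched as a small lookup structure. This is only an illustration, not Wikipedia's actual data model or API: the class `MiniWiki` and its fields are hypothetical, and the "Mercury" entries are invented placeholders. Only the "AI" redirect and the four categories come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class MiniWiki:
    """Toy model of the Wikipedia structure used as an ontology (hypothetical)."""
    categories: dict = field(default_factory=dict)      # article title -> list of categories
    redirects: dict = field(default_factory=dict)       # alternative name -> article title
    disambiguation: dict = field(default_factory=dict)  # ambiguous name -> candidate titles

    def resolve(self, name):
        """Map a surface name to the candidate concepts (article titles) it may denote."""
        if name in self.disambiguation:
            return list(self.disambiguation[name])
        # Follow a redirect if there is one, otherwise treat the name as a title.
        title = self.redirects.get(name, name)
        return [title] if title in self.categories else []


wiki = MiniWiki()
wiki.categories["Artificial intelligence"] = [
    "Artificial intelligence", "Cybernetics", "Formal sciences", "Technology in society",
]
wiki.redirects["AI"] = "Artificial intelligence"
# Invented example of a polysemous name; the candidate titles are placeholders.
wiki.categories["Mercury (planet)"] = ["Planets"]
wiki.categories["Mercury (element)"] = ["Chemical elements"]
wiki.disambiguation["Mercury"] = ["Mercury (planet)", "Mercury (element)"]

print(wiki.resolve("AI"))
print(wiki.resolve("Mercury"))
```

For coreference resolution, the point of such a structure is that a mention like "AI" and a mention like "Artificial intelligence" resolve to the same concept, while an ambiguous mention yields several candidates that still need to be disambiguated in context.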

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

• Taxonomic information: category structure
• Attributes: infobox, text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Start with the category tree
• Induce a subsumption hierarchy

INFOBOXES

• Collaborative content
• Semi-structured data

{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
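To make "semi-structured" concrete, here is a minimal sketch of turning infobox markup into attribute/value pairs. It assumes one `| key = value` field per line, as in the Poe example; real infoboxes can nest templates and span lines, which this sketch does not handle. The function names `parse_infobox` and `link_targets` are hypothetical, and the sample is a shortened version of the infobox above.

```python
import re

SAMPLE = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""

# [[Target|label]] or [[Target]]: keep only the link target.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def parse_infobox(wikitext):
    """Return (template name, {attribute: value}) for a one-field-per-line infobox."""
    lines = [line.strip() for line in wikitext.strip().splitlines()]
    name = lines[0].lstrip("{").strip()
    fields = {}
    for line in lines[1:]:
        if not line.startswith("|") or "=" not in line:
            continue  # skips the closing "}}" and anything that is not a field
        key, value = line[1:].split("=", 1)
        fields[key.strip()] = value.strip()
    return name, fields


def link_targets(value):
    """Extract the article titles that a field value links to."""
    return WIKILINK.findall(value)


name, fields = parse_infobox(SAMPLE)
print(name)
print(fields["magnum_opus"])
print(link_targets(fields["birth_place"]))
```

Each attribute/value pair, together with the article it comes from, is already close to a subject/property/value statement, which is the kind of information DBpedia extracts.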

DBPEDIA

DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web

The DBpedia Dataset

• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

REPRESENTING EXTRACTED INFORMATION

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits for processing DBpedia data in your preferred programming language.

Extracting Infobox Data (RDF Representation)

Wikipedia page: http://en.wikipedia.org/wiki/Calgary
DBpedia resource: http://dbpedia.org/resource/Calgary

Properties of the resource:
  dbpedia:native_name       “Calgary”
  dbpedia:altitude          “1048”
  dbpedia:population_city   “988193”
  dbpedia:population_metro  “1079310”
  mayor_name                dbpedia:Dave_Bronconnier
  governing_body            dbpedia:Calgary_City_Council
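The Calgary example can be written down as a list of subject/predicate/object triples, which is all an RDF graph is. The sketch below uses plain Python tuples rather than an RDF library. The property namespace `http://dbpedia.org/property/` and the helper names are assumptions made for illustration; the values are the ones on the slide.

```python
DBPEDIA = "http://dbpedia.org/resource/"
PROP = "http://dbpedia.org/property/"  # assumed namespace for infobox properties

calgary = DBPEDIA + "Calgary"

# Each triple is (subject, predicate, object); literals are plain strings here.
triples = [
    (calgary, PROP + "native_name", "Calgary"),
    (calgary, PROP + "altitude", "1048"),
    (calgary, PROP + "population_city", "988193"),
    (calgary, PROP + "population_metro", "1079310"),
    (calgary, PROP + "mayor_name", DBPEDIA + "Dave_Bronconnier"),
    (calgary, PROP + "governing_body", DBPEDIA + "Calgary_City_Council"),
]


def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def is_resource(value):
    """True if the object is a link to another resource rather than a literal."""
    return value.startswith("http://")


print(objects(triples, calgary, PROP + "altitude"))
print([o for _, _, o in triples if is_resource(o)])
```

The last two triples are what makes this a graph rather than a table: their objects are themselves resources, so the mayor and the city council can carry their own properties.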

SPARQL

• SPARQL is a query language for RDF
  – RDF is a directed, labeled graph data format for representing information in the Web
  – This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware

The DBpedia SPARQL Endpoint

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
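As an illustration, the request "all films by Quentin Tarantino" could be phrased roughly as follows. This sketch only builds the query text and the request URL and does not contact the endpoint. The class `dbo:Film`, the property `dbo:director` and the resource name `Quentin_Tarantino` are assumptions about the DBpedia vocabulary and may not match the version of the dataset described in these slides; the function names are hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"


def films_by_director(director_resource, limit=20):
    """Build a SPARQL query string for films directed by the given DBpedia resource."""
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?film WHERE {{
  ?film a dbo:Film ;
        dbo:director dbr:{director_resource} .
}}
LIMIT {int(limit)}"""


def request_url(query):
    """URL for an HTTP GET request carrying the query in the 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = films_by_director("Quentin_Tarantino")
print(query)
print(request_url(query))
```

The query is a graph pattern: `?film` is a variable, and the endpoint returns every binding for which both triples are present in the dataset.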

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis

www.phrasedetectives.com

WEB COLLABORATION PROJECTS

• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University

OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense knowledge

WHAT’S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)
  IsA                       (IsA 'apple' 'fruit')
  PartOf                    (PartOf 'CPU' 'computer')
  PropertyOf                (PropertyOf 'coffee' 'wet')
  MadeOf                    (MadeOf 'bread' 'flour')
  DefinedAs                 (DefinedAs 'meat' 'flesh of animal')

EVENTS (38,000 assertions)
  PrerequisiteEventOf       (PrerequisiteEventOf 'read letter' 'open envelope')
  SubeventOf                (SubeventOf 'play sport' 'score goal')
  FirstSubeventOf           (FirstSubeventOf 'start fire' 'light match')
  LastSubeventOf            (LastSubeventOf 'attend classical concert' 'applaud')

AGENTS (104,000 assertions)
  CapableOf                 (CapableOf 'dentist' 'pull tooth')

SPATIAL (36,000 assertions)
  LocationOf                (LocationOf 'army' 'in war')

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
  EffectOf                  (EffectOf 'view video' 'entertainment')
  DesirousEffectOf          (DesirousEffectOf 'sweat' 'take shower')

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  DesireOf                  (DesireOf 'person' 'not be depressed')
  MotivationOf              (MotivationOf 'play game' 'compete')

FUNCTIONAL (115,000 assertions)
  UsedFor                   (UsedFor 'fireplace' 'burn wood')
  CapableOfReceivingAction  (CapableOfReceivingAction 'drink' 'serve')

ASSOCIATION K-LINES (1.25 million assertions)
  SuperThematicKLine        (SuperThematicKLine 'western civilization' 'civilization')
  ThematicKLine             (ThematicKLine 'wedding dress' 'veil')
  ConceptuallyRelatedTo     (ConceptuallyRelatedTo 'bad breath' 'mint')
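Assertions of this form are easy to hold as a labelled directed graph. The sketch below indexes a few of the example assertions listed above by their first concept. It is an illustration of the data structure only, not ConceptNet's actual storage format or API, and the function names are hypothetical.

```python
from collections import defaultdict

# (relation, concept1, concept2) triples copied from the examples listed above.
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("MadeOf", "bread", "flour"),
    ("CapableOf", "dentist", "pull tooth"),
    ("UsedFor", "fireplace", "burn wood"),
    ("EffectOf", "view video", "entertainment"),
    ("ConceptuallyRelatedTo", "bad breath", "mint"),
]


def build_network(assertions):
    """Index assertions as a graph: concept -> list of (relation, other concept)."""
    outgoing = defaultdict(list)
    for relation, source, target in assertions:
        outgoing[source].append((relation, target))
    return outgoing


def related(network, concept, relation):
    """Concepts reachable from `concept` through edges labelled `relation`."""
    return [target for rel, target in network.get(concept, []) if rel == relation]


network = build_network(ASSERTIONS)
print(related(network, "apple", "IsA"))
print(related(network, "fireplace", "UsedFor"))
```

Note that the nodes are short natural-language phrases ('pull tooth', 'burn wood') rather than formal symbols, which is what distinguishes this kind of network from a hand-built ontology such as CYC.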

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
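The slides do not say which heuristics are used, so the following is only a guess at what a "simple heuristic" for the lime example might look like: a single hypothetical pattern for sentences of the form "A X is a [modifiers] Y". The regular expression and the function name `extract` are assumptions, not the ConceptNet extraction rules.

```python
import re

# Hypothetical pattern: "A/An/The <noun> is a/an [modifier words] <noun>"
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+((?:\w+\s+)*)(\w+)\.?$",
    re.IGNORECASE,
)


def extract(sentence):
    """Turn a matching sentence into (relation, concept, value) facts."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return []
    subject = match.group(1).lower()
    modifiers = [word.lower() for word in match.group(2).split()]
    head = match.group(3).lower()
    facts = [("isa", subject, head)]
    if modifiers:
        # Words between the article and the head noun are read as a property.
        facts.append(("property_of", subject, "_".join(modifiers)))
    return facts


print(extract("A lime is a very sour fruit"))
print(extract("An apple is a fruit"))
```

A pattern this crude will misread many sentences (for example, any sentence whose last word is not the head noun), which is one reason the contributed assertions are also checked by other users.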

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web

ESP the game

ESP THE GAME

• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
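The matching rule can be sketched as follows: each player's guesses are collected, and the round ends as soon as one player types something the other has already typed. The event format, the normalization step (lower-casing and collapsing whitespace) and the function names are assumptions made for this sketch, not details of the actual game.

```python
def normalize(text):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def play_round(events):
    """events: (player, typed string) pairs in typing order, players 'A' and 'B'.

    Returns the first label typed by both players, or None if they never agree.
    """
    seen = {"A": set(), "B": set()}
    for player, text in events:
        label = normalize(text)
        if not label:
            continue
        other = "B" if player == "A" else "A"
        if label in seen[other]:
            return label  # agreement: this becomes a label for the image
        seen[player].add(label)
    return None


events = [("A", "dog"), ("B", "puppy"), ("A", "Grass "), ("B", "grass"), ("B", "dog")]
print(play_round(events))
```

Because the players cannot communicate, the only way to agree is to type something that is plausibly true of the image, which is why agreement serves as evidence that the label is a good description.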

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images, they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
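The three rules above (taboo words, good labels, done images) can each be stated as a one-line check. The sketch below keeps them as separate functions and does not model how they interact in the real game. The function names are hypothetical, and reading "most players pass" as "more than half of the players who saw the image passed" is an assumption.

```python
def taboo_words(previous_agreements):
    """Labels agreed on in earlier games on this image are taboo for it."""
    return set(previous_agreements)


def good_labels(producers_per_label, n):
    """Labels produced by more than n players (n is the game parameter N)."""
    return {label for label, count in producers_per_label.items() if count > n}


def is_done(times_shown, times_passed):
    """An image is 'done' when most players who see it pass on it.

    'Most' is taken to mean more than half; that threshold is an assumption.
    """
    return times_shown > 0 and times_passed / times_shown > 0.5


counts = {"dog": 6, "grass": 3, "leash": 1}
print(sorted(good_labels(counts, n=2)))
print(sorted(taboo_words(["dog", "grass"])))
print(is_done(times_shown=10, times_passed=7))
```

The taboo list is what pushes players from obvious labels ("dog") towards more specific ones ("leash"), and the pass rate gives a natural stopping criterion once the easy labels are exhausted.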

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations and properties of interest are collected
• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
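The reason for using templates is that each filled-in hint maps directly onto a typed fact. The sketch below assumes that the first blank of a template stands for the secret word and the second is filled in by the Describer; the relation names on the right-hand side and the function names are hypothetical and do not come from the game.

```python
# Template text -> relation name (the relation names are invented for this sketch).
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near/in/on": "LocationOf",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def make_hint(template, filler):
    """Hint shown to the Guesser: the secret word stays hidden behind '_'."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return f"_ {template} {filler}"


def collect_fact(secret_word, template, filler):
    """Fact recorded by the game once the hint has been given."""
    return (TEMPLATES[template], secret_word, filler)


print(make_hint("is a kind of", "fruit"))
print(collect_fact("apple", "is a kind of", "fruit"))
```

Free-text hints would be more natural for players, but they would then need the kind of language processing that the template is there to avoid.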

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): the user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): the user must agree or disagree with a coreference relation entered by another user

PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012

• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58–67

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems


Wikipedia as Ontology

• Unlike standard ontologies such as WordNet and MeSH, Wikipedia itself is not a structured thesaurus
• However, it is more…
  – Comprehensive: it contains 12 million articles (2.8 million in the English Wikipedia)
  – Accurate: a study by Giles (2005) found that Wikipedia can compete with Encyclopædia Britannica in accuracy
  – Up to date: current and emerging concepts are absorbed in a timely manner

Giles, J. 2005. Internet encyclopaedias go head to head. Nature 438, 900–901.

Wikipedia as Ontology

• Moreover, Wikipedia has a well-formed structure
  – Each article describes only a single concept
  – The title of the article is a short, well-formed phrase, like a term in a traditional thesaurus
    (example: the Wikipedia article that describes the concept Artificial intelligence)
  – Equivalent concepts are grouped together by redirect links
    (example: AI is redirected to its equivalent concept, Artificial Intelligence)
  – It contains a hierarchical categorization system in which each article belongs to at least one category
    (example: the concept Artificial Intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society)
  – Polysemous concepts are disambiguated by disambiguation pages
    (example: the different meanings that Artificial intelligence may refer to are listed in its disambiguation page)

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

• Taxonomic information: category structure
• Attributes: infobox, text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Start with the category tree
• Induce a subsumption hierarchy

INFOBOXES

• Collaborative content
• Semi-structured data

{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
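Because an infobox is semi-structured, its fields can be pulled out with very little code. The sketch below is an illustration only and not a description of how any real extractor works: `split_top_level` and `parse_infobox` are hypothetical helper names, and the shortened Poe sample assumes the usual `{{…}}` and `[[…]]` wiki markup around templates and links.

```python
def split_top_level(text: str, sep: str = "|") -> list[str]:
    """Split on sep, ignoring separators nested inside {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif text[i] == sep and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_infobox(wikitext: str) -> dict[str, str]:
    """Return {field name: raw value} for a single infobox template."""
    body = wikitext.strip()
    if body.startswith("{{") and body.endswith("}}"):
        body = body[2:-2]
    fields = {}
    # The first part is the template name ("Infobox Writer"); skip it.
    for part in split_top_level(body)[1:]:
        name, _, value = part.partition("=")
        if value.strip():
            fields[name.strip()] = value.strip()
    return fields


# Shortened sample (assumed markup), based on the infobox on the slide.
POE = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| magnum_opus = The Raven
}}"""

info = parse_infobox(POE)
print(info["name"], "/", info["magnum_opus"])
```

The values stay as raw wiki text; turning `{{birth date|…}}` into a date or `[[…]]` into a link to another concept would be a separate step.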

DBPEDIA

DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web

The DBpedia Dataset

• 1,600,000 concepts, including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

REPRESENTING EXTRACTED INFORMATION

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits, by programming language, for processing DBpedia data.

Extracting Infobox Data (RDF Representation)

Source page: http://en.wikipedia.org/wiki/Calgary
Resource: http://dbpedia.org/resource/Calgary

dbpedia:native_name "Calgary"
dbpedia:altitude "1048"
dbpedia:population_city "988193"
dbpedia:population_metro "1079310"
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
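The Calgary example can be read as a set of subject-predicate-object triples. The following is a minimal sketch of that reading in plain Python, not an RDF library: the `match` function and the choice to store every value as a string are assumptions made for illustration, and the property names are copied from the slide as they appear there.

```python
from typing import Optional

Triple = tuple[str, str, str]

CALGARY = "http://dbpedia.org/resource/Calgary"

# One triple per property shown on the slide; literals kept as plain strings.
TRIPLES: list[Triple] = [
    (CALGARY, "dbpedia:native_name", "Calgary"),
    (CALGARY, "dbpedia:altitude", "1048"),
    (CALGARY, "dbpedia:population_city", "988193"),
    (CALGARY, "dbpedia:population_metro", "1079310"),
    (CALGARY, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (CALGARY, "governing_body", "dbpedia:Calgary_City_Council"),
]


def match(
    triples: list[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "What is the altitude of Calgary?" expressed as a triple pattern.
for _, _, value in match(TRIPLES, s=CALGARY, p="dbpedia:altitude"):
    print(value)
```

A triple pattern with some positions left open is also the basic building block of a SPARQL query.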

SPARQL

• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• The W3C specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware

The DBpedia SPARQL Endpoint

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians who were born in Berlin in the 19th century
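As a hedged illustration of one of these questions, "all films by Quentin Tarantino" might be written as the SPARQL query below and packed into a request URL for the endpoint. The sketch only builds the URL and does not contact the server. The `dbo:director` property and the `dbr:` resource name are assumptions about the current DBpedia vocabulary, `build_request_url` is a hypothetical helper, and the `format` parameter is assumed to be accepted by the Virtuoso endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director links a film to its director.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query as a GET request URL (nothing is sent)."""
    params = {
        "query": query.strip(),
        "format": "application/sparql-results+json",  # assumed to be supported
    }
    return f"{endpoint}?{urlencode(params)}"


url = build_request_url(QUERY)
print(url)
```

Fetching `url` with any HTTP client would then return the bindings for `?film`, provided the endpoint is reachable and the assumed property names match the data.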

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis

www.phrasedetectives.com

WEB COLLABORATION PROJECTS

• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University

OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge

WHAT'S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)
  – IsA (IsA "apple" "fruit")
  – PartOf (PartOf "CPU" "computer")
  – PropertyOf (PropertyOf "coffee" "wet")
  – MadeOf (MadeOf "bread" "flour")
  – DefinedAs (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)
  – PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope")
  – SubeventOf (SubeventOf "play sport" "score goal")
  – FirstSubeventOf (FirstSubeventOf "start fire" "light match")
  – LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)
  – CapableOf (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)
  – LocationOf (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
  – EffectOf (EffectOf "view video" "entertainment")
  – DesirousEffectOf (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  – DesireOf (DesireOf "person" "not be depressed")
  – MotivationOf (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)
  – UsedFor (UsedFor "fireplace" "burn wood")
  – CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")

ASSOCIATION K-LINES (1.25 million assertions)
  – SuperThematicKLine (SuperThematicKLine "western civilization" "civilization")
  – ThematicKLine (ThematicKLine "wedding dress" "veil")
  – ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
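To give a feel for what a "simple heuristic" of this kind could look like, here is a toy sketch that handles only sentences of the form "A X is a (modifiers) Y". It is an assumption made for illustration, not ConceptNet's actual extraction rules; `PATTERN` and `extract_assertions` are hypothetical names.

```python
import re

# Toy pattern: "A/An <concept> is a/an <modifier>* <head noun>"
PATTERN = re.compile(r"^an? (\w+) is an? ((?:\w+ )*)(\w+)$", re.IGNORECASE)


def extract_assertions(sentence: str) -> list[tuple[str, str, str]]:
    """Map one simple sentence to (relation, concept, value) assertions."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if m is None:
        return []  # sentence shape not covered by this toy heuristic
    concept = m.group(1).lower()
    modifiers = [w.lower() for w in m.group(2).split()]
    head = m.group(3).lower()
    assertions = [("isa", concept, head)]
    if modifiers:
        # All words before the head noun are treated as one property.
        assertions.append(("property_of", concept, "_".join(modifiers)))
    return assertions


for relation, concept, value in extract_assertions("A lime is a very sour fruit"):
    print(f"{relation}({concept}, {value})")
```

Sentences that do not fit the pattern simply yield no assertions, which is the usual trade-off of template-based extraction: high precision on covered shapes, no coverage elsewhere.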

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web

ESP the game

ESP: THE GAME

• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence the ESP game
• If any of the strings typed by one player matches a string typed by the other player, they score points

THE TASK

SCORING BY MATCHING

THE CHALLENGE: SCORES

• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images, they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
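The interaction between matching and taboo words can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the game's implementation: both players' guesses are given as complete lists rather than arriving over time, matching is case-insensitive exact string equality, and `first_agreed_label` and `play_round` are hypothetical names.

```python
from typing import Optional


def first_agreed_label(
    labels_a: list[str], labels_b: list[str], taboo: set[str]
) -> Optional[str]:
    """Return the first label from player A that player B also typed, ignoring taboo words."""
    taboo_norm = {w.lower() for w in taboo}
    allowed_b = {label.strip().lower() for label in labels_b} - taboo_norm
    for label in labels_a:
        norm = label.strip().lower()
        if norm in allowed_b:
            return norm
    return None


def play_round(
    image_taboo: set[str], labels_a: list[str], labels_b: list[str]
) -> Optional[str]:
    """Play one round for an image; an agreed label becomes taboo for later pairs."""
    agreed = first_agreed_label(labels_a, labels_b, image_taboo)
    if agreed is not None:
        image_taboo.add(agreed)
    return agreed


taboo_for_image = {"dog"}  # agreed on by an earlier pair
first = play_round(taboo_for_image, ["dog", "puppy", "grass"], ["Grass", "dog", "ball"])
second = play_round(taboo_for_image, ["dog", "puppy", "grass"], ["Grass", "dog", "ball"])
print(first, second, sorted(taboo_for_image))
```

In the usage lines, the first pair can only agree on "grass" because "dog" is already taboo; the second pair, typing the same words, finds no agreement at all, which is how the taboo list pushes later players toward more specific labels.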

TABOO WORDS

PASSING

GOOD LABELS, COMPLETING AN IMAGE

• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
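Both criteria reduce to simple threshold checks. The sketch below assumes that each label comes with a count of the players who produced it, and it reads "most players pass" as "more than half of the times the image was shown"; that cut-off, like the names `good_labels` and `is_done`, is an assumption made for illustration.

```python
from collections import Counter


def good_labels(label_counts: Counter, n: int) -> set[str]:
    """Labels produced by more than n players."""
    return {label for label, players in label_counts.items() if players > n}


def is_done(passes: int, times_shown: int) -> bool:
    """Treat an image as done when more than half of its showings ended in a pass."""
    return times_shown > 0 and passes / times_shown > 0.5


counts = Counter({"dog": 5, "grass": 2, "frisbee": 1})
print(sorted(good_labels(counts, n=1)))
print(is_done(passes=7, times_shown=10))
```

Raising N trades coverage for reliability: fewer labels pass the check, but each one has been independently produced by more players.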

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won't always be players to pair with
  – In these cases a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
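Because every hint is a filled-in template, turning a hint into a structured fact is straightforward. The sketch below is illustrative only: splitting "near/in/on" into three separate templates is an assumption, and `make_hint` and `to_fact` are hypothetical names rather than part of the game.

```python
# Templates from the slide; "near/in/on" is split into three variants (assumption).
TEMPLATES = (
    "is a kind of",
    "is used for",
    "is typically near",
    "is typically in",
    "is typically on",
    "is the opposite of",
    "is related to",
)

Fact = tuple[str, str, str]


def make_hint(template: str, filler: str) -> str:
    """What the Guesser sees: the secret word is hidden behind '_'."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return f"_ {template} {filler}"


def to_fact(secret_word: str, template: str, filler: str) -> Fact:
    """What the system can store: the same hint with the secret word filled in."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return (secret_word, template, filler)


print(make_hint("is a kind of", "fruit"))
print(to_fact("lime", "is a kind of", "fruit"))
```

Restricting the Describer to a fixed set of templates is what makes the collected hints directly usable as relation instances, at the cost of limiting what can be expressed.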

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): the user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): the user must agree/disagree with a coreference relation entered by another user

PHRASE DETECTIVES: THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems


Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurus

Wikipedia Article that describes the Concept Artificial intelligence

Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by

redirected links

AI is redirected to its equivalent concept Artificial Intelligence

Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by

redirected linksndash It contains a hierarchical categorization system

in which each article belongs to at least one category

The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society

Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by

redirected linksndash It contains a hierarchical categorization system in

which each article belongs to at least one category ndash Polysemous concepts are disambiguated by

Disambiguation Pages

The different meanings that Artificial intelligence may refer to are listed in its disambiguation page

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

bull Taxonomic information category structurebull Attributes infobox text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

bull Start with the category tree

Deriving a taxonomy from Wikipedia (AAAI 2007)

bull Induce a subsumption hierarchy

INFOBOXES

bull Collaborative content

bull Semi-structured data

Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]

DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an

open licensebull interlink the DBpedia dataset with other datasets on the

Web

DBPEDIA

10486071600000 concepts

1048607including

1048698 58000 persons

1048698 70000 places

1048698 35000 music albums

1048698 12000 films

1048607described by 91 million triples

1048607using 8141 different properties

1048607557000 links to pictures

10486071300000 links external web pages

1048607207000 Wikipedia categories

104860775000 YAGO categories

The DBpedia Dataset

The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data

REPRESENTING EXTRACTED INFORMATION

httpenwikipediaorgwikiCalgary

httpdbpediaorgresourceCalgary

dbpedianative_name Calgaryrdquo

dbpediaaltitude ldquo1048rdquo

dbpediapopulation_city ldquo988193rdquo

dbpediapopulation_metro ldquo1079310rdquo

mayor_name

dbpediaDave_Bronconnier

governing_body

dbpediaCalgary_City_Council

Extracting Infobox Data (RDF Representation)

SPARQL

bull SPARQL is a query language for RDF

bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF

bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware

1048607httpdbpediaorgsparql

1048607hosted on a OpenLink Virtuoso server

1048607can answer SPARQL queries like

1048698 Give me all Sitcoms that are set in NYC

1048698 All tennis players from Moscow

1048698 All films by Quentin Tarentino

1048698 All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint

bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and

Language Laboratory hellipbull This has been taken advantage of in AI

ndash Open Mind Commonsense (Singh) (collecting facts)

ndash Semantic Wikis

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

wwwphrasedetectivescom

bull Open Mind Common Sense ndash Singh

bull Crater mapping (results) ndash Kanefsky

bull Learner Learner2 1001 Paraphrases ndash Chklovski

bull FACTory ndash CyCORP

bull Hot or Not ndash 8 Days

bull ESP Phetch Verbosity Peekaboom ndash von Ahn

bull Galaxy Zoo ndash Oxford University

WEB COLLABORATION PROJECTS

wwwphrasedetectivescom

OPEN MIND COMMONSENSE

bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense

WHATrsquoS IN OPEN MIND COMMONSENSE CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52000 assertions)

IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)

EVENTS (38000 assertions)

PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)

AGENTS (104000 assertions)

CapableOf (CapableOf dentist pull tooth)

SPATIAL (36000 assertions)

LocationOf (LocationOf army in war)

TEMPORAL time amp sequence

CAUSAL (17000 assertions)

EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)

AFFECTIONAL (mood feeling emotions) (34000 assertions)

DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)

FUNCTIONAL (115000 assertions)

IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)

ASSOCIATION K-LINES (125 million assertions)

SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic

network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO

CONCEPTNETA lime is a very sour fruit

isa(limefruit)

property_of(limevery_sour)

GAMES WITH A PURPOSE

bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

bull The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune

bull Other gamesndash Peekaboomndash Phetch

ESP

bull The first GWAP developed by von Ahn and their group (2003 2004)

bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision

bull The goal label the majority of the images on the Web

ESP the game

ESP THE GAMEbull Two partners are picked at random from the

large number of players onlinebull They are not told who their partner is and canrsquot

communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe

the image and type that descriptionndash Hence the ESP game

bull If any of the strings typed by one player matches the string typed by the other player they score points

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

bull One of the motivating factors is to try to score as many points as possible

bull Hourly daily weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

bull Partners try to agree on as many images as they can during 2 frac12 minutes

bull The termometer on the side indicates how many images they have agreed on

bull If they agree on 15 images they score bonus points

TABOO WORDS

bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed

bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)

bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it

IMPLEMENTATIONbull Pre-recorded game play

ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with

ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture

bull Cheatingndash Players could cheat in a number of ways including

agreeing on labels playing against themselvesndash A number of mechanisms are in place against those

casesbull Selecting images

SOME STATISTICS

bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once

bull By 2008 ndash 200000 playersndash 50 million labels

ANALYSIS

bull The numbers indicate that the game is fun to play

bull Exciting factorsndash Playing with a partnerndash Playing against time

QUALITY OF THE LABELSbull For IMAGE SEARCH

ndash choose 10 labels among those produced and look at which images are returned

bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more

than 5 labelsndash 83 of game labels also produced by participants

bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

bull hellip or the game approach to collecting commonsense knowledge

bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)

THE GAME

bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the

word

bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected

bull The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _

EMULATION

bull As in ESP game pre-recorded games are used when a player cannot be paired with another player

bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous

describerndash Guesser not so easy

RESULTS

bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY

bull Qualityndash Ask six raters whether 200 facts collected using

Verbosity are lsquotruersquondash Around 85 success

PHRASE DETECTIVES

wwwphrasedetectivesorg

bull 2 tasks

ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric

ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user

wwwphrasedetectivescom

PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.

• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.


Wikipedia Article that describes the Concept Artificial intelligence

Wikipedia as Ontology

• Moreover, Wikipedia has a well-formed structure
  – Each article only describes a single concept
  – The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
  – Equivalent concepts are grouped together by redirected links

AI is redirected to its equivalent concept Artificial Intelligence

Wikipedia as Ontology

• Moreover, Wikipedia has a well-formed structure
  – Each article only describes a single concept
  – The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
  – Equivalent concepts are grouped together by redirected links
  – It contains a hierarchical categorization system in which each article belongs to at least one category

The concept Artificial Intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society

Wikipedia as Ontology

• Moreover, Wikipedia has a well-formed structure
  – Each article only describes a single concept
  – The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
  – Equivalent concepts are grouped together by redirected links
  – It contains a hierarchical categorization system in which each article belongs to at least one category
  – Polysemous concepts are disambiguated by Disambiguation Pages

The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
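The structural properties listed above (redirects, categories, disambiguation pages) can be pictured as three lookup tables. The sketch below is only an illustration: the table names and functions are invented here, and the disambiguation entries are a hand-picked sample rather than the real page contents.

```python
# Toy tables mirroring the three structures named on the slides.
# All names here are hypothetical; the data is a tiny hand-made sample.
REDIRECTS = {"AI": "Artificial intelligence"}

CATEGORIES = {
    "Artificial intelligence": [
        "Artificial intelligence",
        "Cybernetics",
        "Formal sciences",
        "Technology in society",
    ],
}

# Illustrative sense list only; the real disambiguation page is longer.
DISAMBIGUATION = {
    "Artificial intelligence": [
        "Artificial intelligence",
        "A.I. Artificial Intelligence",
    ],
}


def resolve(title):
    """Follow redirect links until an article title (a concept) is reached."""
    seen = set()
    while title in REDIRECTS and title not in seen:
        seen.add(title)  # guard against redirect loops
        title = REDIRECTS[title]
    return title


def senses(title):
    """Return the candidate meanings of a title, or the title itself."""
    concept = resolve(title)
    return DISAMBIGUATION.get(concept, [concept])


print(resolve("AI"))
print(CATEGORIES[resolve("AI")])
print(senses("AI"))
```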

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

• Taxonomic information: category structure
• Attributes: infobox, text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Start with the category tree

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Induce a subsumption hierarchy

INFOBOXES

• Collaborative content

• Semi-structured data

{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}

DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web

DBPEDIA

• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

The DBpedia Dataset

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data.

REPRESENTING EXTRACTED INFORMATION

http://en.wikipedia.org/wiki/Calgary

http://dbpedia.org/resource/Calgary
  dbpedia:native_name "Calgary"
  dbpedia:altitude "1048"
  dbpedia:population_city "988193"
  dbpedia:population_metro "1079310"
  mayor_name → dbpedia:Dave_Bronconnier
  governing_body → dbpedia:Calgary_City_Council

Extracting Infobox Data (RDF Representation)
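To make the RDF representation concrete, the Calgary facts can be written as (subject, predicate, object) triples. This is a plain-Python sketch, not an RDF library; the helper names `objects` and `is_resource` are invented for illustration, and the prefixed names are copied from the slide as shorthand.

```python
# The Calgary infobox facts from the slide as (subject, predicate, object).
# Plain tuples stand in for a real RDF store; this is a sketch only.
SUBJECT = "http://dbpedia.org/resource/Calgary"

TRIPLES = [
    (SUBJECT, "dbpedia:native_name", "Calgary"),
    (SUBJECT, "dbpedia:altitude", "1048"),
    (SUBJECT, "dbpedia:population_city", "988193"),
    (SUBJECT, "dbpedia:population_metro", "1079310"),
    (SUBJECT, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (SUBJECT, "governing_body", "dbpedia:Calgary_City_Council"),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def is_resource(value):
    """Objects naming another node (not a literal) carry a prefix or URL."""
    return value.startswith("dbpedia:") or value.startswith("http://")


print(objects(TRIPLES, SUBJECT, "dbpedia:altitude"))
print([o for _, _, o in TRIPLES if is_resource(o)])
```

Literal values (name, altitude, population) end the graph, while resource values such as the mayor point to further nodes that can have triples of their own.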

SPARQL

• SPARQL is a query language for RDF

• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF

• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint
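As an illustration of what one of these questions might look like in SPARQL, the sketch below builds a request URL for "all films by Quentin Tarantino". Several things are assumptions rather than facts from the slides: the property `dbo:director`, the resource name `dbr:Quentin_Tarantino`, and the `format` parameter. The code only constructs the URL; it does not contact the endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director and dbr:Quentin_Tarantino are guesses at
# how the data is modelled; check the live dataset before relying on them.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
LIMIT 20
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL.

    `query` is the standard SPARQL protocol parameter; `format` is assumed
    to be understood by the server as a way to ask for JSON results.
    """
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(decoded == QUERY)
```

The triple pattern in the `WHERE` clause has the same subject, predicate, object shape as the extracted infobox data, with `?film` as the unknown to be filled in.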

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION


• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University

WEB COLLABORATION PROJECTS


OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge

WHAT'S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)
  IsA: (IsA "apple" "fruit")
  PartOf: (PartOf "CPU" "computer")
  PropertyOf: (PropertyOf "coffee" "wet")
  MadeOf: (MadeOf "bread" "flour")
  DefinedAs: (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)
  PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  SubeventOf: (SubeventOf "play sport" "score goal")
  FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)
  CapableOf: (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)
  LocationOf: (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
  EffectOf: (EffectOf "view video" "entertainment")
  DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  DesireOf: (DesireOf "person" "not be depressed")
  MotivationOf: (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)
  IsUsedFor: (UsedFor "fireplace" "burn wood")
  CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")

ASSOCIATION / K-LINES (1.25 million assertions)
  SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  ThematicKLine: (ThematicKLine "wedding dress" "veil")
  ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
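The lime example can be reproduced with one hand-written pattern. This is a toy sketch of the "simple heuristics" idea, not the actual ConceptNet extraction rules; the regular expression and the function `extract` are assumptions that cover only sentences of the form "A X is a (modifiers) Y".

```python
import re

# One illustrative pattern: "A/An/The X is a/an [modifiers] Y".
# Real extraction would need many more patterns; this handles one shape only.
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(?:(.+)\s+)?(\w+)$",
    re.IGNORECASE,
)


def extract(sentence):
    """Return (relation, concept, value) facts found in the sentence."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []
    concept, modifiers, head = match.groups()
    # The last word of the predicate is taken as the superclass.
    facts = [("isa", concept.lower(), head.lower())]
    if modifiers:
        # Anything between the article and the head noun becomes a property.
        value = modifiers.lower().replace(" ", "_")
        facts.append(("property_of", concept.lower(), value))
    return facts


print(extract("A lime is a very sour fruit"))
print(extract("A dog is an animal"))
```

Sentences that do not fit the pattern return an empty list, which is why template-guided input (as in Open Mind Commonsense) makes this kind of extraction workable.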

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune

• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and his group (2003, 2004)

• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision

• The goal: label the majority of the images on the Web

ESP the game

ESP: THE GAME

• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence the ESP game

• If any of the strings typed by one player matches the string typed by the other player, they score points
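The matching rule can be sketched as a function over a time-ordered stream of guesses. The event format `(player, word)` and the function name `agreed_label` are assumptions for illustration; the slides do not describe how the game is implemented.

```python
def agreed_label(events):
    """Return the first word typed by both players, or None.

    `events` is a time-ordered list of (player, word) pairs, where player
    is "A" or "B". Matching is case-insensitive and ignores outer spaces.
    """
    typed = {"A": set(), "B": set()}
    for player, word in events:
        word = word.strip().lower()
        typed[player].add(word)
        other = "B" if player == "A" else "A"
        # A match does not need to be simultaneous: any earlier guess by
        # the partner counts.
        if word in typed[other]:
            return word
    return None


events = [("A", "car"), ("B", "road"), ("A", "tree"), ("B", "Car")]
print(agreed_label(events))
print(agreed_label([("A", "sky"), ("B", "cloud")]))
```

Since the players cannot communicate, the only way to match is to type what the image most plausibly shows, which is what makes the agreed word a usable label.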

THE TASK

SCORING BY MATCHING

THE CHALLENGE: SCORES

• One of the motivating factors is to try to score as many points as possible

• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2½ minutes

• The thermometer on the side indicates how many images they have agreed on

• If they agree on 15 images, they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed

• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS, COMPLETING AN IMAGE

• A label is considered "good" when more than N players produce it (with N a parameter of the game)

• An image is "done" when its list of taboo words is so extensive that most players pass on it
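The taboo, "good" and "done" rules can be combined into per-image bookkeeping, sketched below. The class `ImageState`, its field names, and the reading of "most players pass" as "more than half of the rounds were passed" are all assumptions; the slides give the rules only informally.

```python
from collections import Counter


class ImageState:
    """Per-image bookkeeping for taboo words, good labels and completion."""

    def __init__(self, n_good=2):
        self.n_good = n_good        # the parameter N from the slide
        self.producers = Counter()  # how many players typed each label
        self.taboo = set()          # labels already agreed on for this image
        self.times_shown = 0
        self.times_passed = 0

    def record_round(self, labels_a, labels_b, passed=False):
        """Record one round; return the newly agreed labels (or None)."""
        self.times_shown += 1
        if passed:
            self.times_passed += 1
            return None
        # Taboo words are not allowed, so they are dropped before counting.
        a = {w.lower() for w in labels_a} - self.taboo
        b = {w.lower() for w in labels_b} - self.taboo
        for word in a:
            self.producers[word] += 1
        for word in b:
            self.producers[word] += 1
        agreed = sorted(a & b)
        self.taboo.update(agreed)   # agreed words become taboo next time
        return agreed

    def good_labels(self):
        """Labels produced by more than N players."""
        return sorted(w for w, c in self.producers.items() if c > self.n_good)

    def is_done(self):
        """Assumed reading of 'most players pass': over half the rounds."""
        return self.times_shown > 0 and self.times_passed * 2 > self.times_shown


state = ImageState(n_good=2)
print(state.record_round(["dog", "grass"], ["dog", "ball"]))
print(state.record_round(["dog", "ball"], ["ball", "grass"]))
print(state.good_labels(), state.is_done())
```

In this simplified version a label stops being counted once it is taboo, so a large N interacts with the taboo rule; a real system would have to decide how the two thresholds relate.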

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning, and at quiet times, there won't always be players to pair with
  – In these cases, a player is paired against a recorded 'hand' of a previous game with the same picture

• Cheating
  – Players could cheat in a number of ways, including agreeing on labels / playing against themselves
  – A number of mechanisms are in place against those cases

• Selecting images

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once

• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants

• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful





Wikipedia as Ontology

• Moreover, Wikipedia has a well-formed structure
  – Each article describes only a single concept
  – The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
  – Equivalent concepts are grouped together by redirect links
  – It contains a hierarchical categorization system in which each article belongs to at least one category
  – Polysemous concepts are disambiguated by Disambiguation Pages

AI is redirected to its equivalent concept, Artificial Intelligence.

The concept Artificial Intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society.

The different meanings that Artificial intelligence may refer to are listed in its disambiguation page.
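As a rough illustration of how redirects and categories can be used as an ontology, the sketch below stores the two examples from the slides in plain dictionaries. The names `REDIRECTS`, `CATEGORIES`, `resolve` and `categories_of` are hypothetical, and a real system would read this information from a Wikipedia dump rather than from hand-written data.

```python
# Toy data copied from the examples in the slides (assumption: a real system
# would load redirects and category links from a Wikipedia dump).
REDIRECTS = {"AI": "Artificial Intelligence"}
CATEGORIES = {
    "Artificial Intelligence": [
        "Artificial intelligence",
        "Cybernetics",
        "Formal sciences",
        "Technology in society",
    ],
}


def resolve(title: str) -> str:
    """Follow redirect links until a canonical article title is reached."""
    seen = set()
    while title in REDIRECTS and title not in seen:
        seen.add(title)  # guard against redirect loops
        title = REDIRECTS[title]
    return title


def categories_of(title: str) -> list[str]:
    """Return the categories of the article that the title resolves to."""
    return CATEGORIES.get(resolve(title), [])
```

With this data, `categories_of("AI")` first resolves the redirect and then returns the four categories listed above.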

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

• Taxonomic information: category structure
• Attributes: infobox, text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Start with the category tree
• Induce a subsumption hierarchy

INFOBOXES

• Collaborative content
• Semi-structured data

{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
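Because an infobox is semi-structured, its attribute/value pairs can be pulled out with very little code. The sketch below is a minimal illustration, assuming the wikitext is laid out with one `| key = value` field per line; `parse_infobox` is a hypothetical helper, not the extraction code used by any real project, and it keeps nested templates and links as raw strings.

```python
import re

# A shortened excerpt of the Edgar Allan Poe infobox, one field per line.
INFOBOX = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| magnum_opus = The Raven
}}"""

# A field line starts with "|", then a key, then "=", then the raw value.
FIELD = re.compile(r"\|\s*([\w ]+?)\s*=\s*(.*)")


def parse_infobox(wikitext: str) -> dict[str, str]:
    """Return the attribute/value pairs of a one-field-per-line infobox."""
    fields = {}
    for line in wikitext.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields
```

Matching only at the start of each line is what keeps the pipes inside `{{birth date|...}}` and `[[United States|US]]` from being mistaken for field separators.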

DBPEDIA

DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web

The DBpedia Dataset

• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

REPRESENTING EXTRACTED INFORMATION

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data.

Extracting Infobox Data (RDF Representation)

http://en.wikipedia.org/wiki/Calgary

http://dbpedia.org/resource/Calgary
• dbpedia:native_name “Calgary”
• dbpedia:altitude “1048”
• dbpedia:population_city “988193”
• dbpedia:population_metro “1079310”
• mayor_name dbpedia:Dave_Bronconnier
• governing_body dbpedia:Calgary_City_Council
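The Calgary example can be written down as subject/predicate/object triples, which is all an RDF graph is. The sketch below is only an illustration: it copies the values from the slide, keeps literals as plain strings, and the `objects` helper is a hypothetical name rather than part of any RDF library.

```python
# Subject shared by all triples in the example.
CALGARY = "http://dbpedia.org/resource/Calgary"

# (subject, predicate, object) triples copied from the slide; literal values
# are kept as strings, resource-valued objects keep their dbpedia: prefix.
TRIPLES = [
    (CALGARY, "dbpedia:native_name", "Calgary"),
    (CALGARY, "dbpedia:altitude", "1048"),
    (CALGARY, "dbpedia:population_city", "988193"),
    (CALGARY, "dbpedia:population_metro", "1079310"),
    (CALGARY, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (CALGARY, "governing_body", "dbpedia:Calgary_City_Council"),
]


def objects(subject: str, predicate: str) -> list[str]:
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]
```

Each infobox attribute becomes one triple, so looking up an attribute amounts to filtering the list on subject and predicate.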

SPARQL

• SPARQL is a query language for RDF
  – RDF is a directed, labeled graph data format for representing information in the Web
  – This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware

The DBpedia SPARQL Endpoint

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
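To make the idea of querying RDF concrete, the sketch below matches a single triple pattern against a few in-memory triples, which is roughly what one line of a SPARQL WHERE clause does. The `match` helper, the `dbr:`/`dbo:` abbreviations and the three toy triples are assumptions chosen for illustration; the query string is shown for comparison only, is not executed, and the property it uses may differ from what a given DBpedia release exposes.

```python
# What "All films by Quentin Tarantino" might look like in SPARQL
# (assumption: the director relation is modelled as dbo:director).
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE { ?film dbo:director dbr:Quentin_Tarantino . }
"""

# A tiny hand-written graph standing in for the real dataset.
TOY_TRIPLES = [
    ("dbr:Pulp_Fiction", "dbo:director", "dbr:Quentin_Tarantino"),
    ("dbr:Alien_(film)", "dbo:director", "dbr:Ridley_Scott"),
    ("dbr:Jackie_Brown", "dbo:director", "dbr:Quentin_Tarantino"),
]


def match(pattern, triples):
    """Return one variable binding per triple matching a triple pattern.

    Terms starting with "?" are variables; all other terms must be equal.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable bound to two different values
                binding[term] = value
            elif term != value:
                break  # constant term does not match
        else:
            results.append(binding)
    return results


PATTERN = ("?film", "dbo:director", "dbr:Quentin_Tarantino")
films = [b["?film"] for b in match(PATTERN, TOY_TRIPLES)]
```

A full SPARQL engine joins the bindings of several such patterns; this sketch handles only one pattern at a time.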

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis

WEB COLLABORATION PROJECTS

• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – CyCORP
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University

OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense knowledge

WHAT’S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)
  – IsA (IsA "apple" "fruit")
  – PartOf (PartOf "CPU" "computer")
  – PropertyOf (PropertyOf "coffee" "wet")
  – MadeOf (MadeOf "bread" "flour")
  – DefinedAs (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)
  – PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope")
  – SubeventOf (SubeventOf "play sport" "score goal")
  – FirstSubeventOf (FirstSubeventOf "start fire" "light match")
  – LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)
  – CapableOf (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)
  – LocationOf (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
  – EffectOf (EffectOf "view video" "entertainment")
  – DesirousEffectOf (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  – DesireOf (DesireOf "person" "not be depressed")
  – MotivationOf (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)
  – UsedFor (UsedFor "fireplace" "burn wood")
  – CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")

ASSOCIATION: K-LINES (1.25 million assertions)
  – SuperThematicKLine (SuperThematicKLine "western civilization" "civilization")
  – ThematicKLine (ThematicKLine "wedding dress" "veil")
  – ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")

OPEN MIND COMMONSENSE: ADDING KNOWLEDGE

OMCS: ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
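The lime example suggests what a "simple heuristic" might look like. The sketch below is a guess at one such rule, not ConceptNet's actual extraction code: it assumes sentences of the form "A X is a (modifiers) Y" and treats the last word as the class and the preceding words as a property. The pattern and the `extract` function are hypothetical.

```python
import re

# "A/An/The <concept> is a/an <modifier words> <head noun>"
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+((?:\w+\s+)*)(\w+)$",
    re.IGNORECASE,
)


def extract(sentence: str) -> list[tuple[str, str, str]]:
    """Turn one simple sentence into isa / property_of assertions."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []  # sentence does not fit this one pattern
    concept = match.group(1).lower()
    modifiers = [word.lower() for word in match.group(2).split()]
    head = match.group(3).lower()
    assertions = [("isa", concept, head)]
    if modifiers:
        assertions.append(("property_of", concept, "_".join(modifiers)))
    return assertions
```

For "A lime is a very sour fruit" this yields the two assertions shown above; sentences that do not fit the pattern are simply skipped, which is the usual trade-off of pattern-based extraction.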

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web

ESP: THE GAME

• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
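The matching rule can be sketched in a few lines. This is a simplified illustration, assuming each player's guesses arrive as a list and that matching ignores case and surrounding whitespace; `agreed_label` is a hypothetical name, and the real game compares guesses as they are typed rather than after the fact.

```python
def normalise(label: str) -> str:
    """Lower-case and trim a guess so trivial differences do not block a match."""
    return label.strip().lower()


def agreed_label(guesses_a, guesses_b):
    """Return the first guess of player A that player B also typed, or None."""
    typed_by_b = {normalise(guess) for guess in guesses_b}
    for guess in guesses_a:
        if normalise(guess) in typed_by_b:
            return normalise(guess)
    return None
```

For example, `agreed_label(["dog", "Grass", "ball"], ["puppy", "ball", "grass"])` returns `"grass"`, since that is the first of player A's guesses that player B also produced.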

THE TASK

SCORING BY MATCHING

THE CHALLENGE: SCORES

• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

PASSING

GOOD LABELS, COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
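The bookkeeping behind taboo words, good labels and finished images could look roughly like the sketch below. It is an assumption-laden illustration: the `ImageRecord` class, the default N of 1 and the reading of "most players pass" as a pass rate above 0.5 are all hypothetical choices, not the game's published parameters.

```python
from collections import defaultdict


class ImageRecord:
    """Tracks what players have produced for a single image."""

    def __init__(self, n: int = 1, done_pass_rate: float = 0.5):
        self.n = n                            # N: the "good label" threshold
        self.done_pass_rate = done_pass_rate  # assumed reading of "most players pass"
        self.producers = defaultdict(set)     # label -> ids of players who produced it
        self.taboo = set()
        self.rounds = 0
        self.passes = 0

    def record_agreement(self, label, player_a, player_b):
        """Both players typed the label: count them and make the label taboo."""
        self.rounds += 1
        self.producers[label].update({player_a, player_b})
        self.taboo.add(label)

    def record_pass(self):
        """The pair gave up on this image."""
        self.rounds += 1
        self.passes += 1

    def good_labels(self):
        """Labels produced by more than N distinct players."""
        return sorted(l for l, p in self.producers.items() if len(p) > self.n)

    def is_done(self):
        """True when the share of rounds ending in a pass exceeds the threshold."""
        return self.rounds > 0 and self.passes / self.rounds > self.done_pass_rate
```

Because every agreed label immediately becomes taboo, later pairs are pushed towards more specific labels, and the growing taboo list is what eventually makes players pass.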

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning, and at quiet times, there won’t always be players to pair with
  – In these cases, a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced, and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations/properties of interest are collected
• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
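Since every hint is a filled-in template, turning a round of Verbosity into a fact is mostly a table lookup. The sketch below illustrates this under stated assumptions: the relation names on the right-hand side (`IsA`, `UsedFor`, and so on) and the functions `render_hint` and `hint_to_fact` are hypothetical, and only four of the templates are covered.

```python
# Template text -> relation name (relation names are illustrative assumptions).
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def render_hint(template: str, filler: str) -> str:
    """Text shown to the Guesser: the secret word stays hidden behind '_'."""
    return f"_ {template} {filler}"


def hint_to_fact(secret_word: str, template: str, filler: str):
    """Fact recorded once the hint is accepted: (relation, secret word, filler)."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return (TEMPLATES[template], secret_word, filler)
```

With the secret word "milk", the hint `render_hint("is a kind of", "drink")` reads "_ is a kind of drink", and the stored fact is `("IsA", "milk", "drink")`.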

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY

• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user

www.phrasedetectives.com

PHRASE DETECTIVES: THE TASKS

NAME THE CULPRIT
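One way the annotation and validation tasks could feed a single decision is sketched below. The scoring rule (one point per annotation, plus or minus one per validation) is an assumption made for illustration, not the game's documented rule, and `MarkableJudgements` is a hypothetical class name.

```python
from collections import defaultdict


class MarkableJudgements:
    """Collects player decisions about the antecedent of one markable."""

    def __init__(self):
        self.score = defaultdict(int)  # candidate antecedent -> net support

    def annotate(self, antecedent: str) -> None:
        """Annotation task: a player proposes an antecedent."""
        self.score[antecedent] += 1

    def validate(self, antecedent: str, agrees: bool) -> None:
        """Validation task: another player agrees or disagrees with a proposal."""
        self.score[antecedent] += 1 if agrees else -1

    def best(self):
        """Return the antecedent with the highest net support, or None."""
        if not self.score:
            return None
        return max(self.score, key=self.score.get)


# Markable "The agency" from the coreference example earlier in the lecture.
judgements = MarkableJudgements()
judgements.annotate("The FCC")
judgements.annotate("AT&T")
judgements.validate("The FCC", agrees=True)
judgements.validate("AT&T", agrees=False)
```

Here "The FCC" ends with a net support of 2 and "AT&T" with 0, so `judgements.best()` picks "The FCC".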

READINGS

• V. Nastase & M. Strube (2012). Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence.

• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman (2009). Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems.

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, 51(8), 58-67.

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi (2013). Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.

The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society

Wikipedia as Ontology

bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed

phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by

redirected linksndash It contains a hierarchical categorization system in

which each article belongs to at least one category ndash Polysemous concepts are disambiguated by

Disambiguation Pages

The different meanings that Artificial intelligence may refer to are listed in its disambiguation page

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

bull Taxonomic information category structurebull Attributes infobox text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Start with the category tree

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Induce a subsumption hierarchy
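The slides do not spell out how the subsumption hierarchy is induced. One simple heuristic of the kind used in this line of work is head matching: a link between two categories is labelled as is-a when both category names share the same lexical head. The sketch below is a deliberately crude illustration under that assumption, not the published method; the head finder, the preposition list and the example links are all hypothetical.

```python
PREPOSITIONS = {"of", "in", "by", "from", "for", "at", "on"}

def head(category):
    """Crude lexical head: the last word before the first preposition."""
    words = category.lower().split()
    for i, word in enumerate(words):
        if word in PREPOSITIONS and i > 0:
            words = words[:i]
            break
    return words[-1]

def label_links(links):
    """Label each (child, parent) category link as 'isa' or 'notisa'."""
    return {
        (child, parent): "isa" if head(child) == head(parent) else "notisa"
        for child, parent in links
    }

# Illustrative category links (child, parent).
links = [
    ("British computer scientists", "Computer scientists"),
    ("Computer scientists", "Computer science"),
    ("Cities in Canada", "Cities"),
]
labels = label_links(links)
```

Links that fail the test, such as "Computer scientists" under "Computer science", are the thematic links that a taxonomy has to filter out of the raw category tree.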

INFOBOXES

• Collaborative content

• Semi-structured data

{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
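Because an infobox is essentially a list of `| key = value` fields, attribute-value pairs can be pulled out of it with plain string handling. The following is a minimal sketch, assuming the usual wikitext layout with one field per line; `parse_infobox` and `link_targets` are hypothetical helpers, and real infoboxes (nested templates, multi-line values) need a proper wikitext parser.

```python
import re

# Abridged sample in standard wikitext layout (one field per line).
WIKITEXT = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""

# Matches [[Target]] and [[Target|label]], capturing the target title.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def parse_infobox(text):
    """Return {field: raw value} for each '| key = value' line."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)
            fields[key.strip()] = value.strip()
    return fields

def link_targets(value):
    """Titles of the articles that a field value links to."""
    return LINK.findall(value)

info = parse_infobox(WIKITEXT)
```

The link targets matter because they point at other articles: each one can become the object of a relation between two concepts rather than a plain string.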

DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web

DBPEDIA

• 1,600,000 concepts, including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

The DBpedia Dataset

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At the Developers Guide to Semantic Web Toolkits you can find a development toolkit in your preferred programming language to process DBpedia data.

REPRESENTING EXTRACTED INFORMATION

http://en.wikipedia.org/wiki/Calgary

http://dbpedia.org/resource/Calgary

dbpedia:native_name "Calgary"
dbpedia:altitude "1048"
dbpedia:population_city "988193"
dbpedia:population_metro "1079310"
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council

Extracting Infobox Data (RDF Representation)
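The Calgary example amounts to a set of subject-predicate-object triples. A minimal sketch of that data model in plain Python follows; the `dbpedia:` prefix on `mayor_name` and `governing_body` is an assumption (the slide shows those two labels without a prefix), and `match` is a hypothetical helper, not part of any RDF library.

```python
SUBJECT = "dbpedia:Calgary"

# Values copied from the slide; prefixes on the last two predicates are assumed.
TRIPLES = [
    (SUBJECT, "dbpedia:native_name", "Calgary"),
    (SUBJECT, "dbpedia:altitude", "1048"),
    (SUBJECT, "dbpedia:population_city", "988193"),
    (SUBJECT, "dbpedia:population_metro", "1079310"),
    (SUBJECT, "dbpedia:mayor_name", "dbpedia:Dave_Bronconnier"),
    (SUBJECT, "dbpedia:governing_body", "dbpedia:Calgary_City_Council"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

altitude = match(TRIPLES, p="dbpedia:altitude")
```

Objects such as `dbpedia:Dave_Bronconnier` are themselves resources, so following them leads to further triples; this is what turns the extracted infobox data into a graph.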

SPARQL

• SPARQL is a query language for RDF
  – RDF is a directed, labeled graph data format for representing information in the Web
  – This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint
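One of the example questions, "all films by Quentin Tarantino", could be phrased as the query below. This is a sketch under assumptions: the property `dbo:director` and the resource name `dbr:Quentin_Tarantino` reflect the DBpedia vocabulary as I understand it and may differ between releases, and `build_request_url` is a hypothetical helper. The code only builds the request URL and does not contact the server.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director links a film to its director.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""

def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as an HTTP GET URL using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = build_request_url(QUERY)
```

The resulting URL can be fetched with any HTTP client; the result format is normally negotiated through the `Accept` header, for example `application/sparql-results+json`.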

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

www.phrasedetectives.com

• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – CyCORP
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University

WEB COLLABORATION PROJECTS


OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense knowledge

WHAT’S IN OPEN MIND COMMONSENSE CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)
  IsA: (IsA "apple" "fruit")
  PartOf: (PartOf "CPU" "computer")
  PropertyOf: (PropertyOf "coffee" "wet")
  MadeOf: (MadeOf "bread" "flour")
  DefinedAs: (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)
  PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  SubeventOf: (SubeventOf "play sport" "score goal")
  FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)
  CapableOf: (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)
  LocationOf: (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
  EffectOf: (EffectOf "view video" "entertainment")
  DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  DesireOf: (DesireOf "person" "not be depressed")
  MotivationOf: (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)
  UsedFor: (UsedFor "fireplace" "burn wood")
  CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")

ASSOCIATION K-LINES (1.25 million assertions)
  SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  ThematicKLine: (ThematicKLine "wedding dress" "veil")
  ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
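The lime example suggests what a "simple heuristic" might look like: a surface pattern over the sentence that yields one fact per slot. The sketch below is an illustration only; the single regular expression is a hypothetical pattern written for this sentence shape, not the actual ConceptNet extraction rules.

```python
import re

# Hypothetical pattern for sentences shaped like "A X is a (very) ADJ Y".
PATTERN = re.compile(r"^an? (\w+) is an? ((?:very )?\w+) (\w+)$", re.IGNORECASE)

def extract(sentence):
    """Map one sentence to a list of (relation, arg1, arg2) facts."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if not m:
        return []  # sentence shape not covered by this single pattern
    concept, prop, kind = (g.lower() for g in m.groups())
    return [
        ("isa", concept, kind),
        ("property_of", concept, prop.replace(" ", "_")),
    ]

facts = extract("A lime is a very sour fruit")
```

A realistic extractor would need many such patterns plus normalization of the arguments; anything the patterns do not cover is simply skipped here.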

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web

ESP the game

ESP THE GAME

• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

• One of the motivating factors is to try to score as many points as possible

• Hourly, daily, weekly, and monthly scores are shown

SCORES

THE CHALLENGE TIMING

• Partners try to agree on as many images as they can during 2½ minutes

• The thermometer on the side indicates how many images they have agreed on

• If they agree on 15 images, they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed

• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
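Matching and taboo words fit together in a few lines of code. The sketch below is a simplified illustration, not the game's actual implementation: it assumes each player's guesses arrive as an ordered list and interleaves them to approximate simultaneous typing; `play_image` and `record_agreement` are hypothetical names.

```python
def play_image(guesses_a, guesses_b, taboo=()):
    """Return the first label both players typed (taboo words ignored), or None."""
    taboo = {w.lower() for w in taboo}
    seen_a, seen_b = set(), set()
    # Interleave the two guess lists to approximate simultaneous typing.
    for i in range(max(len(guesses_a), len(guesses_b))):
        for own, other, guesses in ((seen_a, seen_b, guesses_a),
                                    (seen_b, seen_a, guesses_b)):
            if i < len(guesses):
                word = guesses[i].strip().lower()
                if word in taboo:
                    continue  # taboo words can never produce a match
                if word in other:
                    return word  # agreement on this image
                own.add(word)
    return None

taboo_words = {}  # image id -> labels already agreed on for that image

def record_agreement(image_id, label):
    """An agreed label becomes taboo for later games on the same image."""
    taboo_words.setdefault(image_id, set()).add(label)

first = play_image(["dog", "grass"], ["dog"])
record_agreement("img1", first)
second = play_image(["dog", "grass", "frisbee"], ["puppy", "frisbee", "park"],
                    taboo=taboo_words["img1"])
```

Once "dog" is taboo, the second pair of players is pushed towards a more specific label, which is the purpose of the mechanism.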

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)

• An image is “done” when its list of taboo words is so extensive that most players pass on it
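Both criteria are simple threshold tests over what has been recorded for an image. A minimal sketch follows, with two assumptions that are mine rather than the slides': "most players pass" is read as more than half of the recorded games ending in a pass (the hypothetical `pass_ratio` parameter), and each entry in `agreed_labels` stands for one game in which that label was produced.

```python
from collections import Counter

def good_labels(agreed_labels, n):
    """Labels produced more than n times for an image across games."""
    counts = Counter(agreed_labels)
    return {label for label, count in counts.items() if count > n}

def is_done(outcomes, pass_ratio=0.5):
    """True when more than pass_ratio of the recorded games ended in a pass."""
    if not outcomes:
        return False
    passes = sum(1 for outcome in outcomes if outcome == "pass")
    return passes / len(outcomes) > pass_ratio

labels = good_labels(["dog", "dog", "ball", "dog"], n=2)
done = is_done(["pass", "pass", "label"])
```

Raising N trades coverage for reliability: fewer labels qualify, but each has been confirmed by more independent players.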

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning, and at quiet times, there won’t always be players to pair with
  – In these cases, a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels, playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or, the game approach to collecting commonsense knowledge

• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected

• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
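The templates above can be treated as format strings with two slots: the secret word and the Describer's filler. The sketch below is a hypothetical rendering of that idea, not Verbosity's code; in particular, treating the first blank as the secret word and storing the completed sentence as the collected fact are assumptions.

```python
# One format string per template; near/in/on is expanded into three variants.
TEMPLATES = [
    "{} is a kind of {}",
    "{} is used for {}",
    "{} is typically near {}",
    "{} is typically in {}",
    "{} is typically on {}",
    "{} is the opposite of {}",
    "{} is related to {}",
]

def make_hint(template, filler):
    """Hint shown to the Guesser: the secret word stays hidden behind a blank."""
    return template.format("_", filler)

def make_fact(template, secret_word, filler):
    """Completed sentence that could be stored as a commonsense fact."""
    return template.format(secret_word, filler)

hint = make_hint(TEMPLATES[1], "drinking")
fact = make_fact(TEMPLATES[0], "milk", "liquid")
```

Because the Describer can only fill fixed templates, every hint already has the shape of a relation between two terms, so nothing has to be parsed out of free text afterwards.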

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• Only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
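One way to combine the two tasks is to let each annotation count as a vote for an antecedent and each validation add or subtract a vote. This scoring rule is an assumption made for illustration and is not taken from the slides; `score_interpretations` and `best_interpretation` are hypothetical names, and the markable and antecedents in the example are invented.

```python
from collections import defaultdict

def score_interpretations(annotations, validations):
    """Score each candidate antecedent of one markable.

    annotations: antecedents chosen by players in annotation mode.
    validations: (antecedent, agrees) pairs from players in validation mode.
    """
    scores = defaultdict(int)
    for antecedent in annotations:
        scores[antecedent] += 1
    for antecedent, agrees in validations:
        scores[antecedent] += 1 if agrees else -1
    return dict(scores)

def best_interpretation(annotations, validations):
    """Antecedent with the highest score, or None if nothing was entered."""
    scores = score_interpretations(annotations, validations)
    return max(scores, key=scores.get) if scores else None

# Invented judgements for the markable "The agency".
annotations = ["The FCC", "AT&T", "The FCC"]
validations = [("AT&T", False), ("The FCC", True)]
best = best_interpretation(annotations, validations)
```

Keeping the full score table rather than only the winner also preserves minority interpretations, which matters when a markable is genuinely ambiguous.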


PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012

• C. Havasi, J. Pustejovsky, R. Speer, and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems

The different meanings that Artificial intelligence may refer to are listed in its disambiguation page

SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

bull Taxonomic information category structurebull Attributes infobox text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

bull Start with the category tree

Deriving a taxonomy from Wikipedia (AAAI 2007)

bull Induce a subsumption hierarchy

INFOBOXES

bull Collaborative content

bull Semi-structured data

Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]

DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an

open licensebull interlink the DBpedia dataset with other datasets on the

Web

DBPEDIA

10486071600000 concepts

1048607including

1048698 58000 persons

1048698 70000 places

1048698 35000 music albums

1048698 12000 films

1048607described by 91 million triples

1048607using 8141 different properties

1048607557000 links to pictures

10486071300000 links external web pages

1048607207000 Wikipedia categories

104860775000 YAGO categories

The DBpedia Dataset

The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data

REPRESENTING EXTRACTED INFORMATION

httpenwikipediaorgwikiCalgary

httpdbpediaorgresourceCalgary

dbpedianative_name Calgaryrdquo

dbpediaaltitude ldquo1048rdquo

dbpediapopulation_city ldquo988193rdquo

dbpediapopulation_metro ldquo1079310rdquo

mayor_name

dbpediaDave_Bronconnier

governing_body

dbpediaCalgary_City_Council

Extracting Infobox Data (RDF Representation)

SPARQL

bull SPARQL is a query language for RDF

bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF

bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware

1048607httpdbpediaorgsparql

1048607hosted on a OpenLink Virtuoso server

1048607can answer SPARQL queries like

1048698 Give me all Sitcoms that are set in NYC

1048698 All tennis players from Moscow

1048698 All films by Quentin Tarentino

1048698 All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint

bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and

Language Laboratory hellipbull This has been taken advantage of in AI

ndash Open Mind Commonsense (Singh) (collecting facts)

ndash Semantic Wikis

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

wwwphrasedetectivescom

bull Open Mind Common Sense ndash Singh

bull Crater mapping (results) ndash Kanefsky

bull Learner Learner2 1001 Paraphrases ndash Chklovski

bull FACTory ndash CyCORP

bull Hot or Not ndash 8 Days

bull ESP Phetch Verbosity Peekaboom ndash von Ahn

bull Galaxy Zoo ndash Oxford University

WEB COLLABORATION PROJECTS

wwwphrasedetectivescom

OPEN MIND COMMONSENSE

bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense

WHATrsquoS IN OPEN MIND COMMONSENSE CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)
IsA: (IsA "apple" "fruit")
PartOf: (PartOf "CPU" "computer")
PropertyOf: (PropertyOf "coffee" "wet")
MadeOf: (MadeOf "bread" "flour")
DefinedAs: (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)
PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
SubeventOf: (SubeventOf "play sport" "score goal")
FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)
CapableOf: (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)
LocationOf: (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
EffectOf: (EffectOf "view video" "entertainment")
DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
DesireOf: (DesireOf "person" "not be depressed")
MotivationOf: (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)
UsedFor: (UsedFor "fireplace" "burn wood")
CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")

ASSOCIATION: K-LINES (1.25 million assertions)
SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
ThematicKLine: (ThematicKLine "wedding dress" "veil")
ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
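Assertions of this form map naturally onto a labeled graph. The sketch below is one possible in-memory layout, not ConceptNet's actual storage format; the function name `build_network` is hypothetical, and only a handful of the example assertions are included.

```python
from collections import defaultdict

# A few of the example assertions, written as (relation, arg1, arg2).
assertions = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("MadeOf", "bread", "flour"),
    ("CapableOf", "dentist", "pull tooth"),
    ("UsedFor", "fireplace", "burn wood"),
    ("ConceptuallyRelatedTo", "bad breath", "mint"),
]


def build_network(facts):
    """Index assertions by first argument: concept -> [(relation, other concept)]."""
    network = defaultdict(list)
    for relation, arg1, arg2 in facts:
        network[arg1].append((relation, arg2))
    return network


network = build_network(assertions)
print(network["apple"])
```

Nodes are concepts (possibly multi-word, such as "pull tooth") and each edge carries one of the relation types as its label.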

OPEN MIND COMMONSENSE: ADDING KNOWLEDGE

OMCS: ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
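To make "simple heuristics" concrete, here is a toy pattern that turns the lime sentence into the two relations shown. It is a hypothetical illustration covering a single sentence shape, not the extraction rules actually used to build ConceptNet.

```python
import re

# Hypothetical pattern for sentences of the shape
# "A <concept> is a [very] <property> <category>".
PATTERN = re.compile(r"^An? (\w+) is an? ((?:very )?\w+) (\w+)$", re.IGNORECASE)


def extract(sentence):
    """Return (relation, arg1, arg2) tuples, or [] if the pattern does not apply."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []
    concept, prop, category = match.groups()
    return [
        ("isa", concept, category),
        ("property_of", concept, prop.replace(" ", "_")),
    ]


print(extract("A lime is a very sour fruit"))
```

Sentences that do not fit the shape (for example "A lime is a fruit", which has no property word) simply yield nothing in this sketch; a real system would need many such patterns.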

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
– ESP
– Verbosity
– TagATune
• Other games
– Peekaboom
– Phetch

ESP

• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
– To train image search engines
– To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web

ESP the game

ESP THE GAME

• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
– Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points

THE TASK

SCORING BY MATCHING

THE CHALLENGE: SCORES

• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
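The interaction between matching and taboo words can be sketched as follows. This is a simplified model under stated assumptions: guesses are compared case-insensitively, the order in which the two players type is ignored, and the function name `play_round` is hypothetical rather than taken from the ESP implementation.

```python
def play_round(taboo, guesses_a, guesses_b):
    """Play one image with two players.

    Returns (agreed_label_or_None, updated_taboo_set). Taboo words are
    filtered out before matching; an agreed label becomes taboo for
    later pairs shown the same image.
    """
    allowed_a = [g.lower() for g in guesses_a if g.lower() not in taboo]
    allowed_b = {g.lower() for g in guesses_b if g.lower() not in taboo}
    for guess in allowed_a:
        if guess in allowed_b:
            return guess, taboo | {guess}
    return None, taboo


# First pair sees the image with no taboo words and agrees on "dog".
label_1, taboo_1 = play_round(set(), ["dog", "ball"], ["ball", "dog"])
# A later pair cannot use "dog" and is pushed towards a new label.
label_2, taboo_2 = play_round(taboo_1, ["dog", "ball"], ["ball", "dog"])
print(label_1, label_2, sorted(taboo_2))
```

Each round that ends in agreement lengthens the taboo list, so later pairs are steered away from the obvious labels and towards more specific ones.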

TABOO WORDS

PASSING

GOOD LABELS, COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
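The "good label" rule is a simple threshold on counts. A minimal sketch, assuming each player's labels for one image are available as a set (so a player counts at most once per label); the function name `good_labels` is hypothetical.

```python
from collections import Counter


def good_labels(labels_by_player, n):
    """Return the labels produced by more than n players for one image.

    labels_by_player: list of sets, one set of labels per player.
    n: the game parameter N from the slide.
    """
    counts = Counter(label for labels in labels_by_player for label in labels)
    return {label for label, count in counts.items() if count > n}


players = [{"dog", "grass"}, {"dog"}, {"dog", "ball"}]
print(good_labels(players, 2))
```

Raising N trades coverage for reliability: fewer labels survive, but those that do have been independently produced by more people.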

IMPLEMENTATION

• Pre-recorded game play
– Especially at the beginning, and at quiet times, there won’t always be players to pair with
– In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
– Players could cheat in a number of ways, including agreeing on labels and playing against themselves
– A number of mechanisms are in place against those cases
• Selecting images

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
– 13,630 players
– 1.2 million labels for 293,760 images
– 80% of players played more than once
• By 2008
– 200,000 players
– 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play
• Exciting factors
– Playing with a partner
– Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
– choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
– 15 participants, 20 images among the 1000 with more than 5 labels
– 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
– 15 participants, 20 images
– 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or, the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
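Filling a template yields two things at once: the hint shown to the Guesser and, once the secret word is known, a structured fact. The sketch below illustrates this under assumptions: the template keys and the function `make_hint` are hypothetical names, and only some of the templates are included.

```python
# Hypothetical keys for a subset of the templates listed above.
TEMPLATES = {
    "kind_of": "{word} is a kind of {hint}",
    "used_for": "{word} is used for {hint}",
    "opposite_of": "{word} is the opposite of {hint}",
    "related_to": "{word} is related to {hint}",
}


def make_hint(template_key, hint, secret_word):
    """Return (text shown to the Guesser, fact recorded by the system).

    The Guesser sees a blank in place of the secret word; the recorded
    fact pairs the relation with the secret word and the hint.
    """
    shown = TEMPLATES[template_key].format(word="_", hint=hint)
    fact = (template_key, secret_word, hint)
    return shown, fact


shown, fact = make_hint("kind_of", "fruit", "lime")
print(shown)
print(fact)
```

Because the relation is fixed by the template the Describer picked, no language processing is needed to turn the hint into a typed assertion.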

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
– Describer: can just repeat the behavior of a previous describer
– Guesser: not so easy

RESULTS

• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
– Ask six raters whether 200 facts collected using Verbosity are ‘true’
– Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
– Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
– Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user


PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish. Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67, 2008.
• M. Poesio, J. Chamberlain, U. Kruschwitz, L. Robaldo & L. Ducceschi. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems, 2013.


SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA

• Taxonomic information: category structure
• Attributes: infobox, text

Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Start with the category tree

Deriving a taxonomy from Wikipedia (AAAI 2007)

• Induce a subsumption hierarchy

INFOBOXES

• Collaborative content
• Semi-structured data

Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
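Because an infobox is a list of `| key = value` fields, a first pass at extracting attributes needs very little machinery. The sketch below assumes one field per line and leaves the wiki markup in the values untouched; `parse_infobox` is a hypothetical helper, not the DBpedia extraction code.

```python
SAMPLE = """\
Infobox Writer
| name = Edgar Allan Poe
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| magnum_opus = The Raven
"""


def parse_infobox(text):
    """Parse '| key = value' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue
        # Split on the first '=' only, since values may contain '=' themselves.
        key, _, value = line[1:].partition("=")
        fields[key.strip()] = value.strip()
    return fields


info = parse_infobox(SAMPLE)
print(info["name"], "/", info["magnum_opus"])
```

Turning values such as `[[Boston, Massachusetts]]` into links to other resources, or the birth date template into a typed date, is where the real extraction effort lies; this sketch stops at the key-value level.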

DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web

DBPEDIA

• 1,600,000 concepts
• including
– 58,000 persons
– 70,000 places
– 35,000 music albums
– 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

The DBpedia Dataset

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At the Developers Guide to Semantic Web Toolkits you can find a development toolkit in your preferred programming language to process DBpedia data.



Wikipedia category network

Deriving a taxonomy from Wikipedia (AAAI 2007)

bull Start with the category tree

Deriving a taxonomy from Wikipedia (AAAI 2007)

bull Induce a subsumption hierarchy

INFOBOXES

bull Collaborative content

bull Semi-structured data

Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]

DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an

open licensebull interlink the DBpedia dataset with other datasets on the

Web

DBPEDIA

10486071600000 concepts

1048607including

1048698 58000 persons

1048698 70000 places

1048698 35000 music albums

1048698 12000 films

1048607described by 91 million triples

1048607using 8141 different properties

1048607557000 links to pictures

10486071300000 links external web pages

1048607207000 Wikipedia categories

104860775000 YAGO categories

The DBpedia Dataset

The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data

REPRESENTING EXTRACTED INFORMATION

httpenwikipediaorgwikiCalgary

httpdbpediaorgresourceCalgary

dbpedianative_name Calgaryrdquo

dbpediaaltitude ldquo1048rdquo

dbpediapopulation_city ldquo988193rdquo

dbpediapopulation_metro ldquo1079310rdquo

mayor_name

dbpediaDave_Bronconnier

governing_body

dbpediaCalgary_City_Council

Extracting Infobox Data (RDF Representation)

SPARQL

bull SPARQL is a query language for RDF

bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF

bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware

1048607httpdbpediaorgsparql

1048607hosted on a OpenLink Virtuoso server

1048607can answer SPARQL queries like

1048698 Give me all Sitcoms that are set in NYC

1048698 All tennis players from Moscow

1048698 All films by Quentin Tarentino

1048698 All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint

bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and

Language Laboratory hellipbull This has been taken advantage of in AI

ndash Open Mind Commonsense (Singh) (collecting facts)

ndash Semantic Wikis

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

wwwphrasedetectivescom

bull Open Mind Common Sense ndash Singh

bull Crater mapping (results) ndash Kanefsky

bull Learner Learner2 1001 Paraphrases ndash Chklovski

bull FACTory ndash CyCORP

bull Hot or Not ndash 8 Days

bull ESP Phetch Verbosity Peekaboom ndash von Ahn

bull Galaxy Zoo ndash Oxford University

WEB COLLABORATION PROJECTS

wwwphrasedetectivescom

OPEN MIND COMMONSENSE

bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense

WHATrsquoS IN OPEN MIND COMMONSENSE CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52000 assertions)

IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)

EVENTS (38000 assertions)

PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)

AGENTS (104000 assertions)

CapableOf (CapableOf dentist pull tooth)

SPATIAL (36000 assertions)

LocationOf (LocationOf army in war)

TEMPORAL time amp sequence

CAUSAL (17000 assertions)

EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)

AFFECTIONAL (mood feeling emotions) (34000 assertions)

DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)

FUNCTIONAL (115000 assertions)

IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)

ASSOCIATION K-LINES (125 million assertions)

SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic

network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO

CONCEPTNETA lime is a very sour fruit

isa(limefruit)

property_of(limevery_sour)

GAMES WITH A PURPOSE

bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

bull The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune

bull Other gamesndash Peekaboomndash Phetch

ESP

bull The first GWAP developed by von Ahn and their group (2003 2004)

bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision

bull The goal label the majority of the images on the Web

ESP the game

ESP THE GAMEbull Two partners are picked at random from the

large number of players onlinebull They are not told who their partner is and canrsquot

communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe

the image and type that descriptionndash Hence the ESP game

bull If any of the strings typed by one player matches the string typed by the other player they score points

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

bull One of the motivating factors is to try to score as many points as possible

bull Hourly daily weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

bull Partners try to agree on as many images as they can during 2 frac12 minutes

bull The termometer on the side indicates how many images they have agreed on

bull If they agree on 15 images they score bonus points

TABOO WORDS

bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed

bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)

bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it

IMPLEMENTATIONbull Pre-recorded game play

ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with

ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture

bull Cheatingndash Players could cheat in a number of ways including

agreeing on labels playing against themselvesndash A number of mechanisms are in place against those

casesbull Selecting images

SOME STATISTICS

• In the 4 months between August 9th, 2003 and December 10th, 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
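One way to see why templates help: each filled-in template maps directly to a relation between the secret word and the filler. The sketch below is illustrative only; the relation names in `TEMPLATES` and both function names are assumptions, not Verbosity's internal vocabulary.

```python
# Template text -> relation name (relation names are hypothetical).
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near": "LocatedNear",
    "is typically in": "LocatedIn",
    "is typically on": "LocatedOn",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def render_hint(template, filler):
    """Text shown to the Guesser: the secret word stays hidden as '_'."""
    return f"_ {template} {filler}"


def hint_to_fact(secret_word, template, filler):
    """Turn a filled-in template into a (relation, concept, value) fact."""
    return (TEMPLATES[template], secret_word.lower(), filler.lower())


if __name__ == "__main__":
    print(render_hint("is a kind of", "liquid"))
    print(hint_to_fact("Milk", "is a kind of", "liquid"))
```

A hint only becomes a collected fact once the Guesser has found the secret word; that filtering step is left out here.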

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – The Describer can just repeat the behavior of a previous describer
  – Emulating the Guesser is not so easy

RESULTS

• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): the user must identify the closest antecedent of a markable, if it is anaphoric
  – Detectives Conference (Validation): the user must agree/disagree with a coreference relation entered by another user

www.phrasedetectives.com

PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish. Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67, 2008.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems, 2013.


Deriving a taxonomy from Wikipedia (AAAI 2007)

• Start with the category tree
• Induce a subsumption hierarchy

INFOBOXES

• Collaborative content
• Semi-structured data

{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]] [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]] [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
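Because an infobox is semi-structured, its fields can be pulled out with a small parser. The following is a rough sketch, assuming well-formed wikitext with balanced `{{ }}` and `[[ ]]` pairs; `parse_infobox` is a hypothetical helper, not part of any Wikipedia or DBpedia tool.

```python
def parse_infobox(wikitext):
    """Split an infobox template into (template_name, {field: raw_value}).

    Only '|' characters outside nested {{...}} or [[...]] separate fields.
    """
    body = wikitext.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]

    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))

    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")  # split on the first '=' only
        if sep:
            fields[key.strip()] = value.strip()
    return parts[0].strip(), fields


SAMPLE = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]] [[United States|US]]
| magnum_opus = The Raven
}}"""

if __name__ == "__main__":
    name, fields = parse_infobox(SAMPLE)
    print(name, fields["birth_date"])
```

The values are kept as raw wikitext; resolving links and date templates into typed values is a separate step.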

DBPEDIA

DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web

The DBpedia Dataset

• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories

REPRESENTING EXTRACTED INFORMATION

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data.

Extracting Infobox Data (RDF Representation)

http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
  dbpedia:native_name "Calgary"
  dbpedia:altitude "1048"
  dbpedia:population_city "988193"
  dbpedia:population_metro "1079310"
  mayor_name: dbpedia:Dave_Bronconnier
  governing_body: dbpedia:Calgary_City_Council
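The Calgary example can be read as a set of subject-predicate-object triples. The sketch below shows one way infobox fields might be turned into such triples; the function name is hypothetical and the mapping rule (a value that is exactly one wiki link becomes a resource, everything else stays a literal string) is a simplifying assumption, not DBpedia's actual extraction code.

```python
DBR = "http://dbpedia.org/resource/"
DBP = "http://dbpedia.org/property/"


def infobox_to_triples(resource, fields):
    """Map {field: raw_value} to (subject, predicate, object) triples."""
    subject = DBR + resource
    triples = []
    for key, value in fields.items():
        predicate = DBP + key
        single_link = (
            value.startswith("[[") and value.endswith("]]") and value.count("[[") == 1
        )
        if single_link:
            # [[Target|label]] -> resource named after the link target
            target = value[2:-2].split("|")[0].strip().replace(" ", "_")
            triples.append((subject, predicate, DBR + target))
        else:
            triples.append((subject, predicate, value))
    return triples


if __name__ == "__main__":
    calgary = {"altitude": "1048", "mayor_name": "[[Dave Bronconnier]]"}
    for triple in infobox_to_triples("Calgary", calgary):
        print(triple)
```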

SPARQL

• SPARQL is a query language for RDF
  – RDF is a directed, labeled graph data format for representing information in the Web
  – This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware

The DBpedia SPARQL Endpoint

• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
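As an illustration, "all films by Quentin Tarantino" might be phrased as the SPARQL query below and sent to the endpoint as a URL parameter. This is a sketch under assumptions: the property `dbo:director` and the resource name are taken from the current DBpedia ontology rather than from the slides, the `format` parameter is a Virtuoso convention, and the code only builds the request URL without contacting the server.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director links a film to its director.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Return a GET URL carrying the query in the standard 'query' parameter."""
    params = {
        "query": query.strip(),
        "format": "application/sparql-results+json",  # Virtuoso-specific option
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_request_url(QUERY))
```

The query is a single triple pattern: the variable `?film` is bound to every subject that has the given director as the object of `dbo:director`.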

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis

WEB COLLABORATION PROJECTS

• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – CyCORP
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University

OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge

WHAT'S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)
  – IsA: (IsA "apple" "fruit")
  – PartOf: (PartOf "CPU" "computer")
  – PropertyOf: (PropertyOf "coffee" "wet")
  – MadeOf: (MadeOf "bread" "flour")
  – DefinedAs: (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)
  – PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  – SubeventOf: (SubeventOf "play sport" "score goal")
  – FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  – LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)
  – CapableOf: (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)
  – LocationOf: (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
  – EffectOf: (EffectOf "view video" "entertainment")
  – DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  – DesireOf: (DesireOf "person" "not be depressed")
  – MotivationOf: (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)
  – IsUsedFor: (UsedFor "fireplace" "burn wood")
  – CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")

ASSOCIATION K-LINES (1.25 million assertions)
  – SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  – ThematicKLine: (ThematicKLine "wedding dress" "veil")
  – ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
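Assertions of this form are naturally stored as labeled edges of a semantic network. The sketch below uses a handful of the examples from the table; the data layout and the function names are assumptions for illustration and do not reflect ConceptNet's actual storage format.

```python
from collections import defaultdict

# (relation, source concept, target concept), taken from the examples above.
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("MadeOf", "bread", "flour"),
    ("CapableOf", "dentist", "pull tooth"),
    ("UsedFor", "fireplace", "burn wood"),
]


def build_network(assertions):
    """Index assertions by source concept: concept -> [(relation, target)]."""
    out_edges = defaultdict(list)
    for relation, source, target in assertions:
        out_edges[source].append((relation, target))
    return out_edges


def related(network, concept, relation=None):
    """Targets linked from `concept`, optionally restricted to one relation."""
    return [
        target
        for rel, target in network.get(concept, [])
        if relation is None or rel == relation
    ]


if __name__ == "__main__":
    net = build_network(ASSERTIONS)
    print(related(net, "apple", "IsA"))
```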

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)
property_of(lime, very_sour)
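The lime example can be reproduced with a single pattern-matching heuristic. The regular expression below is an assumption written for this one sentence shape ("A X is a [property] Y"); the actual ConceptNet extraction rules are not given in the slides.

```python
import re

# Matches sentences such as "A lime is a very sour fruit".
# The last word is taken as the category, anything before it as a property.
PATTERN = re.compile(
    r"^(?:an?|the)\s+(?P<concept>\w+)\s+is\s+an?\s+"
    r"(?:(?P<property>[\w\s]+?)\s+)?(?P<category>\w+)$",
    re.IGNORECASE,
)


def extract(sentence):
    """Return a list of (relation, concept, value) tuples, possibly empty."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []
    concept = match.group("concept").lower()
    facts = [("isa", concept, match.group("category").lower())]
    if match.group("property"):
        prop = "_".join(match.group("property").lower().split())
        facts.append(("property_of", concept, prop))
    return facts


if __name__ == "__main__":
    print(extract("A lime is a very sour fruit"))
```

A heuristic this simple misparses many sentences (for example, multi-word categories), so it should be read only as an indication of the kind of rule involved.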

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and his group (2003, 2004)





INFOBOXES

• Collaborative content

• Semi-structured data

{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848.
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
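
To make "semi-structured" concrete, here is a minimal Python sketch of how attribute/value pairs could be read out of infobox wikitext like the Poe example. It is an illustration only, not the extractor used by any real project: the function names are made up for this note, the sample is a shortened copy of the infobox, and its closing braces are assumed.

```python
import re

# Shortened copy of the Poe infobox; the closing "}}" is an assumption.
INFOBOX = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""

# Matches [[target]] and [[target|label]]; group 1 is the visible text.
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]|]*)\]\]")


def strip_links(value):
    """Replace wiki links with the text a reader would see."""
    return LINK.sub(r"\1", value)


def parse_infobox(wikitext):
    """Return (template_name, {attribute: value}) for a one-attribute-per-line infobox."""
    lines = wikitext.strip().splitlines()
    template = lines[0].lstrip("{").strip()
    fields = {}
    for line in lines[1:]:
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # skips the closing braces and malformed lines
        key, _, value = line[1:].partition("=")
        fields[key.strip()] = strip_links(value.strip())
    return template, fields


template, fields = parse_infobox(INFOBOX)
print(template, fields)
```

Nested templates such as the birth date field would need extra handling; the sketch only covers plain values and links.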

DBpedia.org is an effort to

• extract structured information from Wikipedia

• make this information available on the Web under an open license

• interlink the DBpedia dataset with other datasets on the Web

DBPEDIA

• 1,600,000 concepts

• including

– 58,000 persons

– 70,000 places

– 35,000 music albums

– 12,000 films

• described by 91 million triples

• using 8,141 different properties

• 557,000 links to pictures

• 1,300,000 links to external web pages

• 207,000 Wikipedia categories

• 75,000 YAGO categories

The DBpedia Dataset

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At “Developers Guide to Semantic Web Toolkits” you can find a development toolkit in your preferred programming language to process DBpedia data.

REPRESENTING EXTRACTED INFORMATION

http://en.wikipedia.org/wiki/Calgary

http://dbpedia.org/resource/Calgary

dbpedia:native_name “Calgary”

dbpedia:altitude “1048”

dbpedia:population_city “988193”

dbpedia:population_metro “1079310”

mayor_name dbpedia:Dave_Bronconnier

governing_body dbpedia:Calgary_City_Council

Extracting Infobox Data (RDF Representation)
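
The Calgary example can be written down as subject/predicate/object triples. The Python sketch below is a toy illustration of that data model, not DBpedia code: the `match` helper is invented for this note, and the literal values are kept as strings exactly as they appear on the slide.

```python
SUBJECT = "http://dbpedia.org/resource/Calgary"

# (subject, predicate, object) triples copied from the Calgary example.
TRIPLES = [
    (SUBJECT, "dbpedia:native_name", "Calgary"),
    (SUBJECT, "dbpedia:altitude", "1048"),
    (SUBJECT, "dbpedia:population_city", "988193"),
    (SUBJECT, "dbpedia:population_metro", "1079310"),
    (SUBJECT, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (SUBJECT, "governing_body", "dbpedia:Calgary_City_Council"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Who is the mayor?" becomes a pattern with the object left open.
mayor = match(TRIPLES, s=SUBJECT, p="mayor_name")[0][2]
print(mayor)
```

Leaving a position open in the pattern is the same idea that a SPARQL variable expresses in a real query.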

SPARQL

• SPARQL is a query language for RDF

• RDF is a directed, labeled graph data format for representing information in the Web

• This specification defines the syntax and semantics of the SPARQL query language for RDF

• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware

• http://dbpedia.org/sparql

• hosted on an OpenLink Virtuoso server

• can answer SPARQL queries like

– Give me all sitcoms that are set in NYC

– All tennis players from Moscow

– All films by Quentin Tarantino

– All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint
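
As a rough illustration, one of the example questions ("all films by Quentin Tarantino") might be phrased as below. This Python sketch only builds the request URL and sends nothing over the network. The vocabulary (`dbo:director`, `dbr:Quentin_Tarantino`) is an assumption based on the present-day DBpedia ontology; the dataset described in these slides may have used different property names. `build_request_url` is a helper invented for this note.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on the slide

# Assumed vocabulary: dbo:director and dbr:Quentin_Tarantino.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
""".strip()


def build_request_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL using the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The query is a single triple pattern with one variable, `?film`, in the subject position.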

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts

– Other initiatives: Citizen Science, Cognition and Language Laboratory, …

• This has been taken advantage of in AI

– Open Mind Commonsense (Singh) (collecting facts)

– Semantic Wikis

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

www.phrasedetectives.com

• Open Mind Common Sense – Singh

• Crater mapping (results) – Kanefsky

• Learner, Learner2, 1001 Paraphrases – Chklovski

• FACTory – CyCORP

• Hot or Not – 8 Days

• ESP, Phetch, Verbosity, Peekaboom – von Ahn

• Galaxy Zoo – Oxford University

WEB COLLABORATION PROJECTS


OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense

WHAT’S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh, 2004)

THINGS (52,000 assertions)

• IsA (IsA "apple" "fruit")
• PartOf (PartOf "CPU" "computer")
• PropertyOf (PropertyOf "coffee" "wet")
• MadeOf (MadeOf "bread" "flour")
• DefinedAs (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)

• PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope")
• SubeventOf (SubeventOf "play sport" "score goal")
• FirstSubeventOf (FirstSubeventOf "start fire" "light match")
• LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)

• CapableOf (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)

• LocationOf (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)

• EffectOf (EffectOf "view video" "entertainment")
• DesirousEffectOf (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)

• DesireOf (DesireOf "person" "not be depressed")
• MotivationOf (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)

• UsedFor (UsedFor "fireplace" "burn wood")
• CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")

ASSOCIATION K-LINES (1.25 million assertions)

• SuperThematicKLine (SuperThematicKLine "western civilization" "civilization")
• ThematicKLine (ThematicKLine "wedding dress" "veil")
• ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")
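
Assertions of this kind form a labelled directed graph. The Python sketch below stores a handful of them and looks up the neighbours of a concept. It is a toy data structure for illustration, not the ConceptNet API; `build_network` and `related` are names invented here.

```python
from collections import defaultdict

# A few assertions from the relation table, as (relation, concept1, concept2).
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("MadeOf", "bread", "flour"),
    ("CapableOf", "dentist", "pull tooth"),
    ("UsedFor", "fireplace", "burn wood"),
    ("ConceptuallyRelatedTo", "bad breath", "mint"),
]


def build_network(assertions):
    """Index assertions as concept -> [(relation, other concept), ...]."""
    outgoing = defaultdict(list)
    for relation, source, target in assertions:
        outgoing[source].append((relation, target))
    return outgoing


def related(network, concept, relation=None):
    """Concepts one step away from `concept`, optionally restricted to one relation."""
    return [
        target for rel, target in network.get(concept, [])
        if relation is None or rel == relation
    ]


network = build_network(ASSERTIONS)
print(related(network, "apple", "IsA"))
```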

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al., 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
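
The slides do not spell out the heuristics, so the following Python sketch is only a guess at what one such rule might look like. It uses a single hypothetical pattern for sentences of the form "A X is a (modifiers) Y" and reproduces the two facts of the lime example. The pattern and the `extract` function are assumptions for illustration, not ConceptNet's actual extraction rules.

```python
import re

# Hypothetical pattern: "A <concept> is a [<modifiers>] <category>".
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(?:(.+?)\s+)?(\w+)\.?$",
    re.IGNORECASE,
)


def extract(sentence):
    """Turn a matching sentence into isa / property_of facts; return [] otherwise."""
    m = PATTERN.match(sentence.strip())
    if not m:
        return []
    concept, modifiers, category = m.groups()
    facts = [("isa", concept, category)]
    if modifiers:
        facts.append(("property_of", concept, modifiers.replace(" ", "_")))
    return facts


facts = extract("A lime is a very sour fruit")
print(facts)
```

A real extractor would need many such patterns plus normalisation of the concepts; this rule alone misses most sentences.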

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com

– ESP

– Verbosity

– TagATune

• Other games

– Peekaboom

– Phetch

ESP

• The first GWAP, developed by von Ahn and his group (2003, 2004)

• The problem: obtain accurate descriptions of images, to be used

– To train image search engines

– To develop machine learning approaches to vision

• The goal: label the majority of the images on the Web

ESP the game

ESP THE GAME

• Two partners are picked at random from the large number of players online

• They are not told who their partner is, and can’t communicate with them

• They are both shown the same image

• The goal: guess how their partner will describe the image, and type that description

– Hence, the ESP game

• If any of the strings typed by one player matches the string typed by the other player, they score points
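
The matching rule can be sketched in a few lines of Python. The event format (a list of player/label pairs in typing order) and the normalisation step are assumptions made for this illustration; the real game's implementation is not described in the slides.

```python
def normalise(label):
    """Lower-case and collapse whitespace so 'Dog ' and 'dog' count as the same string."""
    return " ".join(label.lower().split())


def first_match(events):
    """Return the first label that both players have typed, or None.

    `events` is a list of (player, label) pairs in typing order,
    with player being "A" or "B".
    """
    seen = {"A": set(), "B": set()}
    for player, label in events:
        label = normalise(label)
        other = "B" if player == "A" else "A"
        if label in seen[other]:
            return label  # agreement: the partner typed this earlier
        seen[player].add(label)
    return None


events = [("A", "Dog"), ("B", "puppy"), ("A", "grass"), ("B", "dog "), ("A", "puppy")]
agreed = first_match(events)
print(agreed)
```

Neither player needs to type the label at the same moment; agreement happens as soon as one player types something the other has already entered.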

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

• One of the motivating factors is to try to score as many points as possible

• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

• Partners try to agree on as many images as they can in 2½ minutes

• The thermometer on the side indicates how many images they have agreed on

• If they agree on 15 images, they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed

• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
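
A small Python sketch of the taboo mechanism follows. It simplifies the game by scanning player A's guesses in order rather than tracking real typing times, and the function names are invented here. It shows how each agreement forces later pairs towards new labels.

```python
def agreed_label(guesses_a, guesses_b, taboo):
    """First guess of player A that player B also typed and that is not taboo."""
    allowed_b = {g for g in guesses_b if g not in taboo}
    for guess in guesses_a:
        if guess not in taboo and guess in allowed_b:
            return guess
    return None  # no agreement: the pair would pass on this image


def play_image(rounds):
    """Play several pairs on one image; every agreed label becomes taboo afterwards."""
    taboo = set()
    labels = []
    for guesses_a, guesses_b in rounds:
        label = agreed_label(guesses_a, guesses_b, taboo)
        if label is not None:
            labels.append(label)
            taboo.add(label)
    return labels


rounds = [
    (["dog", "grass"], ["dog", "ball"]),           # first pair agrees on "dog"
    (["dog", "grass", "ball"], ["grass", "dog"]),  # "dog" is taboo now
    (["dog", "grass"], ["grass", "dog"]),          # both common words are taboo
]
labels = play_image(rounds)
print(labels)
```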

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)

• An image is “done” when its list of taboo words is so extensive that most players pass on it
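
Both criteria are simple counts, as the Python sketch below shows. Reading "most players" as "more than half" is an assumption, as are the function names and the sample data.

```python
from collections import Counter


def good_labels(labels_by_player, n):
    """Labels produced by more than n distinct players."""
    counts = Counter()
    for labels in labels_by_player.values():
        counts.update(set(labels))  # count each player at most once per label
    return {label for label, count in counts.items() if count > n}


def is_done(passes, plays):
    """Assumed reading of 'most players pass': more than half of the plays ended in a pass."""
    return plays > 0 and passes / plays > 0.5


# Hypothetical data: the labels each player produced for one image.
produced = {
    "p1": ["dog", "grass"],
    "p2": ["dog", "ball"],
    "p3": ["dog", "grass", "dog"],
}
print(good_labels(produced, 2))
```

Raising N trades coverage for reliability: with N = 1 the sample yields two good labels, with N = 2 only one.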

IMPLEMENTATION

• Pre-recorded game play

– Especially at the beginning, and at quiet times, there won’t always be players to pair with

– In these cases, a player is paired against a recorded ‘hand’ of a previous game with the same picture

• Cheating

– Players could cheat in a number of ways, including agreeing on labels and playing against themselves

– A number of mechanisms are in place against those cases

• Selecting images
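
Pairing a live player against a recorded hand can be sketched as replaying the earlier guesses with their original timing. The representation below (lists of seconds/label pairs) and the `replay_agreement` function are assumptions for illustration; the slides do not describe how the recording is actually stored.

```python
def replay_agreement(live, recorded):
    """Earliest (time, label) at which the live player and the recording agree, or None.

    Both arguments are lists of (seconds_since_start, label). A label counts
    as agreed once both sides have typed it, i.e. at the later of the two times.
    """
    recorded_at = {}
    for t, label in recorded:
        recorded_at.setdefault(label, t)  # keep the first time each label was typed
    candidates = [
        (max(t, recorded_at[label]), label)
        for t, label in live
        if label in recorded_at
    ]
    return min(candidates) if candidates else None


live = [(2.0, "tree"), (5.0, "car")]
recorded = [(1.0, "car"), (8.0, "tree")]
result = replay_agreement(live, recorded)
print(result)
```

Here the live player types "tree" first, but the recording only reaches "tree" at 8 seconds, so the pair agrees on "car" at 5 seconds instead.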

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003

– 13,630 players

– 1.2 million labels for 293,760 images

– 80% of players played more than once

• By 2008

– 200,000 players

– 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors

– Playing with a partner

– Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH

– choose 10 labels among those produced, and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment

– 15 participants, 20 images among the 1000 with more than 5 labels

– 83% of game labels also produced by participants

• Manual assessment of labels (‘would you use these labels to describe this image?’)

– 15 participants, 20 images

– 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge

• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO

– Players have to guess a word

– One of the players gives hints concerning the word

• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations/properties of interest are collected

• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _

• _ is used for _

• _ is typically near/in/on _

• _ is the opposite of _ / _ is related to _
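
Because every hint comes from a fixed template, each one can be turned into a typed fact about the secret word. The Python sketch below shows the idea. The template texts follow the slide, but the relation names on the right, the function names and the way the hint is rendered are assumptions made for this illustration.

```python
# Template text follows the slide; the relation labels are assumed names.
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near": "Near",
    "is typically in": "In",
    "is typically on": "On",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def hint_to_fact(secret_word, template, filler):
    """Turn a filled-in template into a (relation, secret word, filler) fact."""
    if template not in TEMPLATES:
        raise KeyError(f"unknown template: {template!r}")
    return (TEMPLATES[template], secret_word, filler)


def render_hint(template, filler):
    """The hint as shown to the Guesser, with the secret word left blank."""
    return f"_ {template} {filler}"


fact = hint_to_fact("milk", "is a kind of", "drink")
hint = render_hint("is a kind of", "drink")
print(fact, hint)
```

Free-text hints would have to be parsed before they could be stored; restricting the Describer to templates avoids that step.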

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player

• The asymmetry of the game causes a problem not encountered in the ESP game

– Describer: can just repeat the behavior of a previous describer

– Guesser: not so easy

RESULTS

• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY

• Quality

– Ask six raters whether 200 facts collected using Verbosity are ‘true’

– Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks

– Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric

– Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
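
One simple way to combine the two tasks is to score each proposed antecedent by the annotations it received plus agreements minus disagreements. The Python sketch below does that. The scoring rule, the data format and the function name are assumptions for illustration; the slides do not say how the game aggregates judgements.

```python
from collections import defaultdict


def rank_interpretations(annotations, validations):
    """Rank candidate antecedents for one markable.

    annotations: antecedent ids chosen in annotation mode.
    validations: (antecedent id, agrees) pairs from validation mode.
    Score = annotations + agreements - disagreements (an assumed rule).
    """
    score = defaultdict(int)
    for antecedent in annotations:
        score[antecedent] += 1
    for antecedent, agrees in validations:
        score[antecedent] += 1 if agrees else -1
    return sorted(score.items(), key=lambda item: (-item[1], str(item[0])))


# Hypothetical judgements for one markable; "m1" and "m3" are made-up ids.
annotations = ["m1", "m1", "m3"]
validations = [("m3", False), ("m1", True)]
ranking = rank_interpretations(annotations, validations)
print(ranking)
```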


PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.

• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.


  • INTRODUCTION TO ARTIFICIAL INTELLIGENCE
  • `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
  • THE SOCIAL WEB
  • SOCIAL CREATION OF KNOWLEDGE
  • WIKIPEDIA
  • Slide 7
  • Slide 8
  • Encyclopedic knowledge in coreference resolution
  • Why Wikipedia may help addressing the encyclopedic knowledge problem
  • Another interesting scenario
  • Slide 13
  • Wikipedia as Ontology
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
  • Slide 21
  • The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
  • SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
  • Wikipedia category network
  • Deriving a taxonomy from Wikipedia (AAAI 2007)
  • Slide 26
  • INFOBOXES
  • Slide 29
  • Slide 30
  • Slide 31
  • SPARQL
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • OPEN MIND COMMONSENSE
  • WHATrsquoS IN OPEN MIND COMMONSENSE CAR
  • Slide 39
  • OPEN MIND COMMONSENSE ADDING KNOWLEDGE
  • OMCS ADDING KNOWLEDGE 2
  • OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
  • Slide 43
  • FROM OPENMIND COMMONSENSE TO CONCEPT NET
  • Slide 45
  • CONCEPT NET
  • FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
  • GAMES WITH A PURPOSE
  • GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
  • EXAMPLES OF GWAP
  • ESP
  • ESP the game
  • ESP THE GAME
  • THE TASK
  • SCORING BY MATCHING
  • THE CHALLENGE SCORES
  • SCORES
  • THE CHALLENGE TIMING
  • TABOO WORDS
  • Slide 61
  • PASSING
  • GOOD LABELS COMPLETING AN IMAGE
  • IMPLEMENTATION
  • SOME STATISTICS
  • ANALYSIS
  • QUALITY OF THE LABELS
  • GOOGLE IMAGE LABELLER
  • Slide 69
  • RESULTS
  • VERBOSITY
  • THE GAME
  • Slide 73
  • TEMPLATES IN VERBOSITY
  • GUESSING ATTRIBUTES
  • PRODUCING A DESCRIPTION
  • TEMPLATES
  • EMULATION
  • Slide 79
  • PHRASE DETECTIVES
  • Slide 81
  • NAME THE CULPRIT
  • READINGS

10486071600000 concepts

1048607including

1048698 58000 persons

1048698 70000 places

1048698 35000 music albums

1048698 12000 films

1048607described by 91 million triples

1048607using 8141 different properties

1048607557000 links to pictures

10486071300000 links external web pages

1048607207000 Wikipedia categories

104860775000 YAGO categories

The DBpedia Dataset

The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data

REPRESENTING EXTRACTED INFORMATION

httpenwikipediaorgwikiCalgary

httpdbpediaorgresourceCalgary

dbpedianative_name Calgaryrdquo

dbpediaaltitude ldquo1048rdquo

dbpediapopulation_city ldquo988193rdquo

dbpediapopulation_metro ldquo1079310rdquo

mayor_name

dbpediaDave_Bronconnier

governing_body

dbpediaCalgary_City_Council

Extracting Infobox Data (RDF Representation)

SPARQL

bull SPARQL is a query language for RDF

bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF

bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware

1048607httpdbpediaorgsparql

1048607hosted on a OpenLink Virtuoso server

1048607can answer SPARQL queries like

1048698 Give me all Sitcoms that are set in NYC

1048698 All tennis players from Moscow

1048698 All films by Quentin Tarentino

1048698 All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint

bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and

Language Laboratory hellipbull This has been taken advantage of in AI

ndash Open Mind Commonsense (Singh) (collecting facts)

ndash Semantic Wikis

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

wwwphrasedetectivescom

bull Open Mind Common Sense ndash Singh

bull Crater mapping (results) ndash Kanefsky

bull Learner Learner2 1001 Paraphrases ndash Chklovski

bull FACTory ndash CyCORP

bull Hot or Not ndash 8 Days

bull ESP Phetch Verbosity Peekaboom ndash von Ahn

bull Galaxy Zoo ndash Oxford University

WEB COLLABORATION PROJECTS

wwwphrasedetectivescom

OPEN MIND COMMONSENSE

bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense

WHATrsquoS IN OPEN MIND COMMONSENSE CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52000 assertions)

IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)

EVENTS (38000 assertions)

PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)

AGENTS (104000 assertions)

CapableOf (CapableOf dentist pull tooth)

SPATIAL (36000 assertions)

LocationOf (LocationOf army in war)

TEMPORAL time amp sequence

CAUSAL (17000 assertions)

EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)

AFFECTIONAL (mood feeling emotions) (34000 assertions)

DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)

FUNCTIONAL (115000 assertions)

IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)

ASSOCIATION K-LINES (125 million assertions)

SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic

network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO

CONCEPTNETA lime is a very sour fruit

isa(limefruit)

property_of(limevery_sour)

GAMES WITH A PURPOSE

bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

bull The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune

bull Other gamesndash Peekaboomndash Phetch

ESP

• The first GWAP, developed by von Ahn and their group (2003, 2004)

• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision

• The goal: label the majority of the images on the Web

ESP the game

ESP: THE GAME

• Two partners are picked at random from the large number of players online

• They are not told who their partner is and can’t communicate with them

• They are both shown the same image

• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game

• If any of the strings typed by one player matches the string typed by the other player, they score points
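The matching rule can be sketched as follows: the round ends as soon as one player types a string the other player has already typed. The function name first_match, the "A"/"B" player tags and the lower-casing of guesses are assumptions made for illustration, not details of the actual game.

```python
def first_match(events):
    """Return the first label typed by both players, or None if they never agree.

    `events` is the sequence of (player, text) pairs in typing order,
    where player is "A" or "B".
    """
    typed = {"A": set(), "B": set()}
    for player, text in events:
        label = text.strip().lower()  # assumed normalisation
        other = "B" if player == "A" else "A"
        if label in typed[other]:
            return label  # agreement: this becomes a label for the image
        typed[player].add(label)
    return None


round_events = [("A", "dog"), ("B", "puppy"), ("A", "grass"), ("B", "Dog ")]
print(first_match(round_events))  # dog
```

Note that the players do not have to type the same string at the same time: a match against anything the partner typed earlier in the round counts.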

THE TASK

SCORING BY MATCHING

THE CHALLENGE: SCORES

• One of the motivating factors is to try to score as many points as possible

• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2½ minutes

• The thermometer on the side indicates how many images they have agreed on

• If they agree on 15 images they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed

• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)

• An image is “done” when its list of taboo words is so extensive that most players pass on it
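The bookkeeping behind taboo words and good labels can be sketched per image. The class below is hypothetical: the name ImageRecord, the choice to count two players per agreement, and the lower-casing of labels are assumptions, and the “done” test (most players passing) is left out because it needs data on passes.

```python
from collections import Counter


def normalize(label):
    """Assumed normalisation: ignore case and surrounding spaces."""
    return label.strip().lower()


class ImageRecord:
    """Labels agreed upon so far for one image."""

    def __init__(self, good_threshold):
        self.good_threshold = good_threshold  # the parameter N on the slide
        self.player_counts = Counter()        # label -> players who produced it

    def record_agreement(self, label, players=2):
        """A pair agreed on `label`; each agreement involves two players."""
        self.player_counts[normalize(label)] += players

    def taboo_words(self):
        """Every label agreed upon earlier is taboo for later players."""
        return set(self.player_counts)

    def good_labels(self):
        """Labels produced by more than N players."""
        return {label for label, count in self.player_counts.items()
                if count > self.good_threshold}

    def is_allowed(self, guess):
        return normalize(guess) not in self.player_counts


record = ImageRecord(good_threshold=2)
for agreed in ["dog", "Dog", "grass"]:
    record.record_agreement(agreed)
print(sorted(record.taboo_words()), sorted(record.good_labels()))
```

Because every agreed label becomes taboo, later pairs are pushed towards more specific words, and the taboo list for an image can only grow.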

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning, and at quiet times, there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture

• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases

• Selecting images

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once

• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants

• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful
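The comparison between game labels and participant labels comes down to a set overlap. The sketch below shows the arithmetic only; the function name overlap_percentage is an assumption, the labels are invented toy data rather than data from the experiment, and the exact procedure used in the study may differ.

```python
def overlap_percentage(game_labels, participant_labels):
    """Percentage of distinct game labels that participants also produced."""
    game = {label.strip().lower() for label in game_labels}
    participants = {label.strip().lower() for label in participant_labels}
    if not game:
        return 0.0
    return 100.0 * len(game & participants) / len(game)


# Invented toy data for one image.
from_game = ["dog", "grass", "ball", "sky"]
from_participants = ["Dog", "ball", "tree"]
print(overlap_percentage(from_game, from_participants))  # 50.0
```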

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge

• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word

• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected

• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _

• _ is used for _

• _ is typically near/in/on _

• _ is the opposite of _ / _ is related to _
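Because each hint is a filled-in template, every hint can be stored directly as a fact about the secret word. A minimal sketch follows; the relation names on the right-hand side of TEMPLATES and the function names are assumptions made for illustration, not the names Verbosity uses.

```python
# Template text -> hypothetical relation name for the collected fact.
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near/in/on": "LocationOf",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def render_hint(template, filler):
    """Text shown to the Guesser: the secret word stays hidden."""
    return f"_ {template} {filler}"


def hint_to_fact(secret_word, template, filler):
    """Fact recorded when the Describer fills in `template` with `filler`."""
    relation = TEMPLATES[template]  # KeyError for templates the game does not offer
    return (relation, secret_word, filler.strip().lower())


print(render_hint("is a kind of", "drink"))
print(hint_to_fact("milk", "is a kind of", "Drink"))
```

Restricting the Describer to a fixed set of templates is what makes the output machine-readable: no parsing of free text is needed to decide which relation a hint expresses.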

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player

• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY

• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user

www.phrasedetectives.com

PHRASE DETECTIVES: THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012

• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At the Developers Guide to Semantic Web Toolkits you can find a development toolkit in your preferred programming language to process DBpedia data.

REPRESENTING EXTRACTED INFORMATION

http://en.wikipedia.org/wiki/Calgary

http://dbpedia.org/resource/Calgary

dbpedia:native_name “Calgary”

dbpedia:altitude “1048”

dbpedia:population_city “988193”

dbpedia:population_metro “1079310”

mayor_name dbpedia:Dave_Bronconnier

governing_body dbpedia:Calgary_City_Council

Extracting Infobox Data (RDF Representation)
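In RDF each infobox entry becomes a (subject, predicate, object) triple whose subject is the DBpedia resource for the page. A minimal sketch in plain Python follows, using only the values shown on the slide; the variable names and the helper objects are made up for illustration, and a real application would more likely use an RDF library.

```python
# Subject shared by all triples: the DBpedia resource for the Calgary page.
CALGARY = "http://dbpedia.org/resource/Calgary"

# (subject, predicate, object) triples as read off the slide.
TRIPLES = [
    (CALGARY, "dbpedia:native_name", "Calgary"),
    (CALGARY, "dbpedia:altitude", "1048"),
    (CALGARY, "dbpedia:population_city", "988193"),
    (CALGARY, "dbpedia:population_metro", "1079310"),
    (CALGARY, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (CALGARY, "governing_body", "dbpedia:Calgary_City_Council"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(TRIPLES, CALGARY, "dbpedia:altitude"))  # ['1048']
```

The last two triples have another resource as their object rather than a literal string, which is what turns the set of triples into a graph linking pages to one another.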

SPARQL

• SPARQL is a query language for RDF

• RDF is a directed, labeled graph data format for representing information in the Web

• This specification defines the syntax and semantics of the SPARQL query language for RDF

• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware

• http://dbpedia.org/sparql

• hosted on an OpenLink Virtuoso server

• can answer SPARQL queries like
  – Give me all Sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint
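One of the example questions, all films by Quentin Tarantino, could be phrased roughly as below. This is a sketch under assumptions: the property dbo:director and the resource name dbr:Quentin_Tarantino are our guess at how DBpedia models this, and the format parameter is a convention of the Virtuoso server rather than part of SPARQL itself. The code only builds the request URL, so it runs without network access.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on the slide

# Assumed vocabulary: dbo:director links a film to its director.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as an HTTP GET request URL."""
    params = {
        "query": query,
        "format": "application/sparql-results+json",  # assumed server option
    }
    return endpoint + "?" + urlencode(params)


print(build_request_url(QUERY))
```

Fetching the printed URL (for example with urllib.request.urlopen) would send the query to the endpoint. The triple pattern in the WHERE clause is matched against the RDF graph, and every binding of ?film that makes it true is returned.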

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …

• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

• Open Mind Common Sense – Singh

• Crater mapping (results) – Kanefsky

• Learner, Learner2, 1001 Paraphrases – Chklovski

• FACTory – Cycorp

• Hot or Not – 8 Days

• ESP, Phetch, Verbosity, Peekaboom – von Ahn

• Galaxy Zoo – Oxford University

WEB COLLABORATION PROJECTS

OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense

WHAT’S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)

IsA (IsA "apple" "fruit")

PartOf (PartOf "CPU" "computer")

PropertyOf (PropertyOf "coffee" "wet")

MadeOf (MadeOf "bread" "flour")

DefinedAs (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)

PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope")

SubeventOf (SubeventOf "play sport" "score goal")

FirstSubeventOf (FirstSubeventOf "start fire" "light match")

LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")

SPARQL

bull SPARQL is a query language for RDF

bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF

bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware

1048607httpdbpediaorgsparql

1048607hosted on a OpenLink Virtuoso server

1048607can answer SPARQL queries like

1048698 Give me all Sitcoms that are set in NYC

1048698 All tennis players from Moscow

1048698 All films by Quentin Tarentino

1048698 All German musicians that were born in Berlin in the 19th century

The DBpedia SPARQL Endpoint

WEB COLLABORATION FOR KNOWLEDGE ACQUISITION

• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …

• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis

www.phrasedetectives.com

WEB COLLABORATION PROJECTS

• Open Mind Common Sense – Singh

• Crater mapping (results) – Kanefsky

• Learner, Learner2, 1001 Paraphrases – Chklovski

• FACTory – CyCORP

• Hot or Not – 8 Days

• ESP, Phetch, Verbosity, Peekaboom – von Ahn

• Galaxy Zoo – Oxford University

OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge

WHAT'S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52,000 assertions)
  IsA (IsA "apple" "fruit")
  PartOf (PartOf "CPU" "computer")
  PropertyOf (PropertyOf "coffee" "wet")
  MadeOf (MadeOf "bread" "flour")
  DefinedAs (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)
  PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope")
  SubeventOf (SubeventOf "play sport" "score goal")
  FirstSubeventOf (FirstSubeventOf "start fire" "light match")
  LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)
  CapableOf (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)
  LocationOf (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)
  EffectOf (EffectOf "view video" "entertainment")
  DesirousEffectOf (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  DesireOf (DesireOf "person" "not be depressed")
  MotivationOf (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)
  UsedFor (UsedFor "fireplace" "burn wood")
  CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")

ASSOCIATION K-LINES (1.25 million assertions)
  SuperThematicKLine (SuperThematicKLine "western civilization" "civilization")
  ThematicKLine (ThematicKLine "wedding dress" "veil")
  ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")
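Assertions of this shape are easy to hold as (relation, concept, concept) tuples. The sketch below, in Python, indexes a few of the examples listed above by their first concept so that everything known about a concept can be looked up; the `build_index` helper and the data layout are illustrative assumptions, not ConceptNet's actual storage format or API.

```python
from collections import defaultdict

# A few of the example assertions from the list above.
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("MadeOf", "bread", "flour"),
    ("CapableOf", "dentist", "pull tooth"),
    ("UsedFor", "fireplace", "burn wood"),
    ("EffectOf", "view video", "entertainment"),
]


def build_index(assertions):
    """Map each first concept to the (relation, second concept) pairs it occurs in."""
    index = defaultdict(list)
    for relation, left, right in assertions:
        index[left].append((relation, right))
    return index


index = build_index(ASSERTIONS)
about_apple = index["apple"]
```

Viewed this way the knowledge base is a labeled graph: concepts are nodes and each assertion is an edge labeled with its relation type.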

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
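The slides do not spell out the "simple heuristics", so the following is only a guess at what one of them could look like: a single hand-written pattern, in Python, that turns sentences shaped like the lime example into the two assertions shown. The regular expression and the `extract` function are hypothetical.

```python
import re

# Hypothetical pattern for sentences of the form "A <concept> is a [very] <property> <category>".
PATTERN = re.compile(r"^an? (\w+) is an? ((?:very )?\w+) (\w+)$", re.IGNORECASE)


def extract(sentence):
    """Return the assertions the pattern yields, or an empty list if it does not apply."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if m is None:
        return []
    concept, prop, category = (part.lower() for part in m.groups())
    return [
        ("isa", concept, category),
        ("property_of", concept, prop.replace(" ", "_")),
    ]


facts = extract("A lime is a very sour fruit")
```

A single pattern like this covers very few sentences; a realistic extractor would need many such patterns plus normalization of the concept names.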

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune

• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and their group (2003, 2004)

• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision

• The goal: label the majority of the images on the Web

ESP the game

ESP THE GAME

• Two partners are picked at random from the large number of players online

• They are not told who their partner is and can't communicate with them

• They are both shown the same image

• The goal: guess how their partner will describe the image, and type that description
  – Hence the ESP game

• If any of the strings typed by one player matches the string typed by the other player, they score points
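The matching rule can be sketched in a few lines of Python. This is a simplification under stated assumptions: the real game checks for agreement as the players type, whereas here both lists of guesses are already complete, and the lowercasing and whitespace stripping are our own normalization, not something the slides specify.

```python
def first_agreement(guesses_a, guesses_b):
    """Return the first string typed by player A that player B also typed, or None."""
    typed_by_b = {g.strip().lower() for g in guesses_b}
    for guess in guesses_a:
        normalized = guess.strip().lower()
        if normalized in typed_by_b:
            return normalized
    return None


label = first_agreement(["dog", "grass", "puppy"], ["animal", "Puppy", "grass"])
```

Because neither player can see the other's guesses, the only way to match is to type what the image obviously shows, which is what makes the agreed string a plausible label.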

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

• One of the motivating factors is to try to score as many points as possible

• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

• Partners try to agree on as many images as they can during 2½ minutes

• The thermometer on the side indicates how many images they have agreed on

• If they agree on 15 images they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed

• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
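A minimal Python sketch of the taboo mechanism, assuming a per-image set of taboo words and case-insensitive comparison (both are illustrative choices; the function names are ours):

```python
def allowed_guesses(guesses, taboo):
    """Drop any guess that is on the image's taboo list (case-insensitive)."""
    return [g for g in guesses if g.lower() not in taboo]


def add_taboo(taboo, agreed_word):
    """Return a new taboo set that also contains the word just agreed upon."""
    return taboo | {agreed_word.lower()}


# An earlier pair agreed on "Dog", so later pairs must find something else.
taboo = add_taboo(set(), "Dog")
remaining = allowed_guesses(["dog", "puppy", "grass"], taboo)
```

Each round of agreement therefore pushes later players towards less obvious, more specific labels for the same image.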

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

• A label is considered "good" when more than N players produce it (with N a parameter of the game)

• An image is "done" when its list of taboo words is so extensive that most players pass on it
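Both criteria are simple threshold tests. The Python sketch below counts each player at most once per label and reads "most players pass" as "more than half of the plays ended in a pass"; that reading, and the function names, are assumptions made for illustration.

```python
from collections import Counter


def good_labels(labels_by_player, n):
    """Return the labels produced by more than n distinct players."""
    counts = Counter()
    for labels in labels_by_player.values():
        counts.update(set(labels))  # count each player at most once per label
    return {label for label, count in counts.items() if count > n}


def is_done(passes, plays):
    """Assumed reading of 'most players pass': more than half of the plays were passes."""
    return plays > 0 and passes / plays > 0.5


labels_by_player = {
    "p1": ["dog", "grass"],
    "p2": ["dog"],
    "p3": ["dog", "ball", "ball"],
}
good = good_labels(labels_by_player, 2)
```

Raising N trades coverage for reliability: fewer labels qualify, but those that do have been confirmed by more independent players.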

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won't always be players to pair with
  – In these cases a player is paired against a recorded 'hand' of a previous game with the same picture

• Cheating
  – Players could cheat in a number of ways, including agreeing on labels, playing against themselves
  – A number of mechanisms are in place against those cases

• Selecting images
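One way to picture pre-recorded game play is to store each guess of an earlier game together with the time at which it was typed, and to reveal the guesses to the live player's matcher on the same schedule. The slides only say that a recorded 'hand' is replayed; the timestamped representation and the `replayed_guesses` helper in this Python sketch are assumptions.

```python
def replayed_guesses(recorded, elapsed_seconds):
    """Return the recorded partner's guesses that would have been typed by now.

    recorded: list of (seconds_from_start, guess) pairs from an earlier game
    on the same image.
    """
    return [guess for t, guess in recorded if t <= elapsed_seconds]


recorded = [(2.0, "dog"), (5.5, "grass"), (9.0, "ball")]
so_far = replayed_guesses(recorded, 6)
```

The live player's guesses can then be compared against `so_far` exactly as if a real partner were typing.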

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once

• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants

• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge

• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word

• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected

• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _

• _ is used for _

• _ is typically near/in/on _

• _ is the opposite of _ / _ is related to _
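The reason for using templates is that a filled-in hint maps directly onto a structured fact once the secret word is known. A Python sketch follows, with made-up relation names (`IsA`, `UsedFor`, and so on are our labels, not identifiers taken from Verbosity):

```python
# Hypothetical mapping from template text to a relation name.
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def hint_and_fact(secret_word, template, filler):
    """Return the hint shown to the Guesser and the fact recorded for the secret word."""
    hint = f"_ {template} {filler}"  # the secret word itself stays hidden
    fact = (TEMPLATES[template], secret_word, filler)
    return hint, fact


hint, fact = hint_and_fact("milk", "is a kind of", "liquid")
```

The Guesser only ever sees the hint; the fact is what the game collects as a side effect of play.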

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player

• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY

• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
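How the two tasks might be combined into a single decision can be sketched as a vote count in Python. The scoring rule used here (annotations plus agreements minus disagreements) is an assumption made for illustration, not a description of how Phrase Detectives actually aggregates judgements, and the counts are invented.

```python
def preferred_antecedent(judgements):
    """Pick the candidate antecedent with the highest net support.

    judgements maps each candidate to counts of annotations (players who chose it),
    agreements and disagreements (from the validation task).
    """
    def score(candidate):
        counts = judgements[candidate]
        return counts["annotated"] + counts["agree"] - counts["disagree"]

    return max(judgements, key=score)


# Invented counts for the markable "The agency" from the coreference example.
judgements = {
    "the FCC": {"annotated": 3, "agree": 2, "disagree": 0},
    "AT&T": {"annotated": 1, "agree": 0, "disagree": 2},
}
choice = preferred_antecedent(judgements)
```

Validation is cheaper for players than annotation, so letting it confirm or reject existing annotations stretches a limited pool of players further.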


PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012

• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems

WEB COLLABORATION PROJECTS

• Open Mind Common Sense – Singh

• Crater mapping (results) – Kanefsky

• Learner, Learner2, 1001 Paraphrases – Chklovski

• FACTory – CyCORP

• Hot or Not – 8 Days

• ESP, Phetch, Verbosity, Peekaboom – von Ahn

• Galaxy Zoo – Oxford University

www.phrasedetectives.com

OPEN MIND COMMONSENSE

• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense knowledge

WHAT’S IN OPEN MIND COMMONSENSE: CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh, 2004)

THINGS (52,000 assertions)

• IsA: (IsA "apple" "fruit")
• PartOf: (PartOf "CPU" "computer")
• PropertyOf: (PropertyOf "coffee" "wet")
• MadeOf: (MadeOf "bread" "flour")
• DefinedAs: (DefinedAs "meat" "flesh of animal")

EVENTS (38,000 assertions)

• PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
• SubeventOf: (SubeventOf "play sport" "score goal")
• FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
• LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")

AGENTS (104,000 assertions)

• CapableOf: (CapableOf "dentist" "pull tooth")

SPATIAL (36,000 assertions)

• LocationOf: (LocationOf "army" "in war")

TEMPORAL (time & sequence)

CAUSAL (17,000 assertions)

• EffectOf: (EffectOf "view video" "entertainment")
• DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")

AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)

• DesireOf: (DesireOf "person" "not be depressed")
• MotivationOf: (MotivationOf "play game" "compete")

FUNCTIONAL (115,000 assertions)

• UsedFor: (UsedFor "fireplace" "burn wood")
• CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")

ASSOCIATION: K-LINES (1.25 million assertions)

• SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
• ThematicKLine: (ThematicKLine "wedding dress" "veil")
• ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
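The assertions listed above are all relation-plus-two-arguments triples, so one way to picture them in code is as a small indexed collection. The following is only a minimal sketch: the `Assertion` type, `build_index` and `related` are hypothetical names introduced for illustration, not part of ConceptNet or Open Mind Commonsense.

```python
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    """One (relation, left concept, right concept) triple (hypothetical type)."""
    relation: str
    left: str
    right: str


# A handful of the example assertions from the relation list.
ASSERTIONS = [
    Assertion("IsA", "apple", "fruit"),
    Assertion("PartOf", "CPU", "computer"),
    Assertion("MadeOf", "bread", "flour"),
    Assertion("CapableOf", "dentist", "pull tooth"),
    Assertion("UsedFor", "fireplace", "burn wood"),
    Assertion("ThematicKLine", "wedding dress", "veil"),
]


def build_index(assertions):
    """Index assertions by their left-hand concept for quick lookup."""
    index = defaultdict(list)
    for assertion in assertions:
        index[assertion.left].append(assertion)
    return index


def related(index, concept, relation=None):
    """Right-hand concepts linked to `concept`, optionally restricted to one relation."""
    return [a.right for a in index.get(concept, [])
            if relation is None or a.relation == relation]


INDEX = build_index(ASSERTIONS)
print(related(INDEX, "apple", "IsA"))
```

Indexing by the left-hand concept is just one design choice; a real semantic network would typically also index by the right-hand concept and by relation.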

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al., 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
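To make the idea of a "simple heuristic" concrete, here is a toy sketch that turns the lime sentence into the two predicates shown. It assumes a single hand-written regular expression for sentences of the form "A X is a [modifier] Y"; the pattern and the `extract` function are illustrative assumptions, not the actual ConceptNet extraction rules.

```python
import re

# Toy pattern: "A <concept> is a [<modifier>] <category>" (an assumption for illustration).
PATTERN = re.compile(r"^an? (\w+) is an? (?:(.+) )?(\w+)$", re.IGNORECASE)


def extract(sentence):
    """Return predicate strings for one sentence, or [] if the pattern does not match."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return []
    concept, modifier, category = match.groups()
    facts = [f"isa({concept.lower()}, {category.lower()})"]
    if modifier:
        # Multi-word modifiers are joined with underscores, as in very_sour.
        joined = modifier.lower().replace(" ", "_")
        facts.append(f"property_of({concept.lower()}, {joined})")
    return facts


for fact in extract("A lime is a very sour fruit"):
    print(fact)
```

A single pattern like this covers only one sentence shape; sentences it does not recognise simply yield no facts.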

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune

• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and his group (2003, 2004)

• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision

• The goal: label the majority of the images on the Web

ESP the game

ESP THE GAME

• Two partners are picked at random from the large number of players online

• They are not told who their partner is and can’t communicate with them

• They are both shown the same image

• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game

• If any of the strings typed by one player matches a string typed by the other player, they score points
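The matching rule in the last bullet can be sketched as follows. This is a minimal illustration, assuming each player's input is available as a list of (seconds, text) pairs in the order typed; `normalize` and `first_agreement` are hypothetical helper names, and the normalisation step (trimming and lower-casing) is an assumption rather than a documented detail of the game.

```python
def normalize(guess):
    """Lower-case and trim a typed string so trivially different spellings match."""
    return guess.strip().lower()


def first_agreement(guesses_a, guesses_b):
    """Return the first label typed by both players, or None if they never agree.

    Each argument is a list of (seconds, text) pairs in the order typed.
    An agreement is registered when the slower of the two players types the label.
    """
    first_seen_a = {}
    for seconds, text in guesses_a:
        first_seen_a.setdefault(normalize(text), seconds)
    first_seen_b = {}
    for seconds, text in guesses_b:
        first_seen_b.setdefault(normalize(text), seconds)

    common = set(first_seen_a) & set(first_seen_b)
    if not common:
        return None
    return min(
        common,
        key=lambda label: (max(first_seen_a[label], first_seen_b[label]), label),
    )


player_a = [(1.0, "dog"), (3.0, "grass"), (5.0, "puppy")]
player_b = [(2.0, "animal"), (4.0, "Puppy "), (6.0, "dog")]
print(first_agreement(player_a, player_b))
```

Here both "dog" and "puppy" are typed by both players, but "puppy" is completed first (at second 5 rather than second 6), so it is the label the pair agrees on.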

THE TASK

SCORING BY MATCHING

THE CHALLENGE: SCORES

• One of the motivating factors is to try to score as many points as possible

• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2½ minutes

• The thermometer on the side indicates how many images they have agreed on

• If they agree on 15 images they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed

• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS, COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)

• An image is “done” when its list of taboo words is so extensive that most players pass on it
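The bookkeeping behind taboo words, "good" labels and "done" images can be sketched per image as below. This is a sketch under stated assumptions, not the game's actual implementation: the `ImageRecord` class is hypothetical, each agreement is counted as two players producing the label, and "most players pass" is read as more than half of the rounds once a minimum number of rounds (`min_views`, an invented parameter) has been played.

```python
from collections import Counter


class ImageRecord:
    """Per-image bookkeeping (hypothetical); `good_threshold` plays the role of N."""

    def __init__(self, good_threshold=1, min_views=4):
        self.good_threshold = good_threshold
        self.min_views = min_views      # assumption: rounds needed before judging "done"
        self.producers = Counter()      # label -> number of players who produced it
        self.taboo = set()              # labels no longer accepted for this image
        self.views = 0
        self.passes = 0

    def record_round(self, agreed_label=None):
        """Record one round; `agreed_label` is None when the pair passed."""
        if agreed_label is not None and agreed_label in self.taboo:
            raise ValueError(f"{agreed_label!r} is taboo for this image")
        self.views += 1
        if agreed_label is None:
            self.passes += 1
            return
        self.producers[agreed_label] += 2   # both partners typed the label
        self.taboo.add(agreed_label)        # later pairs must find a new label

    def good_labels(self):
        """Labels produced by more than N players."""
        return sorted(label for label, count in self.producers.items()
                      if count > self.good_threshold)

    def is_done(self):
        """True once most of the recorded rounds on this image were passes."""
        return self.views >= self.min_views and self.passes > self.views / 2


record = ImageRecord()
record.record_round("dog")
record.record_round("grass")
print(sorted(record.taboo), record.good_labels(), record.is_done())
```

With the default N of 1, a single agreement (two players) is enough to make a label good; a larger N would require counting producers across several rounds before a word is made taboo.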

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning, and at quiet times, there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture

• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases

• Selecting images
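Pairing a live player against a recorded ‘hand’ can be pictured as replaying the earlier player's guesses with their original timing. The sketch below assumes a recorded hand is stored as (seconds, label) pairs with labels already lower-cased; `match_time` is a hypothetical helper, and the timing rule (an agreement is registered once both the live guess and the replayed guess have occurred) is an assumption.

```python
def match_time(recorded_hand, live_guess, live_time):
    """Time at which a live guess would agree with the recorded partner, or None.

    recorded_hand: list of (seconds, label) pairs from an earlier game on the same image.
    live_guess, live_time: what the live player typed, and when.
    """
    wanted = live_guess.strip().lower()
    times = [seconds for seconds, label in recorded_hand if label == wanted]
    if not times:
        return None
    # The agreement is registered once both guesses have been made.
    return max(live_time, min(times))


hand = [(2.0, "dog"), (7.5, "grass")]
print(match_time(hand, "Dog ", 5.0))
print(match_time(hand, "grass", 5.0))
```

In the first call the recorded partner had already typed "dog", so the match happens as soon as the live player types it; in the second the live player has to wait until the replay reaches "grass".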

SOME STATISTICS

• In the 4 months between August 9th, 2003 and December 10th, 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once

• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – Choose 10 labels among those produced and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants

• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge

• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word

• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations and properties of interest are collected

• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
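A filled-in template can be read both as a hint for the Guesser and as a candidate fact about the secret word. The sketch below illustrates that double use; the template keys, the `hint` and `fact` helpers and the mapping to relation names (in particular `OppositeOf`) are assumptions made for illustration, not the relation inventory actually used by Verbosity.

```python
# Template key -> (sentence pattern, relation name). Relation names are assumptions.
TEMPLATES = {
    "kind_of": ("{} is a kind of {}", "IsA"),
    "used_for": ("{} is used for {}", "UsedFor"),
    "near": ("{} is typically near {}", "LocationOf"),
    "in": ("{} is typically in {}", "LocationOf"),
    "on": ("{} is typically on {}", "LocationOf"),
    "opposite_of": ("{} is the opposite of {}", "OppositeOf"),
    "related_to": ("{} is related to {}", "ConceptuallyRelatedTo"),
}


def hint(template_key, filler):
    """Text shown to the Guesser: the secret word stays blanked out."""
    pattern, _relation = TEMPLATES[template_key]
    return pattern.format("_", filler)


def fact(secret_word, template_key, filler):
    """Candidate (relation, secret word, filler) triple collected from one hint."""
    _pattern, relation = TEMPLATES[template_key]
    return (relation, secret_word, filler)


print(hint("kind_of", "fruit"))
print(fact("lime", "kind_of", "fruit"))
```

Keeping the sentence pattern and the relation name side by side means every hint the Describer enters already arrives in a structured form.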

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player

• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY

• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user

www.phrasedetectives.com
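One way the two tasks could feed a single judgement per markable is to count annotations and validations together. The sketch below is purely illustrative: the scoring rule (one point per annotation, plus one per agreement, minus one per disagreement), the `NOT_ANAPHORIC` marker and both function names are assumptions, not the published Phrase Detectives aggregation method.

```python
from collections import defaultdict

NOT_ANAPHORIC = "<not anaphoric>"  # hypothetical marker for "no antecedent"


def score_interpretations(annotations, validations):
    """Score each proposed antecedent for one markable.

    annotations: antecedents chosen in the annotation task (or NOT_ANAPHORIC).
    validations: (antecedent, agrees) pairs from the validation task.
    """
    scores = defaultdict(int)
    for choice in annotations:
        scores[choice] += 1
    for choice, agrees in validations:
        scores[choice] += 1 if agrees else -1
    return dict(scores)


def best_interpretation(annotations, validations):
    """Highest-scoring antecedent; ties are broken alphabetically."""
    scores = score_interpretations(annotations, validations)
    if not scores:
        raise ValueError("no judgements for this markable")
    return min(scores, key=lambda choice: (-scores[choice], choice))


annotations = ["The FCC", "The FCC", "AT&T"]
validations = [("AT&T", False), ("The FCC", True)]
print(score_interpretations(annotations, validations))
print(best_interpretation(annotations, validations))
```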

PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012

bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009

bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67

bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems

  • INTRODUCTION TO ARTIFICIAL INTELLIGENCE
  • `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
  • THE SOCIAL WEB
  • SOCIAL CREATION OF KNOWLEDGE
  • WIKIPEDIA
  • Slide 7
  • Slide 8
  • Encyclopedic knowledge in coreference resolution
  • Why Wikipedia may help addressing the encyclopedic knowledge problem
  • Another interesting scenario
  • Slide 13
  • Wikipedia as Ontology
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
  • Slide 21
  • The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
  • SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
  • Wikipedia category network
  • Deriving a taxonomy from Wikipedia (AAAI 2007)
  • Slide 26
  • INFOBOXES
  • Slide 29
  • Slide 30
  • Slide 31
  • SPARQL
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • OPEN MIND COMMONSENSE
  • WHATrsquoS IN OPEN MIND COMMONSENSE CAR
  • Slide 39
  • OPEN MIND COMMONSENSE ADDING KNOWLEDGE
  • OMCS ADDING KNOWLEDGE 2
  • OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
  • Slide 43
  • FROM OPENMIND COMMONSENSE TO CONCEPT NET
  • Slide 45
  • CONCEPT NET
  • FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
  • GAMES WITH A PURPOSE
  • GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
  • EXAMPLES OF GWAP
  • ESP
  • ESP the game
  • ESP THE GAME
  • THE TASK
  • SCORING BY MATCHING
  • THE CHALLENGE SCORES
  • SCORES
  • THE CHALLENGE TIMING
  • TABOO WORDS
  • Slide 61
  • PASSING
  • GOOD LABELS COMPLETING AN IMAGE
  • IMPLEMENTATION
  • SOME STATISTICS
  • ANALYSIS
  • QUALITY OF THE LABELS
  • GOOGLE IMAGE LABELLER
  • Slide 69
  • RESULTS
  • VERBOSITY
  • THE GAME
  • Slide 73
  • TEMPLATES IN VERBOSITY
  • GUESSING ATTRIBUTES
  • PRODUCING A DESCRIPTION
  • TEMPLATES
  • EMULATION
  • Slide 79
  • PHRASE DETECTIVES
  • Slide 81
  • NAME THE CULPRIT
  • READINGS

OPEN MIND COMMONSENSE

bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense

WHATrsquoS IN OPEN MIND COMMONSENSE CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52000 assertions)

IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)

EVENTS (38000 assertions)

PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)

AGENTS (104000 assertions)

CapableOf (CapableOf dentist pull tooth)

SPATIAL (36000 assertions)

LocationOf (LocationOf army in war)

TEMPORAL time amp sequence

CAUSAL (17000 assertions)

EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)

AFFECTIONAL (mood feeling emotions) (34000 assertions)

DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)

FUNCTIONAL (115000 assertions)

IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)

ASSOCIATION K-LINES (125 million assertions)

SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic

network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO

CONCEPTNETA lime is a very sour fruit

isa(limefruit)

property_of(limevery_sour)

GAMES WITH A PURPOSE

bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

bull The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune

bull Other gamesndash Peekaboomndash Phetch

ESP

bull The first GWAP developed by von Ahn and their group (2003 2004)

bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision

bull The goal label the majority of the images on the Web

ESP the game

ESP THE GAMEbull Two partners are picked at random from the

large number of players onlinebull They are not told who their partner is and canrsquot

communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe

the image and type that descriptionndash Hence the ESP game

bull If any of the strings typed by one player matches the string typed by the other player they score points

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

bull One of the motivating factors is to try to score as many points as possible

bull Hourly daily weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

bull Partners try to agree on as many images as they can during 2 frac12 minutes

bull The termometer on the side indicates how many images they have agreed on

bull If they agree on 15 images they score bonus points

TABOO WORDS

bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed

bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)

bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it

IMPLEMENTATIONbull Pre-recorded game play

ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with

ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture

bull Cheatingndash Players could cheat in a number of ways including

agreeing on labels playing against themselvesndash A number of mechanisms are in place against those

casesbull Selecting images

SOME STATISTICS

bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once

bull By 2008 ndash 200000 playersndash 50 million labels

ANALYSIS

bull The numbers indicate that the game is fun to play

bull Exciting factorsndash Playing with a partnerndash Playing against time

QUALITY OF THE LABELSbull For IMAGE SEARCH

ndash choose 10 labels among those produced and look at which images are returned

bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more

than 5 labelsndash 83 of game labels also produced by participants

bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

bull hellip or the game approach to collecting commonsense knowledge

bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)

THE GAME

bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the

word

bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected

bull The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _

EMULATION

bull As in ESP game pre-recorded games are used when a player cannot be paired with another player

bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous

describerndash Guesser not so easy

RESULTS

bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY

bull Qualityndash Ask six raters whether 200 facts collected using

Verbosity are lsquotruersquondash Around 85 success

PHRASE DETECTIVES

wwwphrasedetectivesorg

bull 2 tasks

ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric

ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user

wwwphrasedetectivescom

PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012

bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009

bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67

bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems

  • INTRODUCTION TO ARTIFICIAL INTELLIGENCE
  • `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
  • THE SOCIAL WEB
  • SOCIAL CREATION OF KNOWLEDGE
  • WIKIPEDIA
  • Slide 7
  • Slide 8
  • Encyclopedic knowledge in coreference resolution
  • Why Wikipedia may help addressing the encyclopedic knowledge problem
  • Another interesting scenario
  • Slide 13
  • Wikipedia as Ontology
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
  • Slide 21
  • The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
  • SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
  • Wikipedia category network
  • Deriving a taxonomy from Wikipedia (AAAI 2007)
  • Slide 26
  • INFOBOXES
  • Slide 29
  • Slide 30
  • Slide 31
  • SPARQL
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • OPEN MIND COMMONSENSE
  • WHATrsquoS IN OPEN MIND COMMONSENSE CAR
  • Slide 39
  • OPEN MIND COMMONSENSE ADDING KNOWLEDGE
  • OMCS ADDING KNOWLEDGE 2
  • OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
  • Slide 43
  • FROM OPENMIND COMMONSENSE TO CONCEPT NET
  • Slide 45
  • CONCEPT NET
  • FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
  • GAMES WITH A PURPOSE
  • GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
  • EXAMPLES OF GWAP
  • ESP
  • ESP the game
  • ESP THE GAME
  • THE TASK
  • SCORING BY MATCHING
  • THE CHALLENGE SCORES
  • SCORES
  • THE CHALLENGE TIMING
  • TABOO WORDS
  • Slide 61
  • PASSING
  • GOOD LABELS COMPLETING AN IMAGE
  • IMPLEMENTATION
  • SOME STATISTICS
  • ANALYSIS
  • QUALITY OF THE LABELS
  • GOOGLE IMAGE LABELLER
  • Slide 69
  • RESULTS
  • VERBOSITY
  • THE GAME
  • Slide 73
  • TEMPLATES IN VERBOSITY
  • GUESSING ATTRIBUTES
  • PRODUCING A DESCRIPTION
  • TEMPLATES
  • EMULATION
  • Slide 79
  • PHRASE DETECTIVES
  • Slide 81
  • NAME THE CULPRIT
  • READINGS

WHATrsquoS IN OPEN MIND COMMONSENSE CAR

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52000 assertions)

IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)

EVENTS (38000 assertions)

PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)

AGENTS (104000 assertions)

CapableOf (CapableOf dentist pull tooth)

SPATIAL (36000 assertions)

LocationOf (LocationOf army in war)

TEMPORAL time amp sequence

CAUSAL (17000 assertions)

EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)

AFFECTIONAL (mood feeling emotions) (34000 assertions)

DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)

FUNCTIONAL (115000 assertions)

IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)

ASSOCIATION K-LINES (125 million assertions)

SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic

network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO

CONCEPTNETA lime is a very sour fruit

isa(limefruit)

property_of(limevery_sour)

GAMES WITH A PURPOSE

bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

bull The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune

bull Other gamesndash Peekaboomndash Phetch

ESP

bull The first GWAP developed by von Ahn and their group (2003 2004)

bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision

bull The goal label the majority of the images on the Web

ESP the game

ESP THE GAMEbull Two partners are picked at random from the

large number of players onlinebull They are not told who their partner is and canrsquot

communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe

the image and type that descriptionndash Hence the ESP game

bull If any of the strings typed by one player matches the string typed by the other player they score points

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

bull One of the motivating factors is to try to score as many points as possible

bull Hourly daily weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

bull Partners try to agree on as many images as they can during 2 frac12 minutes

bull The termometer on the side indicates how many images they have agreed on

bull If they agree on 15 images they score bonus points

TABOO WORDS

bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed

bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)

bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it

IMPLEMENTATIONbull Pre-recorded game play

ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with

ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture

bull Cheatingndash Players could cheat in a number of ways including

agreeing on labels playing against themselvesndash A number of mechanisms are in place against those

casesbull Selecting images

SOME STATISTICS

bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once

bull By 2008 ndash 200000 playersndash 50 million labels

ANALYSIS

bull The numbers indicate that the game is fun to play

bull Exciting factorsndash Playing with a partnerndash Playing against time

QUALITY OF THE LABELSbull For IMAGE SEARCH

ndash choose 10 labels among those produced and look at which images are returned

bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more

than 5 labelsndash 83 of game labels also produced by participants

bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

bull hellip or the game approach to collecting commonsense knowledge

bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)

THE GAME

bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the

word

bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected

bull The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _

EMULATION

bull As in ESP game pre-recorded games are used when a player cannot be paired with another player

bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous

describerndash Guesser not so easy

RESULTS

bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY

bull Qualityndash Ask six raters whether 200 facts collected using

Verbosity are lsquotruersquondash Around 85 success

PHRASE DETECTIVES

wwwphrasedetectivesorg

bull 2 tasks

ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric

ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user

wwwphrasedetectivescom

PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012

bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009

bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67

bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems

  • INTRODUCTION TO ARTIFICIAL INTELLIGENCE
  • `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
  • THE SOCIAL WEB
  • SOCIAL CREATION OF KNOWLEDGE
  • WIKIPEDIA
  • Slide 7
  • Slide 8
  • Encyclopedic knowledge in coreference resolution
  • Why Wikipedia may help addressing the encyclopedic knowledge problem
  • Another interesting scenario
  • Slide 13
  • Wikipedia as Ontology
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
  • Slide 21
  • The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
  • SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
  • Wikipedia category network
  • Deriving a taxonomy from Wikipedia (AAAI 2007)
  • Slide 26
  • INFOBOXES
  • Slide 29
  • Slide 30
  • Slide 31
  • SPARQL
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • OPEN MIND COMMONSENSE
  • WHATrsquoS IN OPEN MIND COMMONSENSE CAR
  • Slide 39
  • OPEN MIND COMMONSENSE ADDING KNOWLEDGE
  • OMCS ADDING KNOWLEDGE 2
  • OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
  • Slide 43
  • FROM OPENMIND COMMONSENSE TO CONCEPT NET
  • Slide 45
  • CONCEPT NET
  • FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
  • GAMES WITH A PURPOSE
  • GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
  • EXAMPLES OF GWAP
  • ESP
  • ESP the game
  • ESP THE GAME
  • THE TASK
  • SCORING BY MATCHING
  • THE CHALLENGE SCORES
  • SCORES
  • THE CHALLENGE TIMING
  • TABOO WORDS
  • Slide 61
  • PASSING
  • GOOD LABELS COMPLETING AN IMAGE
  • IMPLEMENTATION
  • SOME STATISTICS
  • ANALYSIS
  • QUALITY OF THE LABELS
  • GOOGLE IMAGE LABELLER
  • Slide 69
  • RESULTS
  • VERBOSITY
  • THE GAME
  • Slide 73
  • TEMPLATES IN VERBOSITY
  • GUESSING ATTRIBUTES
  • PRODUCING A DESCRIPTION
  • TEMPLATES
  • EMULATION
  • Slide 79
  • PHRASE DETECTIVES
  • Slide 81
  • NAME THE CULPRIT
  • READINGS

Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)

THINGS (52000 assertions)

IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)

EVENTS (38000 assertions)

PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)

AGENTS (104000 assertions)

CapableOf (CapableOf dentist pull tooth)

SPATIAL (36000 assertions)

LocationOf (LocationOf army in war)

TEMPORAL time amp sequence

CAUSAL (17000 assertions)

EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)

AFFECTIONAL (mood feeling emotions) (34000 assertions)

DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)

FUNCTIONAL (115000 assertions)

IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)

ASSOCIATION K-LINES (125 million assertions)

SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)

OPEN MIND COMMONSENSE ADDING KNOWLEDGE

OMCS ADDING KNOWLEDGE 2

OPEN MIND COMMONSENSE CHECKING KNOWLEDGE

FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al. 2009) is a semantic network extracted from Open Mind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
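
The kind of "simple heuristic" that turns a sentence into assertions can be sketched with a single regular expression. The pattern below is an assumption made up for this one sentence shape ("A X is a [modifiers] Y"); the real extraction rules used for ConceptNet are not given on the slides and are certainly richer.

```python
import re

# Hypothetical pattern: "<a|an|the> SUBJECT is <a|an> [MODIFIERS] NOUN"
PATTERN = re.compile(
    r"^(?:an?|the)\s+(?P<subject>\w+)\s+is\s+an?\s+(?P<rest>.+?)\.?$",
    re.IGNORECASE,
)


def extract(sentence):
    """Return a list of (relation, subject, value) triples, or [] if no match."""
    m = PATTERN.match(sentence.strip())
    if m is None:
        return []
    subject = m.group("subject").lower()
    words = m.group("rest").lower().split()
    # Treat the last word as the head noun and everything before it as modifiers.
    noun, modifiers = words[-1], words[:-1]
    facts = [("isa", subject, noun)]
    if modifiers:
        facts.append(("property_of", subject, "_".join(modifiers)))
    return facts


print(extract("A lime is a very sour fruit"))
```

On the slide's sentence this yields the two facts isa(lime, fruit) and property_of(lime, very_sour); sentences of any other shape are simply skipped.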

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
– ESP
– Verbosity
– TagATune

• Other games
– Peekaboom
– Phetch

ESP

• The first GWAP, developed by von Ahn and their group (2003, 2004)

• The problem: obtain accurate descriptions of images, to be used
– To train image search engines
– To develop machine learning approaches to vision

• The goal: label the majority of the images on the Web

ESP the game

ESP THE GAME

• Two partners are picked at random from the large number of players online

• They are not told who their partner is and can’t communicate with them

• They are both shown the same image

• The goal: guess how their partner will describe the image, and type that description
– Hence the ESP game

• If any of the strings typed by one player matches the string typed by the other player, they score points
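
The matching rule can be sketched in a few lines of Python. This is an illustration only, under the assumption that guesses arrive as an interleaved stream of (player, text) events and that matching ignores case and extra whitespace; the function names are hypothetical and the actual ESP game code is not shown in the lecture.

```python
def normalize(text):
    """Lower-case and collapse whitespace so trivial variants still match."""
    return " ".join(text.lower().split())


def play_round(events):
    """events: iterable of (player, text) with player in {"A", "B"}.

    Returns the first label typed by one player that the other player
    has already typed, or None if the partners never agree.
    """
    seen = {"A": set(), "B": set()}
    for player, text in events:
        label = normalize(text)
        other = "B" if player == "A" else "A"
        if label in seen[other]:
            return label          # agreement: both players score points
        seen[player].add(label)
    return None


events = [("A", "dog"), ("B", "puppy"), ("A", "Grass "), ("B", "grass"), ("B", "dog")]
print(play_round(events))
```

Note that the players do not need to type the matching string at the same time: any string typed earlier by the partner counts, which is why each player's guesses are accumulated in a set.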

THE TASK

SCORING BY MATCHING

THE CHALLENGE: SCORES

• One of the motivating factors is to try to score as many points as possible

• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2½ minutes

• The thermometer on the side indicates how many images they have agreed on

• If they agree on 15 images they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed

• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
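
A minimal sketch of the taboo mechanism, assuming each image keeps its own set of taboo words and that comparison ignores case and surrounding whitespace. The class and method names are hypothetical.

```python
class ImageLabels:
    """Per-image state: the words that earlier pairs have already agreed on."""

    def __init__(self):
        self.taboo = set()

    def is_allowed(self, guess):
        """A guess is rejected if it is currently taboo for this image."""
        return guess.strip().lower() not in self.taboo

    def record_agreement(self, label):
        """Once a pair agrees on a label, it becomes taboo for later pairs."""
        self.taboo.add(label.strip().lower())


image = ImageLabels()
image.record_agreement("Dog")      # an earlier pair agreed on "dog"
print(image.is_allowed("dog"))     # later pairs must find something else
print(image.is_allowed("leash"))
```

Because each agreement feeds the taboo list, later pairs are pushed toward less obvious and more specific labels for the same image.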

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)

• An image is “done” when its list of taboo words is so extensive that most players pass on it
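
Both criteria are simple counting rules and can be sketched as follows. The data layout (a list of (player, label) productions, and a list of booleans recording whether each pair passed) and the reading of "most players" as "more than half" are assumptions made for illustration.

```python
def good_labels(productions, n):
    """productions: list of (player_id, label).

    A label is good when MORE THAN n distinct players produced it.
    """
    players_per_label = {}
    for player, label in productions:
        players_per_label.setdefault(label, set()).add(player)
    return {label for label, players in players_per_label.items()
            if len(players) > n}


def is_done(pass_outcomes, pass_fraction=0.5):
    """pass_outcomes: list of booleans, True when a pair passed on the image.

    The image counts as done when more than pass_fraction of pairs passed
    (0.5 is an assumed reading of "most players").
    """
    if not pass_outcomes:
        return False
    return sum(pass_outcomes) / len(pass_outcomes) > pass_fraction


productions = [("p1", "dog"), ("p2", "dog"), ("p3", "dog"),
               ("p1", "dog"), ("p4", "grass")]
print(good_labels(productions, 2))
```

Counting distinct players rather than raw productions means one player repeating the same label cannot push it over the threshold.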

IMPLEMENTATION

• Pre-recorded game play
– Especially at the beginning and at quiet times there won’t always be players to pair with
– In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture

• Cheating
– Players could cheat in a number of ways, including agreeing on labels and playing against themselves
– A number of mechanisms are in place against those cases

• Selecting images
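
Pre-recorded play can be pictured as replaying a stored list of timed guesses against the live player. The sketch below assumes a recorded ‘hand’ is a list of (seconds, label) pairs for the same image; this representation and the function name are hypothetical, not taken from the actual implementation.

```python
def replay_match(recorded, live):
    """recorded, live: lists of (seconds_into_round, label) for the same image.

    Merges the two timelines and returns (seconds, label) for the first
    moment at which one side types a label the other has already typed,
    or None if no agreement is reached.
    """
    events = sorted(
        [(t, "recorded", label) for t, label in recorded]
        + [(t, "live", label) for t, label in live]
    )
    seen = {"recorded": set(), "live": set()}
    for t, who, label in events:
        other = "live" if who == "recorded" else "recorded"
        if label in seen[other]:
            return (t, label)
        seen[who].add(label)
    return None


recorded_hand = [(2.0, "dog"), (5.0, "grass")]
live_guesses = [(1.0, "puppy"), (6.5, "grass")]
print(replay_match(recorded_hand, live_guesses))
```

From the live player's point of view the recorded partner behaves like a real one, since its guesses appear at the times they were originally typed.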

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
– 13,630 players
– 1.2 million labels for 293,760 images
– 80% of players played more than once

• By 2008
– 200,000 players
– 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors
– Playing with a partner
– Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
– choose 10 labels among those produced and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment
– 15 participants, 20 images among the 1000 with more than 5 labels
– 83% of game labels also produced by participants

• Manual assessment of labels (‘would you use these labels to describe this image?’)
– 15 participants, 20 images
– 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge

• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word

• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations and properties of interest are collected

• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
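
The templates make it easy to turn a describer's hint into a structured fact. The sketch below assumes that the guesser sees the hint with the secret word replaced by "It" and that a collected fact is stored as a (secret word, template, filler) triple; both choices, and the function names, are hypothetical.

```python
# The template phrases from the slide, with near/in/on split out.
TEMPLATES = [
    "is a kind of",
    "is used for",
    "is typically near",
    "is typically in",
    "is typically on",
    "is the opposite of",
    "is related to",
]


def make_hint(template, filler):
    """Build the hint shown to the guesser; the secret word is hidden as 'It'."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return f"It {template} {filler}"


def to_fact(secret_word, template, filler):
    """The commonsense fact collected as a side effect of the hint."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return (secret_word, template, filler)


print(make_hint("is used for", "keeping time"))
print(to_fact("clock", "is used for", "keeping time"))
```

Restricting the describer to a fixed set of templates is what makes every hint directly usable as a relation between the secret word and the filler.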

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player

• The asymmetry of the game causes a problem not encountered in the ESP game
– Describer: can just repeat behavior of previous describer
– Guesser: not so easy

RESULTS

• Only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY

• Quality
– Ask six raters whether 200 facts collected using Verbosity are ‘true’
– Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
– Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
– Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user

www.phrasedetectives.com

PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012

• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems




FROM OPENMIND COMMONSENSE TO CONCEPT NET

• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics

CONCEPT NET

FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET

A lime is a very sour fruit

isa(lime, fruit)

property_of(lime, very_sour)
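To make the idea of "simple heuristics" concrete, here is a minimal sketch of one pattern-based rule that produces the two assertions above from the example sentence. The regular expression and the `extract` function are hypothetical; the slides do not specify the actual ConceptNet extraction rules, so this shows the flavour of the approach rather than the real pipeline.

```python
import re

# Hypothetical rule: "A/An/The X is a/an [modifiers] Y".
# Not the actual ConceptNet heuristics, only an illustration.
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(?:(.+)\s+)?(\w+)\.?$",
    re.IGNORECASE,
)


def extract(sentence):
    """Return (relation, concept, value) triples for one matching sentence."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return []
    concept = match.group(1).lower()
    modifiers = match.group(2)
    category = match.group(3).lower()
    assertions = [("isa", concept, category)]
    if modifiers:
        # Join the modifier words into a single property name.
        assertions.append(("property_of", concept, "_".join(modifiers.lower().split())))
    return assertions


for relation, concept, value in extract("A lime is a very sour fruit"):
    print(f"{relation}({concept}, {value})")
```

A rule this shallow only covers one sentence shape; a usable extractor would need many such patterns, which is presumably why the OpenMind templates constrain what contributors can type.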

GAMES WITH A PURPOSE

• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune

• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and his group (2003, 2004)

• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision

• The goal: label the majority of the images on the Web

ESP the game

ESP THE GAME

• Two partners are picked at random from the large number of players online

• They are not told who their partner is and can’t communicate with them

• They are both shown the same image

• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game

• If any of the strings typed by one player matches the string typed by the other player, they score points
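The matching rule above can be sketched as follows. This is a minimal illustration, not the game's actual code: the event format (a time-ordered stream of `(player, typed_string)` pairs) and the normalisation step (lower-casing and collapsing whitespace) are assumptions made for the example.

```python
def normalize(text):
    """Assumed normalisation: lower-case and collapse whitespace."""
    return " ".join(text.lower().split())


def play_round(events):
    """Return the first label both players have typed, or None.

    `events` is an iterable of (player, typed_string) pairs in the order
    they were typed, where player is "A" or "B".
    """
    seen = {"A": set(), "B": set()}
    for player, text in events:
        label = normalize(text)
        other = "B" if player == "A" else "A"
        if label in seen[other]:
            # The partner already typed this string: the pair agrees.
            return label
        seen[player].add(label)
    return None


events = [("A", "dog"), ("B", "puppy"), ("A", "grass"), ("B", "Dog ")]
print(play_round(events))
```

Neither player sees what the other types, so the only way to match is to type what any observer would say about the image, which is what makes the agreed string a plausible label.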

THE TASK

SCORING BY MATCHING

THE CHALLENGE: SCORES

• One of the motivating factors is to try to score as many points as possible

• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2 ½ minutes

• The thermometer on the side indicates how many images they have agreed on

• If they agree on 15 images they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed

• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)

• An image is “done” when its list of taboo words is so extensive that most players pass on it
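The taboo-word, good-label and done rules can be written down as three small checks. This is a sketch under assumptions: the slides do not say how the rules interact, so they are kept as independent functions, and "most players pass" is read here as "more than half of the players shown the image passed", which is a guess at the intended threshold.

```python
def is_allowed(label, taboo_words):
    """A guess is rejected if it is one of the image's taboo words."""
    return label not in taboo_words


def good_labels(producer_counts, n):
    """Labels produced by more than n players (n is a game parameter)."""
    return {label for label, count in producer_counts.items() if count > n}


def is_done(times_shown, times_passed):
    """Assumed reading of 'most players pass': more than half passed."""
    return times_shown > 0 and times_passed / times_shown > 0.5


# Words agreed on in earlier games with this image become taboo for it.
taboo = {"dog", "grass"}
print(is_allowed("dog", taboo), is_allowed("leash", taboo))
print(good_labels({"dog": 3, "grass": 1, "leash": 2}, n=2))
print(is_done(times_shown=10, times_passed=6))
```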

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture

• Cheating
  – Players could cheat in a number of ways, including agreeing on labels or playing against themselves
  – A number of mechanisms are in place against those cases

• Selecting images
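One way the pre-recorded game play could work is sketched below. The slides only say that a player is paired against a recorded "hand" from an earlier game with the same picture; the assumption here is that each recorded guess is stored with the number of seconds since the image was shown, so that the recording can be replayed in step with the live player. The function name and data layout are hypothetical.

```python
def replay_against_recording(recorded_hand, live_guesses):
    """Replay a recorded hand against a live player's guesses.

    Both arguments are lists of (seconds_since_image_shown, label).
    Returns (time, label) for the first agreement, or None.
    """
    # Merge both streams into one time-ordered sequence of events.
    events = sorted(
        [(t, "recorded", label) for t, label in recorded_hand]
        + [(t, "live", label) for t, label in live_guesses]
    )
    seen = {"recorded": set(), "live": set()}
    for t, who, label in events:
        other = "live" if who == "recorded" else "recorded"
        if label in seen[other]:
            return t, label
        seen[who].add(label)
    return None


recorded = [(2.0, "dog"), (5.0, "grass")]
live = [(1.0, "grass"), (3.0, "cat")]
print(replay_against_recording(recorded, live))
```

In this sketch the live player cannot tell the partner is a recording, because the recorded guesses only take effect at the moment they were originally typed.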

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once

• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – Choose 10 labels among those produced and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants

• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge

• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word

• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

GAMES WITH A PURPOSE

bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

bull The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune

bull Other gamesndash Peekaboomndash Phetch

ESP

bull The first GWAP developed by von Ahn and their group (2003 2004)

bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision

bull The goal label the majority of the images on the Web

ESP the game

ESP THE GAMEbull Two partners are picked at random from the

large number of players onlinebull They are not told who their partner is and canrsquot

communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe

the image and type that descriptionndash Hence the ESP game

bull If any of the strings typed by one player matches the string typed by the other player they score points

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

bull One of the motivating factors is to try to score as many points as possible

bull Hourly daily weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

bull Partners try to agree on as many images as they can during 2 frac12 minutes

bull The termometer on the side indicates how many images they have agreed on

bull If they agree on 15 images they score bonus points

TABOO WORDS

bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed

bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)

bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it

IMPLEMENTATIONbull Pre-recorded game play

ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with

ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture

bull Cheatingndash Players could cheat in a number of ways including

agreeing on labels playing against themselvesndash A number of mechanisms are in place against those

casesbull Selecting images

SOME STATISTICS

bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once

bull By 2008 ndash 200000 playersndash 50 million labels

ANALYSIS

bull The numbers indicate that the game is fun to play

bull Exciting factorsndash Playing with a partnerndash Playing against time

QUALITY OF THE LABELSbull For IMAGE SEARCH

ndash choose 10 labels among those produced and look at which images are returned

bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more

than 5 labelsndash 83 of game labels also produced by participants

bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

bull hellip or the game approach to collecting commonsense knowledge

bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)

THE GAME

bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the

word

bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected

bull The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _

EMULATION

bull As in ESP game pre-recorded games are used when a player cannot be paired with another player

bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous

describerndash Guesser not so easy

RESULTS

bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY

bull Qualityndash Ask six raters whether 200 facts collected using

Verbosity are lsquotruersquondash Around 85 success

PHRASE DETECTIVES

wwwphrasedetectivesorg

bull 2 tasks

ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric

ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user

www.phrasedetectives.com

PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012

• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems

GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK

• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions

• The key property of games is that PEOPLE WANT TO PLAY THEM

EXAMPLES OF GWAP

• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune

• Other games
  – Peekaboom
  – Phetch

ESP

• The first GWAP, developed by von Ahn and their group (2003, 2004)

• The problem: obtain accurate descriptions of images to be used
  – To train image search engines
  – To develop machine learning approaches to vision

• The goal: label the majority of the images on the Web

ESP the game

ESP THE GAME

• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game

• If any of the strings typed by one player matches the string typed by the other player, they score points
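The matching rule above can be sketched in a few lines of Python. This is only an illustration: the normalization step (lowercasing and collapsing whitespace) and the function names are assumptions, and the sketch ignores the real-time interleaving of the two players' typing.

```python
def normalize(label: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(label.lower().split())


def first_match(guesses_a, guesses_b):
    """Return the first label, in player A's typing order, that player B
    also typed, or None if the two players never agree."""
    seen_b = {normalize(g) for g in guesses_b}
    for guess in guesses_a:
        if normalize(guess) in seen_b:
            return normalize(guess)
    return None


# Hypothetical guesses from two partners looking at the same image.
print(first_match(["Dog", "grass"], ["park", "grass "]))  # grass
print(first_match(["dog"], ["cat"]))                      # None
```

Because neither player sees the other's guesses, the only strings likely to match are ones that genuinely describe the image.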

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

• One of the motivating factors is to try to score as many points as possible

• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

• Partners try to agree on as many images as they can during 2½ minutes

• The thermometer on the side indicates how many images they have agreed on

• If they agree on 15 images they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed

• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
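A small Python sketch of how taboo words might accumulate for one image: every label that a pair agrees on is added to that image's taboo set, so later pairs have to find something new. The function `play_round` and the lowercase comparison are assumptions made for this example.

```python
def play_round(taboo: set, guesses_a, guesses_b):
    """Play one round on an image.

    Guesses that are already taboo are ignored. If the two players agree
    on a remaining label, it is returned and becomes taboo for later rounds.
    """
    allowed_a = [g.lower() for g in guesses_a if g.lower() not in taboo]
    allowed_b = {g.lower() for g in guesses_b if g.lower() not in taboo}
    for guess in allowed_a:
        if guess in allowed_b:
            taboo.add(guess)  # agreed label is off-limits from now on
            return guess
    return None


# Hypothetical rounds on the same image with two different pairs.
taboo_for_image = set()
print(play_round(taboo_for_image, ["dog", "grass"], ["dog", "ball"]))  # dog
print(play_round(taboo_for_image, ["dog", "ball"], ["ball", "dog"]))   # ball
print(sorted(taboo_for_image))                                         # ['ball', 'dog']
```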

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)

• An image is “done” when its list of taboo words is so extensive that most players pass on it
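The two criteria above might be expressed as follows. This is a sketch under stated assumptions: "most players pass" is read here as "more than half of the recorded rounds ended in a pass", and the names `good_labels`, `is_done` and `pass_threshold` are hypothetical.

```python
from collections import Counter


def good_labels(label_counts: Counter, n: int) -> set:
    """Labels produced by more than n players (n is a game parameter)."""
    return {label for label, count in label_counts.items() if count > n}


def is_done(passed_rounds, pass_threshold: float = 0.5) -> bool:
    """passed_rounds: one boolean per round, True if the players passed.

    The image counts as done when the fraction of passes exceeds the
    threshold (0.5 is an assumed reading of "most players").
    """
    if not passed_rounds:
        return False
    return sum(passed_rounds) / len(passed_rounds) > pass_threshold


# Hypothetical counts for one image.
counts = Counter({"dog": 3, "grass": 1})
print(good_labels(counts, n=2))       # {'dog'}
print(is_done([True, True, False]))   # True
print(is_done([True, False]))         # False
```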

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture

• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases

• Selecting images
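Pre-recorded game play can be pictured as replaying an earlier player's guesses against the live player. The Python sketch below assumes that each recorded guess is stored with the number of seconds since the image appeared and is replayed at that same offset; this timing detail and the function name `replay_match` are assumptions, not something stated on the slide.

```python
def replay_match(recorded_hand, live_guesses):
    """Find the first agreement between a recorded hand and a live player.

    Both arguments are lists of (seconds_since_image_shown, label).
    Returns (time, label) of the first agreement, or None.
    """
    events = sorted(
        [(t, "recorded", label.lower()) for t, label in recorded_hand]
        + [(t, "live", label.lower()) for t, label in live_guesses]
    )
    seen = {"recorded": set(), "live": set()}
    for t, who, label in events:
        seen[who].add(label)
        other = "live" if who == "recorded" else "recorded"
        if label in seen[other]:
            return t, label  # both sides have now typed this label
    return None


# Hypothetical recorded hand and live guesses for the same picture.
recorded = [(2.0, "dog"), (5.0, "grass")]
live = [(1.0, "grass"), (3.0, "cat")]
print(replay_match(recorded, live))            # (5.0, 'grass')
print(replay_match(recorded, [(1.0, "cat")]))  # None
```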

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once

• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants

• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful
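The comparison between game labels and participant labels boils down to a set overlap. The sketch below shows one way such a percentage could be computed for a single image; the labels are made-up examples and the function `overlap_percentage` is hypothetical, not the procedure used in the original study.

```python
def overlap_percentage(game_labels, participant_labels) -> float:
    """Percentage of game labels that participants also produced."""
    game = {label.lower() for label in game_labels}
    if not game:
        return 0.0
    participants = {label.lower() for label in participant_labels}
    return 100.0 * len(game & participants) / len(game)


# Made-up labels for one image: 5 of the 6 game labels were also produced.
game = ["dog", "grass", "ball", "park", "tree", "frisbee"]
participants = ["dog", "grass", "ball", "park", "tree", "sky"]
print(round(overlap_percentage(game, participants), 1))  # 83.3
```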

ESP THE GAME

• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image and type that description
  – Hence the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
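The matching rule above can be sketched in a few lines of Python. This is only an illustration, not the game's actual implementation: the function names, the timestamped input format and the normalization step (lower-casing and trimming whitespace) are assumptions made for the example.

```python
def normalize(label):
    """Lower-case and collapse whitespace so trivially different strings match
    (an assumption; the slides do not say how strings are compared)."""
    return " ".join(label.lower().split())


def first_agreement(guesses_a, guesses_b):
    """Return the first label typed by both players, or None if they never agree.

    Each argument is a list of (seconds_since_start, label) pairs for one
    player. Agreement happens when a player types a label that the partner
    has already typed.
    """
    seen_a, seen_b = set(), set()
    # Merge both players' guesses into a single time-ordered stream.
    events = [(t, "a", normalize(s)) for t, s in guesses_a]
    events += [(t, "b", normalize(s)) for t, s in guesses_b]
    for _, player, label in sorted(events):
        mine, theirs = (seen_a, seen_b) if player == "a" else (seen_b, seen_a)
        if label in theirs:
            return label
        mine.add(label)
    return None
```

Neither player needs to see the other's guesses: the server only has to compare the two streams, which is why the partners can stay anonymous and unable to communicate.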

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
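A minimal sketch of how a per-image taboo list could grow from earlier agreements. The class and method names are hypothetical, and case-insensitive comparison is an assumption of the example.

```python
from collections import defaultdict


class TabooList:
    """Per-image taboo words, built from labels earlier pairs agreed on."""

    def __init__(self):
        # image_id -> set of lower-cased labels already agreed on
        self._taboo = defaultdict(set)

    def allowed(self, image_id, label):
        """A guess is allowed only if it is not already taboo for this image."""
        return label.lower() not in self._taboo[image_id]

    def record_agreement(self, image_id, label):
        """Once a pair agrees on a label, later pairs may not use it."""
        self._taboo[image_id].add(label.lower())

    def taboo_words(self, image_id):
        return sorted(self._taboo[image_id])
```

Because each agreement removes an easy answer, later pairs shown the same image are pushed toward more specific labels.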

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
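The two criteria can be written down directly. In this sketch the function names and the data layout are assumptions, and “most players pass” is read as a pass rate above one half, which the slides do not specify.

```python
from collections import Counter


def good_labels(labels_by_player, n):
    """Labels produced by more than n distinct players.

    labels_by_player maps a player id to the list of labels that player
    typed for one image.
    """
    counts = Counter()
    for labels in labels_by_player.values():
        counts.update(set(labels))  # count each player at most once per label
    return {label for label, c in counts.items() if c > n}


def is_done(times_shown, times_passed, threshold=0.5):
    """An image is 'done' when most players pass on it.

    The 0.5 threshold is an assumed reading of 'most'.
    """
    if times_shown == 0:
        return False
    return times_passed / times_shown > threshold
```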

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won't always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
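One way to picture the pre-recorded fallback is sketched below. The function names, the waiting queue, and the choice of the most recent recording are all assumptions for illustration; the slides only say that a recorded hand for the same picture is used.

```python
def pick_partner(waiting_players, recorded_hands, image_id):
    """Pair with a live player if one is waiting, else with a recorded hand.

    recorded_hands maps image_id -> list of hands, where a hand is a list of
    (seconds_since_start, label) pairs typed by a player in an earlier game.
    """
    if waiting_players:
        return ("live", waiting_players.pop(0))
    hands = recorded_hands.get(image_id, [])
    if hands:
        return ("recorded", hands[-1])  # assumed policy: most recent recording
    return ("none", None)


def replay(hand, elapsed):
    """Labels the recorded partner had typed after `elapsed` seconds."""
    return [label for t, label in hand if t <= elapsed]
```

Replaying the recorded labels at their original timings lets the live player experience the round as if the partner were present.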

SOME STATISTICS

• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
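A sketch of how filled templates could be turned into relation triples. The relation names (IsA, UsedFor, and so on) are hypothetical labels chosen for this example, and so are the function names. The first slot is assumed to hold the secret word and the second the Describer's filler.

```python
# Template sentences from the slide, keyed by hypothetical relation names.
TEMPLATES = {
    "IsA": "{} is a kind of {}",
    "UsedFor": "{} is used for {}",
    "LocatedNear": "{} is typically near/in/on {}",
    "Antonym": "{} is the opposite of {}",
    "RelatedTo": "{} is related to {}",
}


def fill(relation, secret_word, filler):
    """Render the hint sentence for a relation."""
    return TEMPLATES[relation].format(secret_word, filler)


def parse(sentence):
    """Recover (relation, secret_word, filler) from a filled template, or None."""
    for relation, template in TEMPLATES.items():
        _, middle, _ = template.split("{}")  # fixed text between the two slots
        left, sep, right = sentence.partition(middle)
        if sep and left.strip() and right.strip():
            return (relation, left.strip(), right.strip())
    return None
```

Because every hint has to fit one of a fixed set of templates, each successful round yields a fact in a known relation rather than free text that would need to be parsed later.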

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user

www.phrasedetectives.com

PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems

  • INTRODUCTION TO ARTIFICIAL INTELLIGENCE
  • `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
  • THE SOCIAL WEB
  • SOCIAL CREATION OF KNOWLEDGE
  • WIKIPEDIA
  • Slide 7
  • Slide 8
  • Encyclopedic knowledge in coreference resolution
  • Why Wikipedia may help addressing the encyclopedic knowledge problem
  • Another interesting scenario
  • Slide 13
  • Wikipedia as Ontology
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
  • Slide 21
  • The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
  • SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
  • Wikipedia category network
  • Deriving a taxonomy from Wikipedia (AAAI 2007)
  • Slide 26
  • INFOBOXES
  • Slide 29
  • Slide 30
  • Slide 31
  • SPARQL
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • OPEN MIND COMMONSENSE
  • WHATrsquoS IN OPEN MIND COMMONSENSE CAR
  • Slide 39
  • OPEN MIND COMMONSENSE ADDING KNOWLEDGE
  • OMCS ADDING KNOWLEDGE 2
  • OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
  • Slide 43
  • FROM OPENMIND COMMONSENSE TO CONCEPT NET
  • Slide 45
  • CONCEPT NET
  • FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
  • GAMES WITH A PURPOSE
  • GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
  • EXAMPLES OF GWAP
  • ESP
  • ESP the game
  • ESP THE GAME
  • THE TASK
  • SCORING BY MATCHING
  • THE CHALLENGE SCORES
  • SCORES
  • THE CHALLENGE TIMING
  • TABOO WORDS
  • Slide 61
  • PASSING
  • GOOD LABELS COMPLETING AN IMAGE
  • IMPLEMENTATION
  • SOME STATISTICS
  • ANALYSIS
  • QUALITY OF THE LABELS
  • GOOGLE IMAGE LABELLER
  • Slide 69
  • RESULTS
  • VERBOSITY
  • THE GAME
  • Slide 73
  • TEMPLATES IN VERBOSITY
  • GUESSING ATTRIBUTES
  • PRODUCING A DESCRIPTION
  • TEMPLATES
  • EMULATION
  • Slide 79
  • PHRASE DETECTIVES
  • Slide 81
  • NAME THE CULPRIT
  • READINGS

THE TASK

SCORING BY MATCHING

THE CHALLENGE SCORES

bull One of the motivating factors is to try to score as many points as possible

bull Hourly daily weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

bull Partners try to agree on as many images as they can during 2 frac12 minutes

bull The termometer on the side indicates how many images they have agreed on

bull If they agree on 15 images they score bonus points

TABOO WORDS

bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed

bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)

bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it

IMPLEMENTATIONbull Pre-recorded game play

ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with

ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture

bull Cheatingndash Players could cheat in a number of ways including

agreeing on labels playing against themselvesndash A number of mechanisms are in place against those

casesbull Selecting images

SOME STATISTICS

bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once

bull By 2008 ndash 200000 playersndash 50 million labels

ANALYSIS

bull The numbers indicate that the game is fun to play

bull Exciting factorsndash Playing with a partnerndash Playing against time

QUALITY OF THE LABELSbull For IMAGE SEARCH

ndash choose 10 labels among those produced and look at which images are returned

bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more

than 5 labelsndash 83 of game labels also produced by participants

bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

bull hellip or the game approach to collecting commonsense knowledge

bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)

THE GAME

bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the

word

bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected

bull The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _

EMULATION

bull As in ESP game pre-recorded games are used when a player cannot be paired with another player

bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous

describerndash Guesser not so easy

RESULTS

bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY

bull Qualityndash Ask six raters whether 200 facts collected using

Verbosity are lsquotruersquondash Around 85 success

PHRASE DETECTIVES

wwwphrasedetectivesorg

bull 2 tasks

ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric

ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user

wwwphrasedetectivescom

PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012

bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009

bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67

bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems

  • INTRODUCTION TO ARTIFICIAL INTELLIGENCE
  • `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
  • THE SOCIAL WEB
  • SOCIAL CREATION OF KNOWLEDGE
  • WIKIPEDIA
  • Slide 7
  • Slide 8
  • Encyclopedic knowledge in coreference resolution
  • Why Wikipedia may help addressing the encyclopedic knowledge problem
  • Another interesting scenario
  • Slide 13
  • Wikipedia as Ontology
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
  • Slide 21
  • The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
  • SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
  • Wikipedia category network
  • Deriving a taxonomy from Wikipedia (AAAI 2007)
  • Slide 26
  • INFOBOXES
  • Slide 29
  • Slide 30
  • Slide 31
  • SPARQL
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • OPEN MIND COMMONSENSE
  • WHATrsquoS IN OPEN MIND COMMONSENSE CAR
  • Slide 39
  • OPEN MIND COMMONSENSE ADDING KNOWLEDGE
  • OMCS ADDING KNOWLEDGE 2
  • OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
  • Slide 43
  • FROM OPENMIND COMMONSENSE TO CONCEPT NET
  • Slide 45
  • CONCEPT NET
  • FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
  • GAMES WITH A PURPOSE
  • GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
  • EXAMPLES OF GWAP
  • ESP
  • ESP the game
  • ESP THE GAME
  • THE TASK
  • SCORING BY MATCHING
  • THE CHALLENGE SCORES
  • SCORES
  • THE CHALLENGE TIMING
  • TABOO WORDS
  • Slide 61
  • PASSING
  • GOOD LABELS COMPLETING AN IMAGE
  • IMPLEMENTATION
  • SOME STATISTICS
  • ANALYSIS
  • QUALITY OF THE LABELS
  • GOOGLE IMAGE LABELLER
  • Slide 69
  • RESULTS
  • VERBOSITY
  • THE GAME
  • Slide 73
  • TEMPLATES IN VERBOSITY
  • GUESSING ATTRIBUTES
  • PRODUCING A DESCRIPTION
  • TEMPLATES
  • EMULATION
  • Slide 79
  • PHRASE DETECTIVES
  • Slide 81
  • NAME THE CULPRIT
  • READINGS

SCORING BY MATCHING

THE CHALLENGE SCORES

bull One of the motivating factors is to try to score as many points as possible

bull Hourly daily weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

bull Partners try to agree on as many images as they can during 2 frac12 minutes

bull The termometer on the side indicates how many images they have agreed on

bull If they agree on 15 images they score bonus points

TABOO WORDS

bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed

bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)

bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it

IMPLEMENTATIONbull Pre-recorded game play

ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with

ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture

bull Cheatingndash Players could cheat in a number of ways including

agreeing on labels playing against themselvesndash A number of mechanisms are in place against those

casesbull Selecting images

SOME STATISTICS

bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once

bull By 2008 ndash 200000 playersndash 50 million labels

ANALYSIS

bull The numbers indicate that the game is fun to play

bull Exciting factorsndash Playing with a partnerndash Playing against time

QUALITY OF THE LABELSbull For IMAGE SEARCH

ndash choose 10 labels among those produced and look at which images are returned

bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more

than 5 labelsndash 83 of game labels also produced by participants

bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

bull hellip or the game approach to collecting commonsense knowledge

bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)

THE GAME

bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the

word

bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected

bull The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _

EMULATION

bull As in ESP game pre-recorded games are used when a player cannot be paired with another player

bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous

describerndash Guesser not so easy

RESULTS

bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY

bull Qualityndash Ask six raters whether 200 facts collected using

Verbosity are lsquotruersquondash Around 85 success

PHRASE DETECTIVES

wwwphrasedetectivesorg

bull 2 tasks

ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric

ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user

wwwphrasedetectivescom

PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012

bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009

bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67

bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems

  • INTRODUCTION TO ARTIFICIAL INTELLIGENCE
  • `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
  • THE SOCIAL WEB
  • SOCIAL CREATION OF KNOWLEDGE
  • WIKIPEDIA
  • Slide 7
  • Slide 8
  • Encyclopedic knowledge in coreference resolution
  • Why Wikipedia may help addressing the encyclopedic knowledge problem
  • Another interesting scenario
  • Slide 13
  • Wikipedia as Ontology
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
  • Slide 21
  • The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
  • SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
  • Wikipedia category network
  • Deriving a taxonomy from Wikipedia (AAAI 2007)
  • Slide 26
  • INFOBOXES
  • Slide 29
  • Slide 30
  • Slide 31
  • SPARQL
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • OPEN MIND COMMONSENSE
  • WHATrsquoS IN OPEN MIND COMMONSENSE CAR
  • Slide 39
  • OPEN MIND COMMONSENSE ADDING KNOWLEDGE
  • OMCS ADDING KNOWLEDGE 2
  • OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
  • Slide 43
  • FROM OPENMIND COMMONSENSE TO CONCEPT NET
  • Slide 45
  • CONCEPT NET
  • FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
  • GAMES WITH A PURPOSE
  • GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
  • EXAMPLES OF GWAP
  • ESP
  • ESP the game
  • ESP THE GAME
  • THE TASK
  • SCORING BY MATCHING
  • THE CHALLENGE SCORES
  • SCORES
  • THE CHALLENGE TIMING
  • TABOO WORDS
  • Slide 61
  • PASSING
  • GOOD LABELS COMPLETING AN IMAGE
  • IMPLEMENTATION
  • SOME STATISTICS
  • ANALYSIS
  • QUALITY OF THE LABELS
  • GOOGLE IMAGE LABELLER
  • Slide 69
  • RESULTS
  • VERBOSITY
  • THE GAME
  • Slide 73
  • TEMPLATES IN VERBOSITY
  • GUESSING ATTRIBUTES
  • PRODUCING A DESCRIPTION
  • TEMPLATES
  • EMULATION
  • Slide 79
  • PHRASE DETECTIVES
  • Slide 81
  • NAME THE CULPRIT
  • READINGS

THE CHALLENGE SCORES

bull One of the motivating factors is to try to score as many points as possible

bull Hourly daily weekly and monthly scores are shown

SCORES

THE CHALLENGE TIMING

bull Partners try to agree on as many images as they can during 2 frac12 minutes

bull The termometer on the side indicates how many images they have agreed on

bull If they agree on 15 images they score bonus points

TABOO WORDS

bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed

bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)

bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it

IMPLEMENTATIONbull Pre-recorded game play

ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with

ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture

bull Cheatingndash Players could cheat in a number of ways including

agreeing on labels playing against themselvesndash A number of mechanisms are in place against those

casesbull Selecting images

SOME STATISTICS

bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once

bull By 2008 ndash 200000 playersndash 50 million labels

ANALYSIS

bull The numbers indicate that the game is fun to play

bull Exciting factorsndash Playing with a partnerndash Playing against time

QUALITY OF THE LABELSbull For IMAGE SEARCH

ndash choose 10 labels among those produced and look at which images are returned

bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more

than 5 labelsndash 83 of game labels also produced by participants

bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

bull hellip or the game approach to collecting commonsense knowledge

bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)

THE GAME

bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the

word

bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected

bull The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _

EMULATION

bull As in ESP game pre-recorded games are used when a player cannot be paired with another player

bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous

describerndash Guesser not so easy

RESULTS

bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY

bull Qualityndash Ask six raters whether 200 facts collected using

Verbosity are lsquotruersquondash Around 85 success

PHRASE DETECTIVES

wwwphrasedetectivesorg

bull 2 tasks

ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric

ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user

wwwphrasedetectivescom

PHRASE DETECTIVES THE TASKS

NAME THE CULPRIT

READINGS

bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012

bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009

bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67

bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems

  • INTRODUCTION TO ARTIFICIAL INTELLIGENCE
  • `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
  • THE SOCIAL WEB
  • SOCIAL CREATION OF KNOWLEDGE
  • WIKIPEDIA
  • Slide 7
  • Slide 8
  • Encyclopedic knowledge in coreference resolution
  • Why Wikipedia may help addressing the encyclopedic knowledge problem
  • Another interesting scenario
  • Slide 13
  • Wikipedia as Ontology
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
  • Slide 21
  • The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
  • SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
  • Wikipedia category network
  • Deriving a taxonomy from Wikipedia (AAAI 2007)
  • Slide 26
  • INFOBOXES
  • Slide 29
  • Slide 30
  • Slide 31
  • SPARQL
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • OPEN MIND COMMONSENSE
  • WHATrsquoS IN OPEN MIND COMMONSENSE CAR
  • Slide 39
  • OPEN MIND COMMONSENSE ADDING KNOWLEDGE
  • OMCS ADDING KNOWLEDGE 2
  • OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
  • Slide 43
  • FROM OPENMIND COMMONSENSE TO CONCEPT NET
  • Slide 45
  • CONCEPT NET
  • FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
  • GAMES WITH A PURPOSE
  • GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
  • EXAMPLES OF GWAP
  • ESP
  • ESP the game
  • ESP THE GAME
  • THE TASK
  • SCORING BY MATCHING
  • THE CHALLENGE SCORES
  • SCORES
  • THE CHALLENGE TIMING
  • TABOO WORDS
  • Slide 61
  • PASSING
  • GOOD LABELS COMPLETING AN IMAGE
  • IMPLEMENTATION
  • SOME STATISTICS
  • ANALYSIS
  • QUALITY OF THE LABELS
  • GOOGLE IMAGE LABELLER
  • Slide 69
  • RESULTS
  • VERBOSITY
  • THE GAME
  • Slide 73
  • TEMPLATES IN VERBOSITY
  • GUESSING ATTRIBUTES
  • PRODUCING A DESCRIPTION
  • TEMPLATES
  • EMULATION
  • Slide 79
  • PHRASE DETECTIVES
  • Slide 81
  • NAME THE CULPRIT
  • READINGS

SCORES

THE CHALLENGE TIMING

bull Partners try to agree on as many images as they can during 2 frac12 minutes

bull The termometer on the side indicates how many images they have agreed on

bull If they agree on 15 images they score bonus points

TABOO WORDS

bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed

bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)

bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it

IMPLEMENTATIONbull Pre-recorded game play

ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with

ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture

bull Cheatingndash Players could cheat in a number of ways including

agreeing on labels playing against themselvesndash A number of mechanisms are in place against those

casesbull Selecting images

SOME STATISTICS

bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once

bull By 2008 ndash 200000 playersndash 50 million labels

ANALYSIS

bull The numbers indicate that the game is fun to play



THE CHALLENGE: TIMING

• Partners try to agree on as many images as they can during 2½ minutes

• The thermometer on the side indicates how many images they have agreed on

• If they agree on 15 images, they score bonus points

TABOO WORDS

• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed

• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
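The taboo-word rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory store and the names `TabooTracker`, `record_agreement` and `is_allowed` are hypothetical and are not taken from the actual ESP game implementation.

```python
from collections import defaultdict


class TabooTracker:
    """Tracks, per image, the words that earlier pairs of players agreed on."""

    def __init__(self):
        # image id -> set of taboo words (hypothetical in-memory store)
        self._taboo = defaultdict(set)

    def is_allowed(self, image_id, word):
        """A guess is allowed only if it is not already taboo for this image."""
        return word.strip().lower() not in self._taboo[image_id]

    def record_agreement(self, image_id, word):
        """Once two partners agree on a word, it becomes taboo for later players."""
        self._taboo[image_id].add(word.strip().lower())

    def taboo_words(self, image_id):
        """Return the taboo list for an image in alphabetical order."""
        return sorted(self._taboo[image_id])


tracker = TabooTracker()
tracker.record_agreement("img-1", "Dog")   # an earlier pair agreed on "dog"
print(tracker.is_allowed("img-1", "dog"))    # later players may not reuse it
print(tracker.is_allowed("img-1", "puppy"))  # more specific labels are still open
```

Because the taboo list only grows, later pairs are pushed towards labels nobody has agreed on yet for that image.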

TABOO WORDS

PASSING

GOOD LABELS COMPLETING AN IMAGE

• A label is considered “good” when more than N players produce it (with N a parameter of the game)

• An image is “done” when its list of taboo words is so extensive that most players pass on it
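Both criteria can be written as small predicates. The sketch below is an assumption-laden illustration: the function names are hypothetical, and reading “most players pass” as “more than half of the plays ended in a pass” (the `pass_ratio` default) is an interpretation, not something the slides specify.

```python
from collections import Counter


def good_labels(label_counts, n):
    """Return the labels produced by more than n players (n is a game parameter)."""
    return sorted(label for label, count in label_counts.items() if count > n)


def is_done(passes, plays, pass_ratio=0.5):
    """Treat an image as done when the share of plays that ended in a pass
    exceeds pass_ratio (assumed here to be one half)."""
    if plays == 0:
        return False
    return passes / plays > pass_ratio


# Hypothetical counts: how many players produced each label for one image
counts = Counter({"dog": 5, "grass": 2, "frisbee": 3})
print(good_labels(counts, n=2))
print(is_done(passes=7, plays=10))
```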

IMPLEMENTATION

• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture

• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases

• Selecting images
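The pre-recorded game play described above might look roughly as follows. This is a sketch under assumptions: a recorded ‘hand’ is represented as a list of `(seconds, word)` pairs, and `find_partner` and `first_match` are hypothetical helpers, not the game's real code.

```python
import random


def find_partner(waiting_players, recorded_games, image_id, rng=random):
    """Pair with a live player if one is waiting; otherwise fall back to a
    recorded hand of a previous game with the same picture."""
    if waiting_players:
        return ("live", waiting_players.pop(0))
    hands = recorded_games.get(image_id, [])
    if hands:
        return ("recorded", rng.choice(hands))
    return ("none", None)


def first_match(hand, live_guesses):
    """Return (time, word) for the earliest moment at which a word has been
    entered by both the recorded hand and the live player, or None."""
    recorded_times = {}
    for seconds, word in hand:
        recorded_times.setdefault(word, seconds)
    best = None
    for seconds, word in live_guesses:
        if word in recorded_times:
            agreed_at = max(seconds, recorded_times[word])
            if best is None or agreed_at < best[0]:
                best = (agreed_at, word)
    return best


hand = [(2.0, "dog"), (5.0, "park")]                  # replayed partner
live = [(1.0, "grass"), (3.0, "park"), (6.0, "dog")]  # current player
print(first_match(hand, live))
```

Replaying the timestamps, rather than only the words, keeps the recorded partner's pace similar to that of a live one.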

SOME STATISTICS

• In the 4 months between August 9th, 2003 and December 10th, 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once

• By 2008
  – 200,000 players
  – 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors
  – Playing with a partner
  – Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
  – Choose 10 labels among those produced and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1,000 with more than 5 labels
  – 83% of game labels also produced by participants

• Manual assessment of labels (‘Would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge

• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word

• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations/properties of interest are collected

• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _

• _ is used for _

• _ is typically near/in/on _

• _ is the opposite of _ / _ is related to _
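To make the link between templates and collected facts concrete, here is a minimal sketch. The relation names in `TEMPLATES` and the helpers `make_hint` and `to_fact` are hypothetical, chosen only for illustration; the slides do not say how Verbosity stores its facts.

```python
# Template text -> relation name (the relation names are illustrative assumptions)
TEMPLATES = {
    "is a kind of": "kind_of",
    "is used for": "used_for",
    "is typically near/in/on": "typically_near_in_on",
    "is the opposite of": "opposite_of",
    "is related to": "related_to",
}


def make_hint(template, filler):
    """The hint shown to the Guesser: the secret word stays blanked out."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return f"_ {template} {filler}"


def to_fact(secret_word, template, filler):
    """The fact recorded once the Describer's hint is accepted."""
    return (secret_word, TEMPLATES[template], filler)


print(make_hint("is a kind of", "animal"))
print(to_fact("dog", "is a kind of", "animal"))
```

Restricting the Describer to a fixed set of templates means each hint arrives already tagged with the relation it expresses.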

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player

• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy

RESULTS

• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY

• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): the user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): the user must agree/disagree with a coreference relation entered by another user

www.phrasedetectives.com
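One way to picture how the two tasks fit together is the small data structure below. It is a hypothetical sketch, not the Phrase Detectives implementation: the `Annotation` class, its fields and the simple agree-minus-disagree tally are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """One answer from the annotation task, plus the validation votes it receives."""
    markable: str
    antecedent: Optional[str]  # None means the annotator judged it not anaphoric
    agree: int = 0
    disagree: int = 0

    def validate(self, agrees: bool) -> None:
        """Record one validation vote from another user."""
        if agrees:
            self.agree += 1
        else:
            self.disagree += 1

    def net_agreement(self) -> int:
        """Simple tally (an assumption): agreements minus disagreements."""
        return self.agree - self.disagree


# Example taken from the coreference passage earlier in the lecture
ann = Annotation(markable="The agency", antecedent="The FCC")
for vote in (True, True, False):
    ann.validate(vote)
print(ann.net_agreement())
```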

PHRASE DETECTIVES: THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012

• C. Havasi, J. Pustejovsky, R. Speer, and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems





  • INTRODUCTION TO ARTIFICIAL INTELLIGENCE
  • `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
  • THE SOCIAL WEB
  • SOCIAL CREATION OF KNOWLEDGE
  • WIKIPEDIA
  • Slide 7
  • Slide 8
  • Encyclopedic knowledge in coreference resolution
  • Why Wikipedia may help addressing the encyclopedic knowledge problem
  • Another interesting scenario
  • Slide 13
  • Wikipedia as Ontology
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • The concept Artificial Intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society
  • Slide 21
  • The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
  • SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
  • Wikipedia category network
  • Deriving a taxonomy from Wikipedia (AAAI 2007)
  • Slide 26
  • INFOBOXES
  • Slide 29
  • Slide 30
  • Slide 31
  • SPARQL
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • OPEN MIND COMMONSENSE
  • WHAT’S IN OPEN MIND COMMONSENSE: CAR
  • Slide 39
  • OPEN MIND COMMONSENSE ADDING KNOWLEDGE
  • OMCS ADDING KNOWLEDGE 2
  • OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
  • Slide 43
  • FROM OPENMIND COMMONSENSE TO CONCEPT NET
  • Slide 45
  • CONCEPT NET
  • FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
  • GAMES WITH A PURPOSE
  • GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
  • EXAMPLES OF GWAP
  • ESP
  • ESP the game
  • ESP THE GAME
  • THE TASK
  • SCORING BY MATCHING
  • THE CHALLENGE SCORES
  • SCORES
  • THE CHALLENGE TIMING
  • TABOO WORDS
  • Slide 61
  • PASSING
  • GOOD LABELS COMPLETING AN IMAGE
  • IMPLEMENTATION
  • SOME STATISTICS
  • ANALYSIS
  • QUALITY OF THE LABELS
  • GOOGLE IMAGE LABELLER
  • Slide 69
  • RESULTS
  • VERBOSITY
  • THE GAME
  • Slide 73
  • TEMPLATES IN VERBOSITY
  • GUESSING ATTRIBUTES
  • PRODUCING A DESCRIPTION
  • TEMPLATES
  • EMULATION
  • Slide 79
  • PHRASE DETECTIVES
  • Slide 81
  • NAME THE CULPRIT
  • READINGS

IMPLEMENTATION

• Pre-recorded game play
– Especially at the beginning and at quiet times, there won’t always be players to pair with
– In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture

• Cheating
– Players could cheat in a number of ways, including agreeing on labels and playing against themselves
– A number of mechanisms are in place against those cases

• Selecting images
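Pairing a live player with a recorded ‘hand’ can be sketched as follows. This is a minimal sketch under stated assumptions: each hand is stored as (seconds from start, label) pairs, and an agreement counts from the moment both sides have entered the same label. Taboo words, passing and the anti-cheating mechanisms are left out, and the function names are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[float, str]  # (seconds from start of the round, label typed)


def first_times(events: List[Event]) -> Dict[str, float]:
    """Earliest time at which each (normalised) label was entered."""
    first: Dict[str, float] = {}
    for t, label in events:
        key = label.strip().lower()
        first[key] = min(t, first.get(key, t))
    return first


def first_match(live: List[Event], recorded: List[Event]) -> Optional[Event]:
    """Earliest agreement between a live player and a recorded hand.

    A label is agreed on once BOTH sides have entered it, so its agreement
    time is the later of the two entry times. Returns None if no label is shared.
    """
    a, b = first_times(live), first_times(recorded)
    common = a.keys() & b.keys()
    if not common:
        return None
    return min((max(a[w], b[w]), w) for w in common)


live = [(2.0, "dog"), (5.0, "grass")]
recorded = [(1.0, "puppy"), (3.0, "grass"), (9.0, "dog")]
match = first_match(live, recorded)
```

With these made-up hands the players agree on "grass" at 5.0 seconds, before "dog", which the recorded partner only enters at 9.0 seconds.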

SOME STATISTICS

• In the 4 months between August 9th, 2003 and December 10th, 2003
– 13,630 players
– 1.2 million labels for 293,760 images
– 80% of players played more than once

• By 2008
– 200,000 players
– 50 million labels

ANALYSIS

• The numbers indicate that the game is fun to play

• Exciting factors
– Playing with a partner
– Playing against time

QUALITY OF THE LABELS

• For IMAGE SEARCH
– choose 10 labels among those produced and look at which images are returned

• Compare labels produced by players with labels produced by participants in an experiment
– 15 participants, 20 images among the 1000 with more than 5 labels
– 83% of game labels also produced by participants

• Manual assessment of labels (‘would you use these labels to describe this image?’)
– 15 participants, 20 images
– 85% of words rated useful
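The comparison between game labels and participant labels comes down to a coverage ratio. It is sketched below with made-up labels: the function name and the lower-casing normalisation are assumptions, and the example data are not taken from the study.

```python
from typing import Iterable


def coverage(game_labels: Iterable[str], participant_labels: Iterable[str]) -> float:
    """Fraction of distinct game labels that participants also produced."""
    game = {label.strip().lower() for label in game_labels}
    seen = {label.strip().lower() for label in participant_labels}
    if not game:
        raise ValueError("no game labels to evaluate")
    return len(game & seen) / len(game)


# Hypothetical labels for one image.
game = ["dog", "grass", "ball", "park", "run", "sky"]
participants = ["Dog", "ball", "grass", "sky", "park", "tree"]
ratio = coverage(game, participants)
```

For this invented image five of the six game labels are covered, a ratio of 5/6 ≈ 0.83. Averaging the ratio over all evaluated images would give a single figure of the kind quoted on the slide.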

GOOGLE IMAGE LABELLER

THE TASK

RESULTS

VERBOSITY

• … or the game approach to collecting commonsense knowledge

• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

THE GAME

• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word

• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

  • INTRODUCTION TO ARTIFICIAL INTELLIGENCE
  • `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
  • THE SOCIAL WEB
  • SOCIAL CREATION OF KNOWLEDGE
  • WIKIPEDIA
  • Slide 7
  • Slide 8
  • Encyclopedic knowledge in coreference resolution
  • Why Wikipedia may help addressing the encyclopedic knowledge problem
  • Another interesting scenario
  • Slide 13
  • Wikipedia as Ontology
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
  • Slide 21
  • The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
  • SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
  • Wikipedia category network
  • Deriving a taxonomy from Wikipedia (AAAI 2007)
  • Slide 26
  • INFOBOXES
  • Slide 29
  • Slide 30
  • Slide 31
  • SPARQL
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • OPEN MIND COMMONSENSE
  • WHATrsquoS IN OPEN MIND COMMONSENSE CAR
  • Slide 39
  • OPEN MIND COMMONSENSE ADDING KNOWLEDGE
  • OMCS ADDING KNOWLEDGE 2
  • OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
  • Slide 43
  • FROM OPENMIND COMMONSENSE TO CONCEPT NET
  • Slide 45
  • CONCEPT NET
  • FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
  • GAMES WITH A PURPOSE
  • GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
  • EXAMPLES OF GWAP
  • ESP
  • ESP the game
  • ESP THE GAME
  • THE TASK
  • SCORING BY MATCHING
  • THE CHALLENGE SCORES
  • SCORES
  • THE CHALLENGE TIMING
  • TABOO WORDS
  • Slide 61
  • PASSING
  • GOOD LABELS COMPLETING AN IMAGE
  • IMPLEMENTATION
  • SOME STATISTICS
  • ANALYSIS
  • QUALITY OF THE LABELS
  • GOOGLE IMAGE LABELLER
  • Slide 69
  • RESULTS
  • VERBOSITY
  • THE GAME
  • Slide 73
  • TEMPLATES IN VERBOSITY
  • GUESSING ATTRIBUTES
  • PRODUCING A DESCRIPTION
  • TEMPLATES
  • EMULATION
  • Slide 79
  • PHRASE DETECTIVES
  • Slide 81
  • NAME THE CULPRIT
  • READINGS

RESULTS

VERBOSITY

bull hellip or the game approach to collecting commonsense knowledge

bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)

THE GAME

• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word

• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD

THE GAME

TEMPLATES IN VERBOSITY

• As in Open Mind Commonsense, templates are used to ensure that the relations/properties of interest are collected

• The Describer produces hints by filling in a template

GUESSING ATTRIBUTES

PRODUCING A DESCRIPTION

TEMPLATES

• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
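
The template idea can be sketched in a few lines of Python. This is only an illustration, not Verbosity's actual implementation: the template wordings are taken from the slide, while the relation labels (`IsA`, `UsedFor`, and so on), the function name `make_fact`, and the dictionary layout are hypothetical. The sketch assumes the system already knows the secret word, so the Describer only supplies the second slot.

```python
# Template wordings from the slide; the relation labels are hypothetical names.
TEMPLATES = {
    "IsA": "{word} is a kind of {hint}",
    "UsedFor": "{word} is used for {hint}",
    "TypicallyNear": "{word} is typically near {hint}",
    "TypicallyIn": "{word} is typically in {hint}",
    "TypicallyOn": "{word} is typically on {hint}",
    "OppositeOf": "{word} is the opposite of {hint}",
    "RelatedTo": "{word} is related to {hint}",
}


def make_fact(relation, secret_word, hint):
    """Combine the secret word and the Describer's filler into one fact.

    Returns a dict holding the relation label, both arguments, and the
    sentence obtained by filling in the chosen template.
    """
    if relation not in TEMPLATES:
        raise ValueError(f"unknown template: {relation}")
    word = secret_word.strip().lower()
    filler = hint.strip().lower()
    sentence = TEMPLATES[relation].format(word=word, hint=filler)
    return {
        "relation": relation,
        "subject": word,
        "object": filler,
        "sentence": sentence,
    }


if __name__ == "__main__":
    print(make_fact("IsA", "Milk", "liquid"))
```

Because every hint arrives through a fixed template, each one maps directly onto a relation between two concepts, which is what makes the collected hints usable as facts.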

EMULATION

• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player

• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
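
The easy half of emulation, replaying a recorded Describer for a live Guesser, might look roughly like the sketch below. It makes a simplifying assumption that is not in the slides: the live player gets exactly one guess after each replayed hint. The function name `emulate_describer` and its arguments are hypothetical.

```python
def emulate_describer(recorded_hints, live_guesses, secret_word):
    """Replay a previous Describer's hints for a live Guesser.

    recorded_hints: hints stored from an earlier game on the same secret word.
    live_guesses:   the live player's guesses, one per hint (an assumption).
    Returns (solved, hints_shown).
    """
    target = secret_word.strip().lower()
    hints_shown = []
    for hint, guess in zip(recorded_hints, live_guesses):
        hints_shown.append(hint)  # the emulated Describer "says" the next hint
        if guess.strip().lower() == target:
            return True, hints_shown
    return False, hints_shown


if __name__ == "__main__":
    hints = ["it is a kind of liquid", "it is used for drinking"]
    print(emulate_describer(hints, ["water", "milk"], "milk"))
```

The reverse direction does not work this way: recorded guesses were reactions to the hints of the original game, so replaying them against a live Describer's new hints would not make sense, which is the asymmetry noted above.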

RESULTS

• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY

• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success
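
A quality figure of this kind could be computed as in the sketch below. The slides do not say how the six raters' judgements were combined, so pooling all judgements into a single proportion is an assumption, and the function name `fraction_rated_true` is hypothetical.

```python
def fraction_rated_true(ratings):
    """Share of all rater judgements that marked a fact as true.

    ratings: one list per fact, holding one boolean per rater.
    Pooling every judgement together is an assumption of this sketch.
    """
    total = sum(len(votes) for votes in ratings)
    if total == 0:
        raise ValueError("no ratings supplied")
    positive = sum(sum(votes) for votes in ratings)
    return positive / total


if __name__ == "__main__":
    # Two toy facts, six raters each (made-up values for illustration only).
    toy = [[True] * 6, [True, True, True, False, False, False]]
    print(fraction_rated_true(toy))
```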

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
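
One way the two tasks could feed a single decision per markable is sketched below. The scoring rule (one point per annotation or agreement, minus one per disagreement) and the name `choose_antecedents` are assumptions made for illustration; the slides do not state how the game aggregates judgements.

```python
from collections import defaultdict


def choose_antecedents(annotations, validations):
    """Pick the best-supported antecedent for each markable.

    annotations: (markable_id, antecedent_id) pairs from the annotation task;
                 antecedent_id may be None for "not anaphoric".
    validations: (markable_id, antecedent_id, agrees) triples from the
                 validation task.
    Scoring is an assumed rule: +1 per annotation or agreement, -1 per
    disagreement. Ties keep the interpretation seen first.
    """
    scores = defaultdict(int)
    for markable, antecedent in annotations:
        scores[(markable, antecedent)] += 1
    for markable, antecedent, agrees in validations:
        scores[(markable, antecedent)] += 1 if agrees else -1

    best = {}
    for (markable, antecedent), score in scores.items():
        if markable not in best or score > best[markable][1]:
            best[markable] = (antecedent, score)
    return {markable: antecedent for markable, (antecedent, _) in best.items()}


if __name__ == "__main__":
    annotations = [("m2", "m1"), ("m2", "m0")]
    validations = [("m2", "m1", True), ("m2", "m0", False)]
    print(choose_antecedents(annotations, validations))
```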

www.phrasedetectives.com

PHRASE DETECTIVES: THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012

• C. Havasi, J. Pustejovsky, R. Speer, and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems

  • INTRODUCTION TO ARTIFICIAL INTELLIGENCE
  • `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
  • THE SOCIAL WEB
  • SOCIAL CREATION OF KNOWLEDGE
  • WIKIPEDIA
  • Slide 7
  • Slide 8
  • Encyclopedic knowledge in coreference resolution
  • Why Wikipedia may help addressing the encyclopedic knowledge problem
  • Another interesting scenario
  • Slide 13
  • Wikipedia as Ontology
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
  • Slide 21
  • The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
  • SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
  • Wikipedia category network
  • Deriving a taxonomy from Wikipedia (AAAI 2007)
  • Slide 26
  • INFOBOXES
  • Slide 29
  • Slide 30
  • Slide 31
  • SPARQL
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • OPEN MIND COMMONSENSE
  • WHATrsquoS IN OPEN MIND COMMONSENSE CAR
  • Slide 39
  • OPEN MIND COMMONSENSE ADDING KNOWLEDGE
  • OMCS ADDING KNOWLEDGE 2
  • OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
  • Slide 43
  • FROM OPENMIND COMMONSENSE TO CONCEPT NET
  • Slide 45
  • CONCEPT NET
  • FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
  • GAMES WITH A PURPOSE
  • GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
  • EXAMPLES OF GWAP
  • ESP
  • ESP the game
  • ESP THE GAME
  • THE TASK
  • SCORING BY MATCHING
  • THE CHALLENGE SCORES
  • SCORES
  • THE CHALLENGE TIMING
  • TABOO WORDS
  • Slide 61
  • PASSING
  • GOOD LABELS COMPLETING AN IMAGE
  • IMPLEMENTATION
  • SOME STATISTICS
  • ANALYSIS
  • QUALITY OF THE LABELS
  • GOOGLE IMAGE LABELLER
  • Slide 69
  • RESULTS
  • VERBOSITY
  • THE GAME
  • Slide 73
  • TEMPLATES IN VERBOSITY
  • GUESSING ATTRIBUTES
  • PRODUCING A DESCRIPTION
  • TEMPLATES
  • EMULATION
  • Slide 79
  • PHRASE DETECTIVES
  • Slide 81
  • NAME THE CULPRIT
  • READINGS

EMULATION

bull As in ESP game pre-recorded games are used when a player cannot be paired with another player

bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous

describerndash Guesser not so easy

RESULTS

• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY

• Quality
  - Six raters were asked whether 200 facts collected using Verbosity are 'true'
  - Around 85% success

PHRASE DETECTIVES

www.phrasedetectives.org

• 2 tasks
  - Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  - Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
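
The interplay of the two tasks can be sketched as a small data structure: annotation proposes an interpretation, validation votes on it. This is an illustrative sketch only. The `Interpretation` class, the `support` score (one point for the annotator, plus agreements, minus disagreements) and `best_interpretation` are assumptions made for this example, not the scoring actually used by Phrase Detectives.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interpretation:
    """One annotator's answer for a markable (hypothetical structure)."""
    markable: str
    antecedent: Optional[str]  # None means "not anaphoric"
    agree: int = 0
    disagree: int = 0

    def validate(self, agrees: bool) -> None:
        """Record one validation-mode judgement from another user."""
        if agrees:
            self.agree += 1
        else:
            self.disagree += 1

    def support(self) -> int:
        """Assumed score: the annotator counts once, validators add or subtract."""
        return 1 + self.agree - self.disagree


def best_interpretation(candidates: List[Interpretation]) -> Interpretation:
    """Pick the candidate with the highest support (first one wins ties)."""
    return max(candidates, key=Interpretation.support)


# Example based on the FCC passage from earlier in the lecture.
a = Interpretation("The agency", "The FCC")
b = Interpretation("The agency", "AT&T")
a.validate(True)
b.validate(False)
print(best_interpretation([a, b]).antecedent)
```

Keeping the votes on each interpretation, rather than storing a single final answer, leaves room for genuinely ambiguous markables to retain more than one supported reading.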

www.phrasedetectives.com

PHRASE DETECTIVES: THE TASKS

NAME THE CULPRIT

READINGS

• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.

• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.

• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.

• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.





