Post on 29-Dec-2015
INTRODUCTION TO ARTIFICIAL INTELLIGENCE
Massimo Poesio
LECTURE 10 Knowledge and The Social Web
‘CYC convinced the AI community that creating a commonsense knowledge base by hand is impossible’ (Massimo, Lecture 1)
That may depend on how many people you put on to it
THE SOCIAL WEB
• Increasingly, the Web is becoming not just a way to facilitate information exchange or commercial transactions, but also a tool to facilitate socialization (Facebook, LinkedIn, etc.)
• It is also a place where information can be collectively created
SOCIAL CREATION OF KNOWLEDGE
WIKIPEDIA
• Wikipedia is a free, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation
• Wikipedia's articles have been written collaboratively by volunteers around the world
• Almost all of its articles can be edited by anyone who can access the Wikipedia website
“The free encyclopedia that anyone can edit”
---- http://en.wikipedia.org/wiki/Wikipedia
WIKIPEDIA
• Wikipedia is
1. domain independent – it has a large coverage
2. up-to-date – to process current information
3. multilingual – to process information in many languages
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
• Other languages
• Other wiki pages
• To the web
• Redirects
• Disambiguates
Encyclopedic knowledge in coreference resolution
[The FCC] took [three specific actions] regarding [AT&T]. By a 4-0 vote, it allowed AT&T to continue offering special discount packages to big customers, called Tariff 12, rejecting appeals by AT&T competitors that the discounts were illegal. …
[The agency] said that because MCI's offer had expired, AT&T couldn't continue to offer its discount plan.
Why Wikipedia may help address the encyclopedic knowledge problem
http://en.wikipedia.org/wiki/FCC
The Federal Communications Commission (FCC) is an independent United States government agency, created, directed, and empowered by Congressional statute (see 47 U.S.C. § 151 and 47 U.S.C. § 154).
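The point of the FCC example can be sketched in a few lines: if the first sentence of the Wikipedia article says that the FCC "is an ... agency", then "The agency" becomes a plausible way of referring back to "The FCC". The sketch below is only an illustration under simple assumptions. The `DEFINITIONS` dictionary, the last-word head finder and the "is a/an" pattern are all hypothetical stand-ins, not the method of any particular coreference system.

```python
import re

# Toy "encyclopedia": entity name -> opening sentence of its article.
# The FCC sentence is the one quoted on the slide; the dictionary itself
# is a hypothetical stand-in for a real Wikipedia lookup.
DEFINITIONS = {
    "FCC": "The Federal Communications Commission (FCC) is an independent "
           "United States government agency",
}


def head_noun(mention: str) -> str:
    """Crude head finder (assumption): the last word of the mention, lower-cased."""
    return re.findall(r"[A-Za-z]+", mention)[-1].lower()


def definition_words(entity: str) -> set:
    """Words that follow 'is a' / 'is an' in the entity's defining sentence."""
    sentence = DEFINITIONS.get(entity, "")
    match = re.search(r"\bis an? (.+)", sentence)
    if not match:
        return set()
    return {word.lower() for word in re.findall(r"[A-Za-z]+", match.group(1))}


def may_corefer(anaphor: str, entity: str) -> bool:
    """True if the anaphor's head noun appears in the entity's definition."""
    return head_noun(anaphor) in definition_words(entity)


print(may_corefer("The agency", "FCC"))
print(may_corefer("The company", "FCC"))
```

A real system would use such a match as one feature among many rather than as a decision rule.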
Another interesting scenario
A fresh mandate for [Mr Ahmadinejad] would, say his critics, consecrate the “revolution within a revolution” he has been trying to effect since his surprise electoral triumph in 2005. Best known to outsiders for his bellicose grandstanding, [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative, Mr Khamenei, greatly to expand the powers of the presidency.
Source: “It could make a big difference”, The Economist, Mar 19th 2009
Why Wikipedia may help address the encyclopedic knowledge problem
Wikipedia as Ontology
• Unlike other standard ontologies, such as WordNet and MeSH, Wikipedia itself is not a structured thesaurus
• However, it is more…
– Comprehensive: it contains 12 million articles (2.8 million in the English Wikipedia)
– Accurate: a study by Giles (2005) found Wikipedia can compete with Encyclopædia Britannica in accuracy
– Up to date: current and emerging concepts are absorbed in a timely manner
Giles, J. (2005). Internet encyclopaedias go head to head. Nature 438, 900–901
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
– Each article describes only a single concept
– The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
(Example: the Wikipedia article that describes the concept Artificial intelligence)
– Equivalent concepts are grouped together by redirect links
(Example: AI is redirected to its equivalent concept, Artificial intelligence)
– It contains a hierarchical categorization system, in which each article belongs to at least one category
(Example: the concept Artificial intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society)
– Polysemous concepts are disambiguated by Disambiguation Pages
(Example: the different meanings that Artificial intelligence may refer to are listed in its disambiguation page)
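The thesaurus-like structure described above (one article per concept, redirects for equivalent terms, categories, disambiguation pages) can be pictured as a small data structure. This is a minimal sketch, not how Wikipedia stores its data. The class `MiniWiki` and its field names are hypothetical, and the disambiguation entry is an illustrative assumption, since the slide does not list the actual meanings.

```python
from dataclasses import dataclass, field


@dataclass
class MiniWiki:
    """Hypothetical in-memory model of the structure described on the slide."""
    articles: dict = field(default_factory=dict)        # title -> set of categories
    redirects: dict = field(default_factory=dict)       # alias -> canonical title
    disambiguation: dict = field(default_factory=dict)  # ambiguous term -> titles

    def resolve(self, term: str) -> list:
        """Return the article titles (concepts) that a term may denote."""
        if term in self.disambiguation:
            return list(self.disambiguation[term])
        title = self.redirects.get(term, term)
        return [title] if title in self.articles else []

    def categories(self, term: str) -> set:
        """Union of the categories of every concept the term may denote."""
        found = set()
        for title in self.resolve(term):
            found |= self.articles[title]
        return found


wiki = MiniWiki()
# Categories taken from the slide.
wiki.articles["Artificial intelligence"] = {
    "Artificial intelligence", "Cybernetics",
    "Formal sciences", "Technology in society",
}
# Illustrative second article so that the disambiguation entry has two targets.
wiki.articles["A.I. Artificial Intelligence"] = {"Films"}
wiki.redirects["AI"] = "Artificial intelligence"
# Hypothetical disambiguation entry, for illustration only.
wiki.disambiguation["AI (disambiguation)"] = [
    "Artificial intelligence", "A.I. Artificial Intelligence",
]

print(wiki.resolve("AI"))
print(sorted(wiki.categories("AI")))
```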
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
• Induce a subsumption hierarchy
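The slide does not say how the subsumption hierarchy is induced. As a rough illustration of the idea, the sketch below labels each category link as is-a or not-is-a by comparing the lexical heads of the two category names. This head-matching rule is an assumption chosen for illustration and should not be read as the procedure of the cited AAAI 2007 work. The function names and the first example link are hypothetical.

```python
def head(category: str) -> str:
    """Crude lexical head (assumption): last word, lower-cased, plural 's' stripped."""
    word = category.split()[-1].lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def label_links(links):
    """Label each (subcategory, supercategory) link as 'isa' or 'notisa'."""
    return {
        (sub, sup): "isa" if head(sub) == head(sup) else "notisa"
        for sub, sup in links
    }


links = [
    # Hypothetical category link, for illustration.
    ("British computer scientists", "Computer scientists"),
    # Pair taken from the categories named earlier in the lecture.
    ("Artificial intelligence", "Cybernetics"),
]

for link, label in label_links(links).items():
    print(link, "->", label)
```

Links labelled `isa` would form the taxonomy. The remaining links express some looser relatedness and would be left out of the subsumption hierarchy.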
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848.
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|U.S.]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|U.S.]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
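Because an infobox is semi-structured (one `| key = value` field per line in the wiki source), attribute-value pairs can be pulled out with very little code. The following is a minimal sketch under that one-field-per-line assumption. It uses a shortened copy of the Poe infobox, and `parse_infobox` is a hypothetical helper, not part of any Wikipedia or DBpedia toolkit.

```python
import re

# Shortened copy of the infobox above, one field per line (assumed layout).
INFOBOX = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|U.S.]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""

# [[Target]] -> Target, and [[Target|Label]] -> Label
WIKI_LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def parse_infobox(wikitext: str) -> dict:
    """Return the infobox fields as a dict, with wiki links reduced to plain text."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # skip the opening '{{Infobox ...' and closing '}}' lines
        key, value = line[1:].split("=", 1)
        fields[key.strip()] = WIKI_LINK.sub(r"\1", value).strip()
    return fields


for key, value in parse_infobox(INFOBOX).items():
    print(key, "=", value)
```

Nested templates such as the birth date field are not handled here. A real extractor would need a proper wikitext parser for those.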
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
DBPEDIA
• 1,600,000 concepts
• including
– 58,000 persons
– 70,000 places
– 35,000 music albums
– 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
The DBpedia Dataset
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At “Developers Guide to Semantic Web Toolkits” you can find a development toolkit in your preferred programming language to process DBpedia data.
REPRESENTING EXTRACTED INFORMATION
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
dbpedia:native_name “Calgary”
dbpedia:altitude “1048”
dbpedia:population_city “988193”
dbpedia:population_metro “1079310”
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
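The Calgary example can be written down as plain subject-predicate-object triples. The sketch below uses ordinary Python tuples rather than an RDF library, so it only illustrates the data model. Property names are kept as bare strings, which is a simplification, and the helper names are hypothetical.

```python
from collections import defaultdict

RESOURCE = "http://dbpedia.org/resource/"
CALGARY = RESOURCE + "Calgary"

# (subject, predicate, object); values copied from the slide.
triples = [
    (CALGARY, "native_name", "Calgary"),
    (CALGARY, "altitude", "1048"),
    (CALGARY, "population_city", "988193"),
    (CALGARY, "population_metro", "1079310"),
    (CALGARY, "mayor_name", RESOURCE + "Dave_Bronconnier"),
    (CALGARY, "governing_body", RESOURCE + "Calgary_City_Council"),
]


def describe(subject: str, graph) -> dict:
    """Group the triples of one subject by predicate."""
    description = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            description[p].append(o)
    return dict(description)


def is_resource(value: str) -> bool:
    """True if the object is itself a resource (a graph node), not a literal."""
    return value.startswith(RESOURCE)


for predicate, objects in describe(CALGARY, triples).items():
    for obj in objects:
        kind = "resource" if is_resource(obj) else "literal"
        print(predicate, obj, kind)
```

The distinction matters because objects that are resources, such as the mayor, can themselves be subjects of further triples, which is what turns the data into a graph.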
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
– Give me all sitcoms that are set in NYC
– All tennis players from Moscow
– All films by Quentin Tarantino
– All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
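As an illustration, one of the example questions ("all films by Quentin Tarantino") might be phrased as the SPARQL query below. The class and property names (`dbo:Film`, `dbo:director`) and the resource name are assumptions about the DBpedia vocabulary and may differ between DBpedia releases. The code only builds the request URL using the standard `query` parameter and does not contact the endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:Film, dbo:director, dbr:Quentin_Tarantino.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film a dbo:Film ;
        dbo:director dbr:Quentin_Tarantino .
}
"""


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as the 'query' parameter of an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(QUERY)
print(url)
```

Fetching that URL with any HTTP client would return the result bindings. The result format depends on the server and on the request headers.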
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
– Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
– Open Mind Commonsense (Singh) (collecting facts)
– Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
www.phrasedetectives.com
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
– IsA: (IsA “apple” “fruit”)
– PartOf: (PartOf “CPU” “computer”)
– PropertyOf: (PropertyOf “coffee” “wet”)
– MadeOf: (MadeOf “bread” “flour”)
– DefinedAs: (DefinedAs “meat” “flesh of animal”)
EVENTS (38,000 assertions)
– PrerequisiteEventOf: (PrerequisiteEventOf “read letter” “open envelope”)
– SubeventOf: (SubeventOf “play sport” “score goal”)
– FirstSubeventOf: (FirstSubeventOf “start fire” “light match”)
– LastSubeventOf: (LastSubeventOf “attend classical concert” “applaud”)
AGENTS (104,000 assertions)
– CapableOf: (CapableOf “dentist” “pull tooth”)
SPATIAL (36,000 assertions)
– LocationOf: (LocationOf “army” “in war”)
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
– EffectOf: (EffectOf “view video” “entertainment”)
– DesirousEffectOf: (DesirousEffectOf “sweat” “take shower”)
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
– DesireOf: (DesireOf “person” “not be depressed”)
– MotivationOf: (MotivationOf “play game” “compete”)
FUNCTIONAL (115,000 assertions)
– UsedFor: (UsedFor “fireplace” “burn wood”)
– CapableOfReceivingAction: (CapableOfReceivingAction “drink” “serve”)
ASSOCIATION: K-LINES (1.25 million assertions)
– SuperThematicKLine: (SuperThematicKLine “western civilization” “civilization”)
– ThematicKLine: (ThematicKLine “wedding dress” “veil”)
– ConceptuallyRelatedTo: (ConceptuallyRelatedTo “bad breath” “mint”)
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al., 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
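The lime example suggests what a "simple heuristic" could look like: a pattern over the sentence that yields one is-a assertion and, when a modifier is present, one property assertion. The regular expression below is an illustrative assumption covering only sentences of the shape "A X is a (modifier) Y". It is not the actual ConceptNet extraction rule set.

```python
import re

# Assumed sentence shape: "<det> X is <det> [modifier words] Y"
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+) is (?:an?|the)\s+(?:(.+)\s+)?(\w+)$",
    re.IGNORECASE,
)


def to_assertions(sentence: str) -> list:
    """Turn one simple sentence into isa / property_of assertions."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []  # sentence does not fit the assumed shape
    subject, modifier, kind = match.groups()
    subject, kind = subject.lower(), kind.lower()
    assertions = [f"isa({subject}, {kind})"]
    if modifier:
        prop = modifier.lower().replace(" ", "_")
        assertions.append(f"property_of({subject}, {prop})")
    return assertions


print(to_assertions("A lime is a very sour fruit"))
print(to_assertions("An apple is a fruit"))
```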
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
– ESP
– Verbosity
– TagATune
• Other games
– Peekaboom
– Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
– To train image search engines
– To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
– Hence, the ESP game
• If any of the strings typed by one player matches a string typed by the other player, they score points
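The matching rule can be sketched as follows: each player keeps typing guesses, and the round ends as soon as one player's guess equals any string the partner has already typed. The turn-by-turn interleaving and the case-insensitive comparison are assumptions of this sketch. The real game is driven by timing, which is not modelled here.

```python
def first_match(guesses_a, guesses_b):
    """Return the first label both players have typed, or None if they never agree.

    The two guess lists are interleaved to imitate players typing in turn
    (an assumption; real players type at their own pace).
    """
    seen_a, seen_b = set(), set()
    for i in range(max(len(guesses_a), len(guesses_b))):
        for guesses, mine, theirs in (
            (guesses_a, seen_a, seen_b),
            (guesses_b, seen_b, seen_a),
        ):
            if i < len(guesses):
                word = guesses[i].strip().lower()
                if word in theirs:
                    return word  # agreement: this becomes a label for the image
                mine.add(word)
    return None


player_1 = ["dog", "grass", "ball"]
player_2 = ["puppy", "ball", "dog"]
print(first_match(player_1, player_2))
```

The agreement does not have to happen at the same moment: a guess matches if the partner typed the same string at any earlier point in the round.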
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly, and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images, they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
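The bookkeeping behind taboo words, "good" labels and "done" images can be sketched per image as below. Several details are assumptions of the sketch rather than facts about the real game: each agreement is counted as two players producing the label, and "most players pass" is read as more than half of the times the image was shown. The class name is hypothetical.

```python
from collections import Counter


class ImageLabels:
    """Hypothetical per-image record of agreed labels."""

    def __init__(self, n: int):
        self.n = n                          # the game parameter N
        self.players_per_label = Counter()  # label -> players who produced it

    def record_agreement(self, label: str) -> None:
        # Assumption: an agreement means two players produced the label.
        self.players_per_label[label.strip().lower()] += 2

    def taboo_words(self) -> set:
        """Every label agreed on earlier is taboo for later players."""
        return set(self.players_per_label)

    def good_labels(self) -> set:
        """Labels produced by more than N players."""
        return {
            label
            for label, count in self.players_per_label.items()
            if count > self.n
        }

    @staticmethod
    def is_done(passes: int, times_shown: int) -> bool:
        # Assumption: "most players pass" means more than half of the showings.
        return times_shown > 0 and passes > times_shown / 2


image = ImageLabels(n=2)
image.record_agreement("Dog")
image.record_agreement("dog")
image.record_agreement("grass")
print(sorted(image.taboo_words()))
print(sorted(image.good_labels()))
```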
IMPLEMENTATION
• Pre-recorded game play
– Especially at the beginning and at quiet times, there won't always be players to pair with
– In these cases, a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
– Players could cheat in a number of ways, including agreeing on labels and playing against themselves
– A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
– 13,630 players
– 1.2 million labels for 293,760 images
– 80% of players played more than once
• By 2008
– 200,000 players
– 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
– Playing with a partner
– Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
– choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
– 15 participants, 20 images among the 1000 with more than 5 labels
– 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
– 15 participants, 20 images
– 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations/properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
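Because every hint is a filled-in template, each hint can be stored directly as a relation between the secret word and the filler once the round is over. The sketch below shows that mapping. The relation names on the right-hand side are labels invented for illustration, and the assumption that the first blank stands for the secret word is not stated on the slide.

```python
# Template phrase -> relation name (relation names are hypothetical labels).
RELATIONS = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near": "Near",
    "is typically in": "In",
    "is typically on": "On",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def make_hint(phrase: str, filler: str) -> str:
    """The hint shown to the Guesser; the secret word stays hidden as a blank."""
    if phrase not in RELATIONS:
        raise ValueError(f"unknown template: {phrase}")
    return f"_ {phrase} {filler}"


def to_fact(secret_word: str, phrase: str, filler: str) -> tuple:
    """The fact that could be stored for this hint: (relation, secret word, filler)."""
    return (RELATIONS[phrase], secret_word, filler)


print(make_hint("is a kind of", "liquid"))
print(to_fact("milk", "is a kind of", "liquid"))
```

Restricting the Describer to a fixed set of templates is what makes the collected hints usable as structured facts without any further parsing.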
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
– Describer: can just repeat the behavior of a previous describer
– Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
– Ask six raters whether 200 facts collected using Verbosity are ‘true’
– Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
– Find The Culprit (Annotation): the user must identify the closest antecedent of a markable if it is anaphoric
– Detectives Conference (Validation): the user must agree or disagree with a coreference relation entered by another user
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer, and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58–67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
`CYC convinced the AI community that creating a commonsense knowledge
base by hand is impossiblersquo(Massimo Lecture 1)
That may depend on how many people you put on to it
THE SOCIAL WEB
bull Increasingly the Web is becoming not just a way to facilitate information exchange or commercial transactions but also a tool to facilitate socialization (Facebook LinkedIn etc)
bull Also where information can be collectively created
SOCIAL CREATION OF KNOWLEDGE
WIKIPEDIA
bullWikipedia is a free multilingual encyclopedia project supported by the non-profit Wikimedia FoundationbullWikipedias articles have been written collaboratively by volunteers around the worldbullAlmost all of its articles can be edited by anyone who can access the Wikipedia website
The free encyclopedia that anyone can edit
----httpenwikipediaorgwikiWikipeida
WIKIPEDIA
bull Wikipedia is
1 domain independentndash it has a large coverage
2 up-to-datendash to process current information
3 multilingualndash to process information in many languages
bullTitle
bullAbstract
bullInfoboxes
bullGeo-coordinates
bullCategories
bullImages
bullLinks
bullOther languages
bullOther wiki pages
bullTo the web
bullRedirects
bullDisambiguates
Encyclopedic knowledge in coreference resolution
[The FCC] took [three specific actions] regarding [ATampT] By a 4-0 vote it allowed ATampT to continue offering special discount packages to big customers called Tariff 12 rejecting appeals by ATampT competitors that the discounts were illegal hellip
[The agency] said that because MCIs offer had expired ATampT couldnt continue to offer its discount plan
Why Wikipedia may help addressing the encyclopedic knowledge problem
httpenwikipediaorgwikiFCC
The Federal Communications Commission (FCC) is an independent United States government agency created directed and empowered by Congressional statute (see 47 USC sect 151 and 47 USC sect 154)
Another interesting scenario
A fresh mandate for [Mr Ahmadinejad] would say his critics consecrate the ldquorevolution within a revolutionrdquo he has been trying to effect since his surprise electoral triumph in 2005 Best known to outsiders for his bellicose grandstanding [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative Mr Khamenei greatly to expand the powers of the presidency
Source It could make a big difference The Economist Mar 19th 2009
Why Wikipedia may help addressing the encyclopedic knowledge problem
Wikipedia as Ontology
bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus
bull However it is morehellipndash Comprehensive it contains 12 million articles (28
million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can
compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are
absorbed timely
Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurus
Wikipedia Article that describes the Concept Artificial intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected links
AI is redirected to its equivalent concept Artificial Intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system
in which each article belongs to at least one category
The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system in
which each article belongs to at least one category ndash Polysemous concepts are disambiguated by
Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
wwwphrasedetectivescom
OPEN MIND COMMONSENSE
bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense
WHATrsquoS IN OPEN MIND COMMONSENSE CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52000 assertions)
IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)
EVENTS (38000 assertions)
PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)
AGENTS (104000 assertions)
CapableOf (CapableOf dentist pull tooth)
SPATIAL (36000 assertions)
LocationOf (LocationOf army in war)
TEMPORAL time amp sequence
CAUSAL (17000 assertions)
EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)
AFFECTIONAL (mood feeling emotions) (34000 assertions)
DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)
FUNCTIONAL (115000 assertions)
IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)
ASSOCIATION K-LINES (125 million assertions)
SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic
network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
bull hellip or the game approach to collecting commonsense knowledge
bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)
THE GAME
bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the
word
bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected
bull The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _
EMULATION
bull As in ESP game pre-recorded games are used when a player cannot be paired with another player
bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous
describerndash Guesser not so easy
RESULTS
bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY
bull Qualityndash Ask six raters whether 200 facts collected using
Verbosity are lsquotruersquondash Around 85 success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  - Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  - Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
WIKIPEDIA
• Wikipedia is a free, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation
• Wikipedia's articles have been written collaboratively by volunteers around the world
• Almost all of its articles can be edited by anyone who can access the Wikipedia website
"The free encyclopedia that anyone can edit"
---- http://en.wikipedia.org/wiki/Wikipedia
WIKIPEDIA
• Wikipedia is
1. domain independent – it has a large coverage
2. up-to-date – to process current information
3. multilingual – to process information in many languages
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
• Other languages
• Other wiki pages
• To the web
• Redirects
• Disambiguations
Encyclopedic knowledge in coreference resolution
[The FCC] took [three specific actions] regarding [AT&T]. By a 4-0 vote, it allowed AT&T to continue offering special discount packages to big customers, called Tariff 12, rejecting appeals by AT&T competitors that the discounts were illegal. …
[The agency] said that because MCI's offer had expired, AT&T couldn't continue to offer its discount plan.
Why Wikipedia may help in addressing the encyclopedic knowledge problem
http://en.wikipedia.org/wiki/FCC
The Federal Communications Commission (FCC) is an independent United States government agency, created, directed, and empowered by Congressional statute (see 47 U.S.C. § 151 and 47 U.S.C. § 154).
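The FCC example can be made concrete with a small sketch: a definite description such as "the agency" is linked to a candidate antecedent when the head noun of the description occurs in the opening sentence of the candidate's encyclopedia article. This is only an illustration under stated assumptions. The `GLOSSES` table is a hand-typed stand-in for text fetched from Wikipedia (the AT&T gloss is invented for contrast), and `head_noun` is a crude last-word heuristic, not a parser.

```python
import re

# Hypothetical mini knowledge base: entity -> opening sentence of its article.
# The FCC text is the one quoted on the slide; the AT&T gloss is illustrative.
GLOSSES = {
    "FCC": ("The Federal Communications Commission (FCC) is an independent "
            "United States government agency"),
    "AT&T": "AT&T is an American telecommunications company",
}

def head_noun(mention):
    """Crude head finder: the last word of the noun phrase, lowercased."""
    return re.findall(r"[A-Za-z]+", mention)[-1].lower()

def candidate_antecedents(definite_np, entities, glosses=GLOSSES):
    """Return the entities whose gloss contains the head noun of the description."""
    head = head_noun(definite_np)
    return [e for e in entities
            if head in re.findall(r"[a-z]+", glosses.get(e, "").lower())]

print(candidate_antecedents("The agency", ["FCC", "AT&T"]))
```

A real resolver would combine this signal with distance, agreement, and salience features; here the lookup alone is enough to prefer the FCC over AT&T for "the agency".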
Another interesting scenario
A fresh mandate for [Mr Ahmadinejad] would, say his critics, consecrate the "revolution within a revolution" he has been trying to effect since his surprise electoral triumph in 2005. Best known to outsiders for his bellicose grandstanding, [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative, Mr Khamenei, greatly to expand the powers of the presidency.
Source: "It could make a big difference", The Economist, Mar 19th 2009
Why Wikipedia may help in addressing the encyclopedic knowledge problem
Wikipedia as Ontology
• Unlike other standard ontologies, such as WordNet and MeSH, Wikipedia itself is not a structured thesaurus
• However, it is more…
  – Comprehensive: it contains 12 million articles (2.8 million in the English Wikipedia)
  – Accurate: a study by Giles (2005) found Wikipedia can compete with Encyclopædia Britannica in accuracy
  – Up to date: current and emerging concepts are absorbed in a timely manner
Giles, J. 2005. Internet encyclopaedias go head to head. Nature 438: 900–901
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
  – Each article only describes a single concept
  – The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
    Example: the Wikipedia article that describes the concept "Artificial intelligence"
  – Equivalent concepts are grouped together by redirect links
    Example: "AI" is redirected to its equivalent concept, "Artificial intelligence"
  – It contains a hierarchical categorization system in which each article belongs to at least one category
    Example: the concept "Artificial intelligence" belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society
  – Polysemous concepts are disambiguated by disambiguation pages
    Example: the different meanings that "Artificial intelligence" may refer to are listed in its disambiguation page
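The structural properties listed on these slides (one concept per article, redirects for equivalent terms, categories, disambiguation pages) can be sketched as three lookup tables. Everything below is toy data assumed for illustration: the "AI" redirect and the four categories come from the slides, while the "Mercury" entry is a made-up example of a polysemous term.

```python
# Toy stand-ins for tables that would be extracted from a Wikipedia dump.
REDIRECTS = {"AI": "Artificial intelligence"}
CATEGORIES = {
    "Artificial intelligence": [
        "Artificial intelligence", "Cybernetics",
        "Formal sciences", "Technology in society",
    ],
}
DISAMBIGUATION = {  # hypothetical example of a polysemous term
    "Mercury": ["Mercury (planet)", "Mercury (element)", "Mercury (mythology)"],
}

def resolve(term):
    """Map a surface term to the list of concepts (article titles) it may denote."""
    title = REDIRECTS.get(term, term)      # equivalent concepts share one article
    if title in DISAMBIGUATION:            # polysemous term: several concepts
        return DISAMBIGUATION[title]
    return [title]

def categories_of(term):
    """Return {concept: categories} for every concept the term may denote."""
    return {concept: CATEGORIES.get(concept, []) for concept in resolve(term)}

print(resolve("AI"))
print(categories_of("AI"))
```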
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
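Because an infobox is semi-structured, its attribute/value pairs can be pulled out with very little machinery. The sketch below assumes the infobox is laid out one `| key = value` pair per line and uses an abridged copy of the Poe infobox; `parse_infobox` and `link_targets` are names invented for this example, and real dumps need a proper wikitext parser to handle nested templates and multi-line values.

```python
import re

# Abridged copy of the infobox above, one attribute per line (an assumption).
INFOBOX = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""

# [[Target]] or [[Target|label]]: capture only the link target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def parse_infobox(wikitext):
    """Return {attribute: raw value} for each '| key = value' line."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)
            fields[key.strip()] = value.strip()
    return fields

def link_targets(value):
    """Return the article titles that a raw value links to."""
    return LINK.findall(value)

poe = parse_infobox(INFOBOX)
print(poe["name"])
print(link_targets(poe["birth_place"]))
```

The link targets are what make infobox values useful as semantic-network edges: each one names another article, that is, another concept.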
DBPEDIA
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
The DBpedia Dataset
• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits for processing DBpedia data in your preferred programming language.
REPRESENTING EXTRACTED INFORMATION
Extracting Infobox Data (RDF Representation)
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
dbpedia:native_name "Calgary"
dbpedia:altitude "1048"
dbpedia:population_city "988193"
dbpedia:population_metro "1079310"
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
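The Calgary example amounts to a set of (subject, predicate, object) triples. A minimal sketch of that data model follows, with a wildcard pattern matcher that hints at what a query language does over such a graph. The identifier `dbpedia:Calgary` is shorthand assumed here for http://dbpedia.org/resource/Calgary, and `match` is a helper invented for illustration, not part of any RDF library.

```python
CALGARY = "dbpedia:Calgary"  # assumed shorthand for the resource URI

# (subject, predicate, object) triples transcribed from the slide.
TRIPLES = [
    (CALGARY, "dbpedia:native_name", "Calgary"),
    (CALGARY, "dbpedia:altitude", "1048"),
    (CALGARY, "dbpedia:population_city", "988193"),
    (CALGARY, "dbpedia:population_metro", "1079310"),
    (CALGARY, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (CALGARY, "governing_body", "dbpedia:Calgary_City_Council"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Who is the mayor of Calgary?" as a triple pattern.
print(match(TRIPLES, s=CALGARY, p="mayor_name"))
```

Literal values (the population strings) and links to other resources (the mayor) live in the same structure, which is what lets the extracted infobox data be interlinked with other datasets.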
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
The DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
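As a rough illustration of one of the example questions, "all films by Quentin Tarantino" could be phrased as a single triple pattern and sent to the endpoint as an HTTP GET request. The sketch only builds the request URL and sends nothing. The vocabulary (`dbo:director`, `dbr:Quentin_Tarantino`) and the `format` parameter are assumptions about the current DBpedia deployment and are not taken from the slides; the property names in use when the lecture was given may have differed.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: the dbo:/dbr: names below are illustrative.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE { ?film dbo:director dbr:Quentin_Tarantino }
"""

def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as a GET request URL (nothing is sent)."""
    params = {
        "query": query.strip(),
        # Assumed result-format parameter; check the endpoint's documentation.
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)

url = build_request_url(QUERY)
print(url)
```

Fetching `url` with any HTTP client would then return the variable bindings for `?film`, provided the assumed vocabulary matches the dataset.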
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis
www.phrasedetectives.com
WEB COLLABORATION PROJECTS
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
  – IsA: (IsA "apple" "fruit")
  – PartOf: (PartOf "CPU" "computer")
  – PropertyOf: (PropertyOf "coffee" "wet")
  – MadeOf: (MadeOf "bread" "flour")
  – DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  – PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  – SubeventOf: (SubeventOf "play sport" "score goal")
  – FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  – LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  – CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  – LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  – EffectOf: (EffectOf "view video" "entertainment")
  – DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  – DesireOf: (DesireOf "person" "not be depressed")
  – MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  – UsedFor: (UsedFor "fireplace" "burn wood")
  – CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
  – SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  – ThematicKLine: (ThematicKLine "wedding dress" "veil")
  – ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
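The lime example suggests what a "simple heuristic" might look like: a surface pattern over the sentence yields one taxonomic assertion and, if modifiers are present, one property assertion. The regular expression below is a guess written for this one sentence shape ("A X is a [modifiers] Y") and is not the actual ConceptNet extraction rule set.

```python
import re

# "A/An/The SUBJECT is a/an [MODIFIERS] HEAD", with an optional final period.
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(?:(.+)\s+)?(\w+)\.?$",
    re.IGNORECASE,
)

def extract(sentence):
    """Return a list of (relation, arg1, arg2) assertions, or [] if no match."""
    m = PATTERN.match(sentence.strip())
    if not m:
        return []
    subject, modifiers, head = m.group(1).lower(), m.group(2), m.group(3).lower()
    facts = [("isa", subject, head)]
    if modifiers:
        # Join multi-word modifiers with underscores, as in the slide.
        facts.append(("property_of", subject, "_".join(modifiers.lower().split())))
    return facts

print(extract("A lime is a very sour fruit"))
```

Treating the last word as the head noun and everything before it as a property is exactly the kind of shortcut that works on short, template-like contributions and fails on free text.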
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly, and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images, they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
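The matching, taboo, and "good label" rules can be sketched in a few lines. This is a simplification under stated assumptions: guesses are compared as lowercase strings, agreement is taken to be the first label in player A's list that player B also typed (the real game depends on timing), and `first_agreement` and `good_labels` are names invented for this example.

```python
from collections import Counter

def first_agreement(guesses_a, guesses_b, taboo):
    """Return the first label typed by A that B also typed, ignoring taboo words."""
    typed_b = {g.lower() for g in guesses_b} - taboo
    for guess in guesses_a:
        guess = guess.lower()
        if guess not in taboo and guess in typed_b:
            return guess
    return None  # no agreement: the pair would have to pass

def good_labels(agreed_labels, n):
    """Labels that were agreed on more than n times for one image."""
    counts = Counter(agreed_labels)
    return {label for label, count in counts.items() if count > n}

taboo = {"dog"}  # agreed on in earlier games for this image
label = first_agreement(["dog", "grass", "ball"], ["puppy", "ball", "grass"], taboo)
taboo = taboo | {label}  # the new agreement becomes taboo for later pairs
print(label, sorted(taboo))
print(good_labels(["grass", "ball", "grass"], n=1))
```

Growing the taboo set after every agreement is what pushes later pairs toward more specific labels, and eventually toward passing on an exhausted image.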
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won't always be players to pair with
  – In these cases, a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
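Since every hint is produced by filling a fixed template, each hint can be turned directly into a candidate fact about the secret word. The sketch below assumes that the first blank holds the secret word and the second holds the Describer's filler; the relation names on the right-hand side are hypothetical labels chosen for this example, not names used by Verbosity.

```python
# Templates as listed on the slide; relation names are hypothetical.
TEMPLATES = {
    "_ is a kind of _": "IsA",
    "_ is used for _": "UsedFor",
    "_ is typically near/in/on _": "LocationOf",
    "_ is the opposite of _": "OppositeOf",
    "_ is related to _": "RelatedTo",
}

def fill(template, secret_word, filler):
    """Return (sentence, fact) for a template filled in by the Describer."""
    if template not in TEMPLATES:
        raise ValueError("unknown template: " + template)
    sentence = template.replace("_", "{}").format(secret_word, filler)
    fact = (TEMPLATES[template], secret_word, filler)
    return sentence, fact

print(fill("_ is a kind of _", "lime", "fruit"))
```

Restricting input to templates is what makes the collected hints machine-readable without any parsing step.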
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish. Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67, 2008.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems, 2013.
Encyclopedic knowledge in coreference resolution
[The FCC] took [three specific actions] regarding [AT&T]. By a 4-0 vote, it allowed AT&T to continue offering special discount packages to big customers, called Tariff 12, rejecting appeals by AT&T competitors that the discounts were illegal. …
[The agency] said that because MCI's offer had expired, AT&T couldn't continue to offer its discount plan.
Why Wikipedia may help address the encyclopedic knowledge problem
http://en.wikipedia.org/wiki/FCC
The Federal Communications Commission (FCC) is an independent United States government agency, created, directed, and empowered by Congressional statute (see 47 U.S.C. § 151 and 47 U.S.C. § 154).
Another interesting scenario
A fresh mandate for [Mr Ahmadinejad] would, say his critics, consecrate the “revolution within a revolution” he has been trying to effect since his surprise electoral triumph in 2005. Best known to outsiders for his bellicose grandstanding, [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative, Mr Khamenei, greatly to expand the powers of the presidency.
Source: “It could make a big difference”, The Economist, Mar 19th 2009
Why Wikipedia may help address the encyclopedic knowledge problem
Wikipedia as Ontology
• Unlike other standard ontologies, such as WordNet and MeSH, Wikipedia itself is not a structured thesaurus
• However, it is more…
  – Comprehensive: it contains 12 million articles (2.8 million in the English Wikipedia)
  – Accurate: a study by Giles (2005) found that Wikipedia can compete with Encyclopædia Britannica in accuracy
  – Up to date: current and emerging concepts are absorbed in a timely manner
Giles, J. 2005. Internet encyclopaedias go head to head. Nature 438: 900–901
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
  – Each article describes only a single concept
  – The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
    (example: the Wikipedia article that describes the concept “Artificial intelligence”)
  – Equivalent concepts are grouped together by redirect links
    (example: “AI” is redirected to its equivalent concept “Artificial intelligence”)
  – It contains a hierarchical categorization system in which each article belongs to at least one category
    (example: the concept “Artificial intelligence” belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences, and Technology in society)
  – Polysemous concepts are disambiguated by disambiguation pages
    (example: the different meanings that “Artificial intelligence” may refer to are listed in its disambiguation page)
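The structural properties above (one article per concept, redirects, categories, disambiguation pages) can be pictured as a few lookup tables. The following is only a toy sketch: the table contents, the "Mercury" entries, and the function names are illustrative assumptions, not Wikipedia's actual data model or API.

```python
# Toy data standing in for Wikipedia's redirect, category, and
# disambiguation tables (illustrative values only).
REDIRECTS = {"AI": "Artificial intelligence"}
CATEGORIES = {
    "Artificial intelligence": [
        "Artificial intelligence", "Cybernetics",
        "Formal sciences", "Technology in society",
    ],
}
DISAMBIGUATION = {
    "Mercury": ["Mercury (planet)", "Mercury (element)", "Mercury (mythology)"],
}


def resolve(title):
    """Map a surface title to the candidate concepts (article titles)."""
    title = REDIRECTS.get(title, title)        # follow a redirect, if any
    return DISAMBIGUATION.get(title, [title])  # expand an ambiguous title


def categories_of(title):
    """Categories of every concept the title may denote."""
    return {concept: CATEGORIES.get(concept, []) for concept in resolve(title)}
```

Here `resolve("AI")` follows the redirect to a single concept, while an ambiguous title such as the hypothetical "Mercury" entry yields several candidates that a coreference or word-sense system would still have to choose between.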
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848.
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
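Because an infobox is semi-structured, attribute-value pairs can be pulled out with very little machinery. The sketch below assumes one `| key = value` pair per line and only handles simple `[[link]]` markup; the function names are hypothetical, and real extractors have to deal with nested templates and many other cases.

```python
import re

# A shortened copy of the Poe infobox, one field per line (assumed layout).
WIKITEXT = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""


def strip_links(value):
    """Turn [[target|label]] into label and [[target]] into target."""
    return re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", value)


def parse_infobox(wikitext):
    """Collect the `| key = value` lines of an infobox into a dict."""
    fields = {}
    for line in wikitext.splitlines():
        match = re.match(r"\|\s*(\w+)\s*=\s*(.*)", line)
        if match:
            fields[match.group(1)] = strip_links(match.group(2).strip())
    return fields


infobox = parse_infobox(WIKITEXT)
```

Each resulting key-value pair is a candidate attribute of the concept the article describes, which is the kind of information DBpedia turns into triples.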
DBPEDIA
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
The DBpedia Dataset
• 1,600,000 concepts, including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
REPRESENTING EXTRACTED INFORMATION
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At “Developers Guide to Semantic Web Toolkits” you find a development toolkit in your preferred programming language to process DBpedia data.
Extracting Infobox Data (RDF Representation)
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
  dbpedia:native_name “Calgary”
  dbpedia:altitude “1048”
  dbpedia:population_city “988193”
  dbpedia:population_metro “1079310”
  mayor_name → dbpedia:Dave_Bronconnier
  governing_body → dbpedia:Calgary_City_Council
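The Calgary example can be read as a set of (subject, predicate, object) triples. A minimal sketch of that reading follows; the `dbpedia:` prefix on every predicate and the `match` helper are assumptions made for illustration, not DBpedia's actual serialization.

```python
# The Calgary facts as (subject, predicate, object) tuples.
CALGARY = "dbpedia:Calgary"
triples = [
    (CALGARY, "dbpedia:native_name", "Calgary"),
    (CALGARY, "dbpedia:altitude", "1048"),
    (CALGARY, "dbpedia:population_city", "988193"),
    (CALGARY, "dbpedia:population_metro", "1079310"),
    (CALGARY, "dbpedia:mayor_name", "dbpedia:Dave_Bronconnier"),
    (CALGARY, "dbpedia:governing_body", "dbpedia:Calgary_City_Council"),
]


def match(pattern, store):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        triple for triple in store
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


mayor = match((CALGARY, "dbpedia:mayor_name", None), triples)
```

Literal values (the population figures) and links to other resources (the mayor) sit in the same object position, which is what makes the data a directed labeled graph.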
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
The DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
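As a rough illustration, the "all films by Quentin Tarantino" question might be phrased as follows. The sketch only builds the request URL and sends nothing. The `dbo:director` property, the `dbr:Quentin_Tarantino` resource name, and the `format` parameter are assumptions about the endpoint's vocabulary and configuration, so check them against the live service before relying on them.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director linking a film to its director.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL (no request is made)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY)
# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
```

The query is a single triple pattern with one variable, the same wildcard idea as matching against a list of triples, but evaluated by the server over the whole dataset.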
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis
www.phrasedetectives.com
WEB COLLABORATION PROJECTS
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense knowledge
WHAT’S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
  – IsA: (IsA "apple" "fruit")
  – PartOf: (PartOf "CPU" "computer")
  – PropertyOf: (PropertyOf "coffee" "wet")
  – MadeOf: (MadeOf "bread" "flour")
  – DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  – PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  – SubeventOf: (SubeventOf "play sport" "score goal")
  – FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  – LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  – CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  – LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  – EffectOf: (EffectOf "view video" "entertainment")
  – DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  – DesireOf: (DesireOf "person" "not be depressed")
  – MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  – IsUsedFor: (UsedFor "fireplace" "burn wood")
  – CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
  – SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  – ThematicKLine: (ThematicKLine "wedding dress" "veil")
  – ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
  isa(lime, fruit)
  property_of(lime, very_sour)
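To give a feel for what a "simple heuristic" might look like, here is a single hand-written pattern that turns the lime sentence into the two assertions. The pattern and the `extract` function are hypothetical illustrations, not the rules ConceptNet actually uses, and they only cover sentences of the form "A X is a <modifiers> Y".

```python
import re

# Hypothetical pattern: determiner, subject noun, "is a", modifiers, head noun.
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(.+?)\s+(\w+)$", re.IGNORECASE
)


def extract(sentence):
    """Return (relation, arg1, arg2) assertions for a matching sentence."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []  # sentence does not fit this one pattern
    subject, modifiers, head = match.groups()
    return [
        ("isa", subject.lower(), head.lower()),
        ("property_of", subject.lower(), modifiers.lower().replace(" ", "_")),
    ]


facts = extract("A lime is a very sour fruit")
```

A sentence without modifiers ("A lime is a fruit") falls through this particular pattern, which hints at why a real system needs a battery of such patterns and some normalization.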
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly, and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
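The matching, taboo, and "good label" rules can be put together in a few lines. This is a simplified sketch, not the game's implementation: it compares two finished lists of guesses instead of matching in real time, it counts agreements between pairs as a stand-in for "players producing a label", and all names are hypothetical.

```python
from collections import Counter


def play_round(guesses_a, guesses_b, taboo):
    """Return the first non-taboo label typed by A that B also typed."""
    typed_by_b = {guess.lower() for guess in guesses_b} - taboo
    for guess in guesses_a:
        word = guess.lower()
        if word not in taboo and word in typed_by_b:
            return word
    return None  # no agreement: the pair passes on this image


def good_labels(agreements, n):
    """Labels that were agreed upon more than n times for an image."""
    return {label for label, count in Counter(agreements).items() if count > n}


taboo = {"dog"}
agreed = play_round(["dog", "puppy", "grass"], ["animal", "grass", "Puppy"], taboo)
```

Adding each agreed label to the taboo set for later pairs pushes players toward more specific descriptions, and an image whose taboo list makes most pairs return `None` is the "done" case above.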
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
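Templates make the collected hints machine-readable: each filled template corresponds directly to a relation. A minimal sketch under that reading follows; the template identifiers, the masking of the secret word as `_`, and both function names are assumptions for illustration.

```python
# Template identifiers are hypothetical; the wording follows the slide.
TEMPLATES = {
    "kind_of": "{word} is a kind of {hint}",
    "used_for": "{word} is used for {hint}",
    "typically_near": "{word} is typically near {hint}",
    "opposite_of": "{word} is the opposite of {hint}",
    "related_to": "{word} is related to {hint}",
}


def make_hint(template_id, hint, secret_word="_"):
    """Fill a template; by default the secret word stays masked as '_'."""
    return TEMPLATES[template_id].format(word=secret_word, hint=hint)


def to_fact(template_id, secret_word, hint):
    """A filled template read as a (relation, word, value) fact."""
    return (template_id, secret_word, hint)


hint_shown = make_hint("kind_of", "fruit")   # what the Guesser sees
fact = to_fact("kind_of", "lime", "fruit")   # what gets stored
```

Because the Describer can only type into the blank, the relation never has to be parsed out of free text, unlike the sentence-pattern heuristics needed for Open Mind Commonsense input.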
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Six raters were asked whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): the user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): the user must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer, and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
Why Wikipedia may help addressing the encyclopedic knowledge problem
httpenwikipediaorgwikiFCC
The Federal Communications Commission (FCC) is an independent United States government agency created directed and empowered by Congressional statute (see 47 USC sect 151 and 47 USC sect 154)
Another interesting scenario
A fresh mandate for [Mr Ahmadinejad] would say his critics consecrate the ldquorevolution within a revolutionrdquo he has been trying to effect since his surprise electoral triumph in 2005 Best known to outsiders for his bellicose grandstanding [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative Mr Khamenei greatly to expand the powers of the presidency
Source It could make a big difference The Economist Mar 19th 2009
Why Wikipedia may help addressing the encyclopedic knowledge problem
Wikipedia as Ontology
bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus
bull However it is morehellipndash Comprehensive it contains 12 million articles (28
million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can
compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are
absorbed timely
Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
  - Each article describes only a single concept
  - The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
    Example: the Wikipedia article that describes the concept "Artificial intelligence"
  - Equivalent concepts are grouped together by redirect links
    Example: "AI" is redirected to its equivalent concept "Artificial intelligence"
  - It contains a hierarchical categorization system in which each article belongs to at least one category
    Example: the concept "Artificial intelligence" belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences, and Technology in society
  - Polysemous concepts are disambiguated by disambiguation pages
    Example: the different meanings that "Artificial intelligence" may refer to are listed in its disambiguation page
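The structural features listed for Wikipedia (one concept per article, redirects, categories, disambiguation pages) can be pictured as a small lookup structure. The sketch below is only a toy model under that reading: the class name `MiniWiki`, its fields, and the film entry with its category are hypothetical illustrations, not part of any Wikipedia API.

```python
from dataclasses import dataclass, field


@dataclass
class MiniWiki:
    """Toy model of the structure described on the slide (hypothetical)."""
    articles: dict = field(default_factory=dict)         # title -> set of categories
    redirects: dict = field(default_factory=dict)        # alias -> canonical title
    disambiguations: dict = field(default_factory=dict)  # page title -> candidate titles

    def resolve(self, term):
        """Return the article titles that a surface term may denote."""
        title = self.redirects.get(term, term)
        if title in self.disambiguations:
            return list(self.disambiguations[title])
        return [title] if title in self.articles else []

    def categories(self, term):
        """Union of the categories of every article the term resolves to."""
        cats = set()
        for title in self.resolve(term):
            cats |= self.articles.get(title, set())
        return cats


wiki = MiniWiki()
wiki.articles["Artificial intelligence"] = {
    "Artificial intelligence", "Cybernetics",
    "Formal sciences", "Technology in society",
}
# Hypothetical second sense, added only to show disambiguation.
wiki.articles["A.I. Artificial Intelligence"] = {"Films"}
wiki.redirects["AI"] = "Artificial intelligence"
wiki.disambiguations["AI (disambiguation)"] = [
    "Artificial intelligence", "A.I. Artificial Intelligence",
]

print(wiki.resolve("AI"))
print(sorted(wiki.categories("AI")))
```

Redirects are followed before disambiguation pages are consulted, so an alias such as "AI" lands on a single concept while an explicitly ambiguous page returns every candidate.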
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
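Because an infobox is semi-structured, its attribute-value pairs can be pulled out with very little machinery. The following is a minimal sketch, assuming the wikitext is available with one `| key = value` field per line; the function names `parse_infobox` and `link_targets` are illustrative, and real infoboxes (nested templates, multi-line values) need a proper wikitext parser.

```python
import re

# Abridged copy of the Poe infobox, one field per line (assumed layout).
INFOBOX = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""

# [[Target]] or [[Target|label]]; group 1 is the link target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def parse_infobox(wikitext):
    """Map each '| key = value' line to a key and its raw value string."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def link_targets(value):
    """Return the page titles targeted by [[...]] links in a field value."""
    return LINK.findall(value)


fields = parse_infobox(INFOBOX)
print(fields["name"])
print(link_targets(fields["birth_place"]))
```

The link targets matter because they point at other articles, which is what lets an attribute value be treated as a concept rather than a plain string.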
DBPEDIA
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
The DBpedia Dataset
• 1,600,000 concepts, including
  - 58,000 persons
  - 70,000 places
  - 35,000 music albums
  - 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
REPRESENTING EXTRACTED INFORMATION
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits for processing DBpedia data in your preferred programming language
Extracting Infobox Data (RDF Representation)
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
  dbpedia:native_name "Calgary"
  dbpedia:altitude "1048"
  dbpedia:population_city "988193"
  dbpedia:population_metro "1079310"
  mayor_name dbpedia:Dave_Bronconnier
  governing_body dbpedia:Calgary_City_Council
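The Calgary example can be read as a handful of subject-predicate-object triples. The sketch below stores them as plain Python tuples rather than using an RDF library; the two namespace constants and the helper `objects` are assumptions made for illustration, and the values are copied from the slide.

```python
# Namespace prefixes (assumed expansions of the slide's "dbpedia:" prefix).
DBR = "http://dbpedia.org/resource/"
DBP = "http://dbpedia.org/property/"

CALGARY = DBR + "Calgary"

# Each triple is (subject, predicate, object); literals are plain strings.
triples = [
    (CALGARY, DBP + "native_name", "Calgary"),
    (CALGARY, DBP + "altitude", "1048"),
    (CALGARY, DBP + "population_city", "988193"),
    (CALGARY, DBP + "population_metro", "1079310"),
    (CALGARY, DBP + "mayor_name", DBR + "Dave_Bronconnier"),
    (CALGARY, DBP + "governing_body", DBR + "Calgary_City_Council"),
]


def objects(subject, predicate, graph=triples):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


print(objects(CALGARY, DBP + "altitude"))
print(objects(CALGARY, DBP + "mayor_name"))
```

Literal values and links to other resources sit side by side in the same list, which is the point of the graph representation: the mayor is another node that can carry its own triples.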
SPARQL
• SPARQL is a query language for RDF
  - RDF is a directed, labeled graph data format for representing information on the Web
  - The SPARQL specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
The DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  - Give me all sitcoms that are set in NYC
  - All tennis players from Moscow
  - All films by Quentin Tarantino
  - All German musicians that were born in Berlin in the 19th century
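As an illustration of one of the example questions, the sketch below writes "all films by Quentin Tarantino" as a SPARQL query and builds the request URL for the endpoint, without sending it. The use of `dbo:director`, the `format` parameter, and the helper name `sparql_url` are assumptions about the current DBpedia setup rather than details given on the slide; the vocabulary in use when the lecture was written may have differed.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director links a film to its director.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""


def sparql_url(query, endpoint=ENDPOINT):
    """Build a GET URL carrying the query; nothing is sent over the network."""
    params = {
        "query": query.strip(),
        # Result format requested from the server (assumed to be supported).
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)


print(sparql_url(QUERY))
```

The resulting URL can be opened in a browser or fetched with any HTTP client; keeping the network call out of the sketch makes it runnable offline.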
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  - Other initiatives: Citizen Science, Cognition and Language Laboratory, ...
• This has been taken advantage of in AI
  - Open Mind Commonsense (Singh) (collecting facts)
  - Semantic Wikis
WEB COLLABORATION PROJECTS
• Open Mind Common Sense - Singh
• Crater mapping (results) - Kanefsky
• Learner, Learner2, 1001 Paraphrases - Chklovski
• FACTory - Cycorp
• Hot or Not - 8 Days
• ESP, Phetch, Verbosity, Peekaboom - von Ahn
• Galaxy Zoo - Oxford University
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
  - IsA: (IsA "apple" "fruit")
  - PartOf: (PartOf "CPU" "computer")
  - PropertyOf: (PropertyOf "coffee" "wet")
  - MadeOf: (MadeOf "bread" "flour")
  - DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  - PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  - SubeventOf: (SubeventOf "play sport" "score goal")
  - FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  - LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  - CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  - LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  - EffectOf: (EffectOf "view video" "entertainment")
  - DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  - DesireOf: (DesireOf "person" "not be depressed")
  - MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  - UsedFor: (UsedFor "fireplace" "burn wood")
  - CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION: K-LINES (1.25 million assertions)
  - SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  - ThematicKLine: (ThematicKLine "wedding dress" "veil")
  - ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
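Each assertion in the table is a relation name plus two concept phrases, so a semantic network over them is essentially an index from concepts to labeled edges. The sketch below shows one possible in-memory layout using a few of the assertions listed above; the function names are illustrative and this is not ConceptNet's own API.

```python
from collections import defaultdict

# (relation, first concept, second concept), copied from the table.
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("CapableOf", "dentist", "pull tooth"),
    ("UsedFor", "fireplace", "burn wood"),
    ("EffectOf", "view video", "entertainment"),
    ("ConceptuallyRelatedTo", "bad breath", "mint"),
]


def build_index(assertions):
    """Index outgoing edges by their first concept."""
    index = defaultdict(list)
    for relation, left, right in assertions:
        index[left].append((relation, right))
    return index


def related(index, concept, relation=None):
    """Concepts reachable from `concept`, optionally filtered by relation."""
    return [right for rel, right in index.get(concept, [])
            if relation is None or rel == relation]


index = build_index(ASSERTIONS)
print(related(index, "apple", "IsA"))
print(related(index, "dentist"))
```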
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPTNET
• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPTNET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
  isa(lime, fruit)
  property_of(lime, very_sour)
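The lime example suggests the kind of simple heuristic involved: a sentence pattern is matched and its slots become arguments of relations. The sketch below handles only sentences of the form "A X is a (modifiers) Y"; the regular expression and the function `extract` are assumptions made for illustration and are not the actual ConceptNet extraction rules.

```python
import re

# "A/An/The <subject> is a/an <optional modifiers> <head noun>"
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+((?:\w+\s+)*)(\w+)$",
    re.IGNORECASE,
)


def extract(sentence):
    """Turn one matching sentence into isa / property_of assertions."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []
    subject, modifiers, head = (g.lower() for g in match.groups())
    facts = [("isa", subject, head)]
    if modifiers.strip():
        # Join the modifier words, as in the slide's "very_sour".
        facts.append(("property_of", subject, "_".join(modifiers.split())))
    return facts


print(extract("A lime is a very sour fruit"))
```

Sentences that do not fit the pattern yield no assertions at all, which is the usual trade-off with pattern-based extraction: high precision on the covered forms, no coverage elsewhere.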
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  - ESP
  - Verbosity
  - TagATune
• Other games
  - Peekaboom
  - Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  - To train image search engines
  - To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  - Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
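The matching rule can be sketched as follows: each player keeps typing guesses, and the round ends as soon as one player types a string the other has already typed. The sketch assumes the two players' guesses are interleaved one at a time (the real game is driven by typing speed), and the function names are illustrative.

```python
from itertools import zip_longest


def normalize(guess):
    """Compare guesses case-insensitively, ignoring surrounding spaces."""
    return guess.strip().lower()


def first_agreement(guesses_a, guesses_b):
    """Return the first label typed by both players, or None if none match."""
    seen_a, seen_b = set(), set()
    for a, b in zip_longest(guesses_a, guesses_b):
        if a is not None:
            a = normalize(a)
            seen_a.add(a)
            if a in seen_b:
                return a
        if b is not None:
            b = normalize(b)
            seen_b.add(b)
            if b in seen_a:
                return b
    return None


print(first_agreement(["dog", "grass", "ball"], ["puppy", "ball", "dog"]))
```

Note that the match is between any guess of one player and any guess of the other, not between guesses typed at the same moment, which is what makes agreement feasible without communication.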
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
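The taboo-word, good-label, and "done" rules amount to a little bookkeeping per image. The sketch below is one possible reading: the class `ImageRecord`, the default N, and the interpretation of "most players pass" as a pass rate above one half are all assumptions, not details of the actual ESP implementation.

```python
from collections import Counter


class ImageRecord:
    """Per-image bookkeeping; names and thresholds are illustrative."""

    def __init__(self, n_good=2):
        self.n_good = n_good       # the game parameter N
        self.produced = Counter()  # label -> number of players who typed it
        self.taboo = set()         # labels no longer allowed for this image
        self.shown = 0             # players who were shown the image
        self.passed = 0            # players who passed on it

    def is_allowed(self, label):
        return label.lower() not in self.taboo

    def record_player(self, labels, passed=False):
        """Log one player's session on this image."""
        self.shown += 1
        self.passed += int(passed)
        for label in {l.lower() for l in labels}:
            if label not in self.taboo:
                self.produced[label] += 1

    def record_agreement(self, label):
        """An agreed-upon label becomes taboo for later players."""
        self.taboo.add(label.lower())

    def good_labels(self):
        """Labels produced by more than N players."""
        return sorted(l for l, c in self.produced.items() if c > self.n_good)

    def is_done(self):
        """Assumed reading of 'most players pass': pass rate above 50%."""
        return self.shown > 0 and self.passed / self.shown > 0.5


record = ImageRecord(n_good=2)
for labels in (["dog", "grass"], ["dog"], ["Dog", "ball"]):
    record.record_player(labels)
record.record_agreement("dog")
print(record.good_labels(), record.is_allowed("dog"), record.is_done())
```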
IMPLEMENTATION
• Pre-recorded game play
  - Especially at the beginning and at quiet times, there won't always be players to pair with
  - In these cases, a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  - Players could cheat in a number of ways, including agreeing on labels or playing against themselves
  - A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  - 13,630 players
  - 1.2 million labels for 293,760 images
  - 80% of players played more than once
• By 2008
  - 200,000 players
  - 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  - Playing with a partner
  - Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  - choose 10 labels among those produced, and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  - 15 participants, 20 images among the 1000 with more than 5 labels
  - 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  - 15 participants, 20 images
  - 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• ... or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  - Players have to guess a word
  - One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations and properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
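Since every hint is a filled-in template, each hint already determines a relation between the secret word and the filler. The sketch below makes that mapping explicit; the relation names on the right-hand side and the two helper functions are hypothetical choices for illustration, not the labels Verbosity itself uses.

```python
# Template text -> relation name (relation names are assumed).
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near": "Near",
    "is typically in": "In",
    "is typically on": "On",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def make_hint(template, filler):
    """Text shown to the Guesser: the secret word stays blanked out."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return f"_ {template} {filler}"


def to_fact(secret_word, template, filler):
    """Fact recorded for the hint: (relation, secret word, filler)."""
    return (TEMPLATES[template], secret_word, filler)


print(make_hint("is a kind of", "fruit"))
print(to_fact("apple", "is a kind of", "fruit"))
```

Restricting the Describer to a fixed set of templates is what keeps the collected facts machine-readable without any parsing of free text.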
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  - Describer: can just repeat the behavior of a previous describer
  - Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  - Ask six raters whether 200 facts collected using Verbosity are 'true'
  - Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  - Find The Culprit (Annotation): the user must identify the closest antecedent of a markable, if it is anaphoric
  - Detectives Conference (Validation): the user must agree or disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube (2012). Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman (2009). Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi (2013). Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
Why Wikipedia may help addressing the encyclopedic knowledge problem
Wikipedia as Ontology
bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus
bull However it is morehellipndash Comprehensive it contains 12 million articles (28
million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can
compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are
absorbed timely
Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurus
Wikipedia Article that describes the Concept Artificial intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected links
AI is redirected to its equivalent concept Artificial Intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system
in which each article belongs to at least one category
The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system in
which each article belongs to at least one category ndash Polysemous concepts are disambiguated by
Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
dbpedia:native_name "Calgary"
dbpedia:altitude "1048"
dbpedia:population_city "988193"
dbpedia:population_metro "1079310"
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
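The mapping from infobox fields to RDF-style triples can be sketched in a few lines. This is a minimal illustration, not the DBpedia extraction framework: the function name, the "dbpedia:" prefix handling and the input lines (modelled on the Calgary example) are assumptions made here.

```python
def infobox_to_triples(subject, infobox_lines, prefix="dbpedia:"):
    """Turn '| key = value' infobox lines into (subject, property, object) triples."""
    triples = []
    for line in infobox_lines:
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # skip anything that is not a key/value field
        key, value = line[1:].split("=", 1)
        key, value = key.strip(), value.strip()
        if value.startswith("[[") and value.endswith("]]"):
            # A wiki link becomes a resource; drop any display label after '|'.
            target = value[2:-2].split("|")[0].strip().replace(" ", "_")
            obj = prefix + target
        else:
            # Everything else is kept as a plain literal.
            obj = '"' + value + '"'
        triples.append((subject, prefix + key, obj))
    return triples


# Hypothetical infobox fragment, modelled on the Calgary example.
calgary_infobox = [
    "| native_name = Calgary",
    "| altitude = 1048",
    "| mayor_name = [[Dave Bronconnier]]",
    "| governing_body = [[Calgary City Council]]",
]

triples = infobox_to_triples("dbpedia:Calgary", calgary_infobox)
for s, p, o in triples:
    print(s, p, o)
```

Literal values stay strings, while linked values become resources that can themselves be subjects of further triples, which is what turns the set of infoboxes into a graph.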
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
– Give me all sitcoms that are set in NYC
– All tennis players from Moscow
– All films by Quentin Tarantino
– All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
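As an illustration, one of the example questions ("All films by Quentin Tarantino") might be phrased as below. This is a sketch under assumptions: the property dbo:director and the resource name dbr:Quentin_Tarantino are guesses at how the data is modelled, and the "format" request parameter is assumed to be accepted by the server. The code only builds the request URL and does not contact the endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director links a film to its director.
FILMS_BY_DIRECTOR = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query as an HTTP GET request URL."""
    params = {
        "query": query.strip(),
        # Assumption: the server honours this parameter for JSON results.
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)


url = build_request_url(FILMS_BY_DIRECTOR)
print(url)
```

The query is a graph pattern: every resource that can fill ?film so that the triple exists in the dataset is returned as a result row.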
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
– Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
– Open Mind Commonsense (Singh) (collecting facts)
– Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
www.phrasedetectives.com
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
– IsA: (IsA "apple" "fruit")
– PartOf: (PartOf "CPU" "computer")
– PropertyOf: (PropertyOf "coffee" "wet")
– MadeOf: (MadeOf "bread" "flour")
– DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
– PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
– SubeventOf: (SubeventOf "play sport" "score goal")
– FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
– LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
– CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
– LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
– EffectOf: (EffectOf "view video" "entertainment")
– DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
– DesireOf: (DesireOf "person" "not be depressed")
– MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
– UsedFor: (UsedFor "fireplace" "burn wood")
– CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION: K-LINES (1.25 million assertions)
– SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
– ThematicKLine: (ThematicKLine "wedding dress" "veil")
– ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
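To make the "simple heuristics" concrete, here is a toy pattern that reproduces the lime example. It is an illustrative assumption, not the rule set actually used to build ConceptNet: a single regular expression for sentences of the form "A X is a MODIFIERS Y", with hypothetical names throughout.

```python
import re

# Assumed sentence shape: "<article> <concept> is <a/an> <modifiers> <head noun>".
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(.+)\s+(\w+)$",
    re.IGNORECASE,
)


def extract_assertions(sentence: str):
    """Return (relation, concept, value) tuples, or [] if the pattern does not apply."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []
    concept = match.group(1).lower()
    modifiers = match.group(2).lower().replace(" ", "_")
    head = match.group(3).lower()
    return [
        ("isa", concept, head),
        ("property_of", concept, modifiers),
    ]


assertions = extract_assertions("A lime is a very sour fruit")
for relation, concept, value in assertions:
    print(f"{relation}({concept}, {value})")
```

A sentence that does not fit the pattern yields no assertions, so coverage depends entirely on how many such patterns are written.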
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
– ESP
– Verbosity
– TagATune
• Other games
– Peekaboom
– Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
– To train image search engines
– To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
– Hence the ESP game
• If any of the strings typed by one player matches a string typed by the other player, they score points
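The matching rule can be sketched as follows. This is a minimal illustration under assumptions: the event format, the player identifiers "A" and "B", and the lower-casing of typed strings are choices made for this example, not details given in the slides.

```python
def play_round(events):
    """Return the first label that both players have typed, or None.

    `events` is a chronological list of (player, typed_string) pairs,
    where player is "A" or "B".
    """
    seen = {"A": set(), "B": set()}
    for player, text in events:
        label = text.strip().lower()  # assumed normalisation
        other = "B" if player == "A" else "A"
        if label in seen[other]:
            return label  # agreement: the partners score on this label
        seen[player].add(label)
    return None


# Hypothetical round: both players are looking at the same image.
events = [
    ("A", "dog"),
    ("B", "puppy"),
    ("A", "grass"),
    ("B", "Dog"),
]

agreed = play_round(events)
print(agreed)
```

Because the partners cannot communicate, the only way to match is to type what the image obviously shows, which is what makes the agreed string a plausible label.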
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
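The bookkeeping for taboo words and good labels might look like the sketch below. The class, its method names and the way producers are counted are hypothetical; the slides give only the two rules (agreed words become taboo, and a label is good once more than N players have produced it).

```python
from collections import defaultdict


class ImageState:
    """Per-image record of agreed labels (a sketch, not the real ESP code)."""

    def __init__(self, n: int):
        self.n = n  # the game parameter N
        self.producers = defaultdict(set)  # label -> ids of players who produced it
        self.taboo = set()

    def is_allowed(self, word: str) -> bool:
        return word.lower() not in self.taboo

    def record_agreement(self, word: str, players) -> None:
        label = word.lower()
        self.producers[label].update(players)
        self.taboo.add(label)  # later players must find a different word

    def good_labels(self):
        return {label for label, who in self.producers.items() if len(who) > self.n}


state = ImageState(n=1)
state.record_agreement("Dog", {"player1", "player2"})

print(state.is_allowed("dog"))    # the agreed word is now taboo
print(state.is_allowed("puppy"))  # a more specific word is still allowed
print(state.good_labels())
```

Forcing later pairs away from words already agreed on is what pushes the label set for an image from generic words toward specific ones.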
IMPLEMENTATION
• Pre-recorded game play
– Especially at the beginning and at quiet times, there won't always be players to pair with
– In these cases, a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
– Players could cheat in a number of ways, including agreeing on labels and playing against themselves
– A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
– 13,630 players
– 1.2 million labels for 293,760 images
– 80% of players played more than once
• By 2008
– 200,000 players
– 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
– Playing with a partner
– Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
– choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
– 15 participants, 20 images among the 1000 with more than 5 labels
– 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
– 15 participants, 20 images
– 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
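One way to picture how a filled template yields both a hint and a fact is sketched below. The relation names attached to each template and the function name are hypothetical; the slides list only the template wording.

```python
# Assumed mapping from template wording to a relation name.
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def hint_to_fact(secret_word: str, template: str, filler: str):
    """Return the hint shown to the Guesser and the fact recorded by the game."""
    relation = TEMPLATES[template]  # raises KeyError for an unknown template
    hint = f"_ {template} {filler}"  # the secret word stays hidden
    fact = (relation, secret_word, filler)
    return hint, fact


hint, fact = hint_to_fact("laptop", "is a kind of", "computer")
print(hint)
print(fact)
```

The Describer only ever fills the second slot, so every hint arrives already structured as a relation between the secret word and the filler, and no parsing of free text is needed.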
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
– Describer: can just repeat the behavior of a previous describer
– Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
– Ask six raters whether 200 facts collected using Verbosity are 'true'
– Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
– Find The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
– Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish. Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67, 2008
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems, 2013
Wikipedia as Ontology
• Unlike other standard ontologies, such as WordNet and MeSH, Wikipedia itself is not a structured thesaurus
• However, it is more…
– Comprehensive: it contains 12 million articles (2.8 million in the English Wikipedia)
– Accurate: a study by Giles (2005) found that Wikipedia can compete with Encyclopædia Britannica in accuracy
– Up to date: current and emerging concepts are absorbed in a timely manner
Giles, J. 2005. Internet encyclopaedias go head to head. Nature 438, 900–901
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
– Each article describes only a single concept
– The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
Example: the Wikipedia article that describes the concept Artificial intelligence
– Equivalent concepts are grouped together by redirect links
Example: AI is redirected to its equivalent concept, Artificial Intelligence
– It contains a hierarchical categorization system in which each article belongs to at least one category
Example: the concept Artificial Intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society
– Polysemous concepts are disambiguated by Disambiguation Pages
Example: the different meanings that Artificial intelligence may refer to are listed in its disambiguation page
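These structural properties can be pictured as three lookup tables. The sketch below is a toy model under assumptions: the dictionaries are hand-written stand-ins for Wikipedia data, the "Mercury" entry is an illustrative example of an ambiguous title, and the function names are hypothetical.

```python
# Toy stand-ins for Wikipedia's structure (hypothetical data for illustration).
REDIRECTS = {"AI": "Artificial intelligence"}
CATEGORIES = {
    "Artificial intelligence": [
        "Artificial intelligence",
        "Cybernetics",
        "Formal sciences",
        "Technology in society",
    ],
}
DISAMBIGUATION = {
    "Mercury": ["Mercury (planet)", "Mercury (element)"],
}


def resolve(title: str) -> str:
    """Follow redirect links until a non-redirect title is reached."""
    seen = set()
    while title in REDIRECTS and title not in seen:
        seen.add(title)  # guards against redirect loops
        title = REDIRECTS[title]
    return title


def lookup(term: str) -> dict:
    """Map a term to a single concept, or to its candidate senses."""
    title = resolve(term)
    if title in DISAMBIGUATION:
        return {"ambiguous": True, "candidates": DISAMBIGUATION[title]}
    return {
        "ambiguous": False,
        "concept": title,
        "categories": CATEGORIES.get(title, []),
    }


print(lookup("AI"))
print(lookup("Mercury"))
```

Redirects play the role of synonym links, categories the role of broader terms, and disambiguation pages the role of sense inventories, which is what lets Wikipedia stand in for a thesaurus.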
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
wwwphrasedetectivescom
OPEN MIND COMMONSENSE
bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense
WHATrsquoS IN OPEN MIND COMMONSENSE CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52000 assertions)
IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)
EVENTS (38000 assertions)
PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)
AGENTS (104000 assertions)
CapableOf (CapableOf dentist pull tooth)
SPATIAL (36000 assertions)
LocationOf (LocationOf army in war)
TEMPORAL time amp sequence
CAUSAL (17000 assertions)
EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)
AFFECTIONAL (mood feeling emotions) (34000 assertions)
DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)
FUNCTIONAL (115000 assertions)
IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)
ASSOCIATION K-LINES (125 million assertions)
SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)
Wikipedia Article that describes the Concept Artificial intelligence
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
– Each article describes only a single concept
– The title of the article is a short, well-formed phrase, like a term in a traditional thesaurus
– Equivalent concepts are grouped together by redirect links
  Example: "AI" is redirected to its equivalent concept, "Artificial Intelligence"
– It contains a hierarchical categorization system in which each article belongs to at least one category
  Example: the concept "Artificial Intelligence" belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences, and Technology in society
– Polysemous concepts are disambiguated by disambiguation pages
  Example: the different meanings that "Artificial intelligence" may refer to are listed in its disambiguation page
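The three structural features on this slide (redirects, categories, disambiguation pages) can be pictured as three lookup tables. The sketch below is a toy model, not Wikipedia's actual data model or API; the class name `MiniWiki` and the "Mercury" disambiguation entry are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MiniWiki:
    """Toy model of redirects, categories and disambiguation pages."""
    redirects: dict = field(default_factory=dict)       # alias title -> canonical title
    categories: dict = field(default_factory=dict)      # canonical title -> set of categories
    disambiguation: dict = field(default_factory=dict)  # ambiguous title -> candidate titles

    def resolve(self, title):
        """Follow redirect links until a canonical title is reached."""
        seen = set()
        while title in self.redirects and title not in seen:
            seen.add(title)  # guard against redirect loops
            title = self.redirects[title]
        return title

    def categories_of(self, title):
        """Categories of the article that the title resolves to."""
        return self.categories.get(self.resolve(title), set())

    def senses(self, title):
        """Candidate concepts for a title: several if ambiguous, else one."""
        return self.disambiguation.get(title, [self.resolve(title)])


wiki = MiniWiki(
    redirects={"AI": "Artificial intelligence"},
    categories={"Artificial intelligence": {
        "Artificial intelligence", "Cybernetics",
        "Formal sciences", "Technology in society"}},
    # Hypothetical disambiguation entry, for illustration only.
    disambiguation={"Mercury": ["Mercury (planet)", "Mercury (element)"]},
)
```

With this model, `wiki.resolve("AI")` maps the alias to its canonical concept, and `wiki.categories_of("AI")` returns the four categories named on the slide.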
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848.
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
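Because an infobox is semi-structured, attribute/value pairs can be pulled out with very little machinery. The following is a minimal sketch, assuming one `| key = value` field per line; it is not the extractor DBpedia uses, and the names `parse_infobox` and `link_targets` are hypothetical.

```python
import re

# Shortened copy of the Poe infobox, one field per line.
SAMPLE = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""

# [[Target]] or [[Target|label]]; only the link target is captured.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def parse_infobox(wikitext):
    """Return a dict of infobox fields, split on the first '=' of each line."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def link_targets(value):
    """Return the article titles that a field value links to."""
    return LINK.findall(value)


fields = parse_infobox(SAMPLE)
```

Here `link_targets(fields["birth_place"])` yields the linked article titles, which is what turns an attribute value into a relation between two concepts.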
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
DBPEDIA
• 1,600,000 concepts
• including
– 58,000 persons
– 70,000 places
– 35,000 music albums
– 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
The DBpedia Dataset
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The "Developers Guide to Semantic Web Toolkits" lists development toolkits, by programming language, that can be used to process DBpedia data.
REPRESENTING EXTRACTED INFORMATION
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
  dbpedia:native_name "Calgary"
  dbpedia:altitude "1048"
  dbpedia:population_city "988193"
  dbpedia:population_metro "1079310"
  mayor_name dbpedia:Dave_Bronconnier
  governing_body dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
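The Calgary example can be read as a set of subject/predicate/object triples. Below is a minimal sketch using plain Python tuples rather than an RDF library; the `dbpedia:` prefix on `mayor_name` and `governing_body` and the helper names are assumptions made for uniformity.

```python
BASE = "http://dbpedia.org/resource/"
CALGARY = BASE + "Calgary"

# (subject, predicate, object): objects are either literals or resource URIs.
TRIPLES = [
    (CALGARY, "dbpedia:native_name", "Calgary"),
    (CALGARY, "dbpedia:altitude", "1048"),
    (CALGARY, "dbpedia:population_city", "988193"),
    (CALGARY, "dbpedia:population_metro", "1079310"),
    (CALGARY, "dbpedia:mayor_name", BASE + "Dave_Bronconnier"),
    (CALGARY, "dbpedia:governing_body", BASE + "Calgary_City_Council"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def is_resource(obj):
    """True if the object is another node (a URI) rather than a literal."""
    return obj.startswith("http://")


mayor = objects(TRIPLES, CALGARY, "dbpedia:mayor_name")[0]
```

The distinction made by `is_resource` is what makes the data a graph: the mayor is a node that can carry triples of its own, while the altitude is only a value.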
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• The W3C specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
– Give me all sitcoms that are set in NYC
– All tennis players from Moscow
– All films by Quentin Tarantino
– All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
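As an illustration, "All films by Quentin Tarantino" might be phrased as the query below and packed into a request URL for the endpoint. This is a sketch under assumptions: the `dbo:director` property and `dbr:` resource names follow the present-day DBpedia ontology and should be checked against the live dataset, and the `format` parameter is a Virtuoso convention rather than part of SPARQL itself. No network call is made.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director and dbr:Quentin_Tarantino.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as an HTTP GET URL for the endpoint."""
    params = {
        "query": query.strip(),
        "format": "application/sparql-results+json",  # assumed Virtuoso option
    }
    return endpoint + "?" + urlencode(params)


url = build_request_url(QUERY)
```

Fetching `url` with any HTTP client would return the matching bindings for `?film`, provided the endpoint is reachable and the property names match the deployed dataset.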
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
– Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
– Open Mind Commonsense (Singh) (collecting facts)
– Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
www.phrasedetectives.com
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
– IsA: (IsA "apple" "fruit")
– PartOf: (PartOf "CPU" "computer")
– PropertyOf: (PropertyOf "coffee" "wet")
– MadeOf: (MadeOf "bread" "flour")
– DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
– PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
– SubeventOf: (SubeventOf "play sport" "score goal")
– FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
– LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
– CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
– LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
– EffectOf: (EffectOf "view video" "entertainment")
– DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
– DesireOf: (DesireOf "person" "not be depressed")
– MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
– UsedFor: (UsedFor "fireplace" "burn wood")
– CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
– SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
– ThematicKLine: (ThematicKLine "wedding dress" "veil")
– ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
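Assertions of this kind form a labeled graph over concepts. The sketch below stores a few of the slide's examples as (relation, concept, concept) tuples and indexes them by concept. It is not ConceptNet's own API; `build_index` and the `^-1` suffix for inverse edges are hypothetical conventions.

```python
from collections import defaultdict

# A handful of the example assertions from the relation table.
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("UsedFor", "fireplace", "burn wood"),
    ("CapableOf", "dentist", "pull tooth"),
    ("EffectOf", "view video", "entertainment"),
    ("ConceptuallyRelatedTo", "bad breath", "mint"),
]


def build_index(assertions):
    """Map each concept to its outgoing and (marked) incoming edges."""
    index = defaultdict(list)
    for relation, left, right in assertions:
        index[left].append((relation, right))
        index[right].append((relation + "^-1", left))  # inverse edge
    return index


INDEX = build_index(ASSERTIONS)
```

Looking up `INDEX["fruit"]` then returns the inverse edge from "apple", which is the kind of neighbourhood query a semantic network is meant to answer.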
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
"A lime is a very sour fruit"
isa(lime, fruit)
property_of(lime, very_sour)
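One such "simple heuristic" can be imitated with a single pattern over sentences of the form "A X is a (modifiers) Y". The regular expression below is an illustrative assumption, not the actual ConceptNet extraction rule.

```python
import re

# "A/An X is a/an [modifier words] Y": the last word is taken as the class,
# any words before it as a property of X.
PATTERN = re.compile(
    r"^an? (?P<x>\w+) is an? (?P<prop>(?:\w+ )*?)(?P<y>\w+)$",
    re.IGNORECASE,
)


def extract(sentence):
    """Return predicate strings extracted from one sentence, or []."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []
    x = match.group("x").lower()
    y = match.group("y").lower()
    facts = [f"isa({x}, {y})"]
    prop = match.group("prop").strip().lower().replace(" ", "_")
    if prop:
        facts.append(f"property_of({x}, {prop})")
    return facts


facts = extract("A lime is a very sour fruit")
```

For the slide's sentence this produces both predicates shown above; a sentence without modifiers produces only the `isa` fact, and anything that does not fit the pattern yields nothing.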
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
– ESP
– Verbosity
– TagATune
• Other games
– Peekaboom
– Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
– To train image search engines
– To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
– Hence the ESP game
• If any of the strings typed by one player matches a string typed by the other player, they score points
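The matching rule can be sketched as follows: guesses arrive as a stream of (player, string) events, and the round ends as soon as one player types something the other has already typed. The function name and the lower-casing normalisation are assumptions, not details taken from the game's implementation.

```python
def play_round(events):
    """events: (player, guess) pairs in typing order, player is "A" or "B".

    Returns the first label typed by both players, or None if they never agree.
    """
    typed = {"A": set(), "B": set()}
    for player, guess in events:
        label = guess.strip().lower()
        other = "B" if player == "A" else "A"
        if label in typed[other]:
            return label  # agreement: this becomes a label for the image
        typed[player].add(label)
    return None


agreed = play_round([("A", "dog"), ("B", "puppy"), ("A", "grass"), ("B", "Dog ")])
```

Note that the players need not type the matching string at the same time; it is enough that both have typed it at some point during the round.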
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
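The interplay of taboo words and the "good label" threshold can be sketched per image as below. This is a simplification under stated assumptions: every string typed by both partners in a round counts as agreed (the real game stops at the first match), and the class and method names are hypothetical.

```python
from collections import Counter


class ImageRecord:
    """Per-image bookkeeping for taboo words and good labels."""

    def __init__(self, n):
        self.n = n                 # threshold N: good if more than N players produced it
        self.produced = Counter()  # label -> number of players who typed it
        self.taboo = set()         # labels already agreed upon for this image

    def allowed(self, guess):
        """A guess is rejected if it is a taboo word for this image."""
        return guess.strip().lower() not in self.taboo

    def record_round(self, guesses_a, guesses_b):
        """Register one round; returns the labels the partners agreed on."""
        a = {g.strip().lower() for g in guesses_a if self.allowed(g)}
        b = {g.strip().lower() for g in guesses_b if self.allowed(g)}
        for label in a:
            self.produced[label] += 1
        for label in b:
            self.produced[label] += 1
        agreed = a & b
        self.taboo |= agreed  # agreed labels become taboo for later pairs
        return agreed

    def good_labels(self):
        return sorted(l for l, c in self.produced.items() if c > self.n)


record = ImageRecord(n=1)
first = record.record_round(["dog", "grass"], ["Dog", "park"])
second = record.record_round(["dog", "grass"], ["grass", "tree"])
```

In the second round "dog" is already taboo, so the partners are pushed toward the more specific "grass", which is the effect the taboo list is designed to have.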
IMPLEMENTATION
• Pre-recorded game play
– Especially at the beginning and at quiet times, there won't always be players to pair with
– In these cases a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
– Players could cheat in a number of ways, including agreeing on labels and playing against themselves
– A number of mechanisms are in place against those cases
• Selecting images
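Pairing a live player with a recorded 'hand' amounts to replaying the earlier player's timestamped guesses alongside the live ones. A minimal sketch, assuming each guess is stored with its time offset in seconds (the storage format is not described in the slides):

```python
def replay_partner(recorded, live):
    """recorded, live: lists of (seconds_since_start, guess).

    Returns (time, label) of the first agreement, or None.
    """
    events = sorted(
        [(t, "recorded", g) for t, g in recorded]
        + [(t, "live", g) for t, g in live]
    )
    seen = {"recorded": set(), "live": set()}
    for t, who, guess in events:
        other = "live" if who == "recorded" else "recorded"
        if guess in seen[other]:
            return t, guess
        seen[who].add(guess)
    return None


result = replay_partner(
    recorded=[(2.0, "cat"), (5.0, "sofa")],
    live=[(1.0, "sofa"), (3.0, "kitten")],
)
```

From the live player's point of view the recorded partner behaves like a real one: the match on "sofa" only happens at the moment the recorded player originally typed it.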
SOME STATISTICS
• In the 4 months between August 9th, 2003 and December 10th, 2003
– 13,630 players
– 1.2 million labels for 293,760 images
– 80% of players played more than once
• By 2008
– 200,000 players
– 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
– Playing with a partner
– Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
– choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
– 15 participants, 20 images among the 1,000 with more than 5 labels
– 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
– 15 participants, 20 images
– 85% of words rated useful
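The second evaluation is an overlap measure: what fraction of the labels from the game were also given by the experiment's participants. A minimal sketch with made-up labels (the data below is hypothetical, chosen only so that 5 of 6 labels overlap):

```python
def overlap_rate(game_labels, participant_labels):
    """Fraction of game labels that participants also produced."""
    game = {label.lower() for label in game_labels}
    participants = {label.lower() for label in participant_labels}
    if not game:
        return 0.0
    return len(game & participants) / len(game)


# Hypothetical labels for one image.
rate = overlap_rate(
    ["dog", "grass", "ball", "park", "sky", "tree"],
    ["dog", "grass", "ball", "park", "sky", "leash"],
)
```

Averaging such a rate over the sampled images gives a single figure comparable to the percentage reported on the slide.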
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
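Each filled template does double duty: it is a hint for the Guesser and, once the secret word is put back in, a commonsense fact. The sketch below illustrates this; the relation names on the right-hand side are hypothetical labels, not the ones Verbosity stores.

```python
# Template text -> relation label (labels are illustrative assumptions).
TEMPLATES = {
    "is a kind of": "KindOf",
    "is used for": "UsedFor",
    "is typically near": "TypicallyNear",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def make_hint(template, filler):
    """What the Guesser sees: the secret word stays blank."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template}")
    return f"_ {template} {filler}"


def to_fact(secret_word, template, filler):
    """What is collected: the same template with the secret word filled in."""
    return (TEMPLATES[template], secret_word, filler)


hint = make_hint("is a kind of", "fruit")
fact = to_fact("lime", "is a kind of", "fruit")
```

Restricting the Describer to a fixed set of templates is what guarantees that every hint maps onto one of the relations of interest.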
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
– Describer: can just repeat the behavior of a previous describer
– Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
– Ask six raters whether 200 facts collected using Verbosity are 'true'
– Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
– Find The Culprit (Annotation): the user must identify the closest antecedent of a markable, if it is anaphoric
– Detectives Conference (Validation): the user must agree/disagree with a coreference relation entered by another user
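The two tasks feed the same record: annotations propose an antecedent for a markable, validations raise or lower support for a proposal. The scoring below (plus one per annotation or agreement, minus one per disagreement) is an assumption made for illustration, not the scheme Phrase Detectives actually uses.

```python
from collections import defaultdict


class Interpretations:
    """Net support for each (markable, antecedent) proposal."""

    def __init__(self):
        self.score = defaultdict(int)

    def annotate(self, markable, antecedent):
        """Find The Culprit: a player proposes an antecedent."""
        self.score[(markable, antecedent)] += 1

    def validate(self, markable, antecedent, agree):
        """Detectives Conference: another player agrees or disagrees."""
        self.score[(markable, antecedent)] += 1 if agree else -1

    def best(self, markable):
        """Antecedent with the highest net support, or None if unannotated."""
        candidates = {a: s for (m, a), s in self.score.items() if m == markable}
        if not candidates:
            return None
        return max(candidates, key=candidates.get)


board = Interpretations()
board.annotate("The agency", "The FCC")
board.annotate("The agency", "AT&T")
board.validate("The agency", "The FCC", agree=True)
board.validate("The agency", "AT&T", agree=False)
```

Here `board.best("The agency")` settles on "The FCC", the reading from the coreference example earlier in the lecture.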
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer, and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.
AI is redirected to its equivalent concept Artificial Intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system
in which each article belongs to at least one category
The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system in
which each article belongs to at least one category ndash Polysemous concepts are disambiguated by
Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848.
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
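To make "semi-structured" concrete, here is a minimal Python sketch that reads the `| key = value` fields of an infobox into a dictionary. It is only an illustration: the function names `parse_infobox` and `strip_links` are hypothetical, the sample is a shortened version of the Poe infobox, and real infoboxes (multi-line values, nested templates) need a proper wikitext parser.

```python
import re

def parse_infobox(wikitext):
    """Collect the '| key = value' lines of an infobox into a dict (simplified)."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        # Only field lines start with '|' and contain an '=' sign.
        if not line.startswith("|") or "=" not in line:
            continue
        key, value = line[1:].split("=", 1)  # split on the first '=' only
        fields[key.strip()] = value.strip()
    return fields

def strip_links(value):
    """Replace [[target|label]] and [[target]] wiki links by their visible text."""
    return re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", value)

# Shortened sample, assumed for illustration.
sample = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| magnum_opus = The Raven
}}"""

poe = parse_infobox(sample)
print(poe["name"])
print(strip_links(poe["birth_place"]))
```

Each key/value pair obtained this way is a candidate attribute of the article's concept, which is the kind of information infobox extraction builds on.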
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
DBPEDIA
• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
The DBpedia Dataset
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data.
REPRESENTING EXTRACTED INFORMATION
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
dbpedia:native_name "Calgary"
dbpedia:altitude "1048"
dbpedia:population_city "988193"
dbpedia:population_metro "1079310"
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
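The Calgary example can be read as a small set of subject-predicate-object triples. The Python sketch below stores them as tuples and looks them up by pattern. It is an illustration rather than DBpedia's actual storage format: the short subject name `dbpedia:Calgary` and the helper `match` are assumptions made here.

```python
# Short name assumed for http://dbpedia.org/resource/Calgary.
CALGARY = "dbpedia:Calgary"

# One (subject, predicate, object) tuple per edge of the graph on the slide.
triples = [
    (CALGARY, "dbpedia:native_name", "Calgary"),
    (CALGARY, "dbpedia:altitude", "1048"),
    (CALGARY, "dbpedia:population_city", "988193"),
    (CALGARY, "dbpedia:population_metro", "1079310"),
    (CALGARY, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (CALGARY, "governing_body", "dbpedia:Calgary_City_Council"),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every non-None position (None = wildcard)."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

print(match(triples, predicate="dbpedia:altitude"))
print(match(triples, predicate="mayor_name"))
```

Objects can be literals (the altitude) or other resources (the mayor), which is what makes the data a graph rather than a flat table.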
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
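As a rough sketch of what "All films by Quentin Tarantino" might look like as a query, the Python snippet below builds a SPARQL query string and the corresponding request URL for the endpoint. The `dbo:`/`dbr:` prefixes, the class `dbo:Film` and the property `dbo:director` are assumptions taken from the present-day DBpedia ontology, not from the slides. No request is actually sent.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:Film and dbo:director (current DBpedia ontology).
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film a dbo:Film ;
        dbo:director dbr:Quentin_Tarantino .
}
"""

def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query})

url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The graph pattern in the WHERE clause is matched against the stored triples, and every binding of `?film` that satisfies both conditions is returned.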
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
www.phrasedetectives.com
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – CyCORP
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh, 2004)
THINGS (52,000 assertions)
IsA (IsA "apple" "fruit"); PartOf (PartOf "CPU" "computer"); PropertyOf (PropertyOf "coffee" "wet"); MadeOf (MadeOf "bread" "flour"); DefinedAs (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope"); SubeventOf (SubeventOf "play sport" "score goal"); FirstSubeventOf (FirstSubeventOf "start fire" "light match"); LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
CapableOf (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
LocationOf (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
EffectOf (EffectOf "view video" "entertainment"); DesirousEffectOf (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
DesireOf (DesireOf "person" "not be depressed"); MotivationOf (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
UsedFor (UsedFor "fireplace" "burn wood"); CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
SuperThematicKLine (SuperThematicKLine "western civilization" "civilization"); ThematicKLine (ThematicKLine "wedding dress" "veil"); ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al., 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
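One way to picture such a "simple heuristic" is a single sentence pattern. The Python sketch below handles only sentences of the form "A X is a (modifiers) Y", treats the last word as the head noun and the preceding words as a property. It is an assumed toy rule written for this example, not the extraction procedure ConceptNet actually uses, and `extract_assertions` is a hypothetical name.

```python
import re

# "A/An/The <subject> is a/an <modifiers> <head noun>" (single-word subject only).
PATTERN = re.compile(r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(.+)$", re.IGNORECASE)

def extract_assertions(sentence):
    """Map one template-like sentence to isa / property_of assertions."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if not m:
        return []  # sentence does not fit the pattern
    subject = m.group(1).lower()
    words = m.group(2).lower().split()
    head, modifiers = words[-1], words[:-1]
    assertions = [("isa", subject, head)]
    if modifiers:
        assertions.append(("property_of", subject, "_".join(modifiers)))
    return assertions

print(extract_assertions("A lime is a very sour fruit"))
```

Because contributors enter facts through templates, even shallow patterns of this kind can cover a large share of the collected sentences.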
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
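The matching rule can be sketched in a few lines of Python. The function below scans the strings in the order they were typed and reports the first one that both partners have entered. The event format, the player names "A" and "B", and the lower-casing of labels are assumptions made for this illustration; the real game's normalization is not described in the slides.

```python
def find_agreement(events):
    """Return the first label typed by both players, or None.

    events: time-ordered (player, text) pairs, with player 'A' or 'B'.
    """
    seen = {"A": set(), "B": set()}
    for player, text in events:
        label = text.strip().lower()  # assumed normalization
        other = "B" if player == "A" else "A"
        if label in seen[other]:
            return label  # the partner typed this string earlier: a match
        seen[player].add(label)
    return None

events = [("A", "dog"), ("B", "puppy"), ("A", "grass"), ("B", "Dog")]
print(find_agreement(events))
```

Since the partners cannot communicate, a string both of them type independently is likely to be a sensible description of the image.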
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
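The taboo-word rule and the "done" criterion can be pictured as per-image bookkeeping. In the Python sketch below, `ImageRecord` is a hypothetical class, and "most players pass" is read as "more than half of the rounds ended in a pass", which is an assumption since the slides give no exact threshold.

```python
class ImageRecord:
    """Per-image state: taboo words plus how often the image was passed."""

    def __init__(self):
        self.taboo = set()
        self.shown = 0
        self.passed = 0

    def allowed(self, word):
        """A guess is allowed only if it is not already taboo for this image."""
        return word.strip().lower() not in self.taboo

    def record_round(self, agreed_label=None):
        """Log one round: an agreed label becomes taboo; None means the pair passed."""
        self.shown += 1
        if agreed_label is None:
            self.passed += 1
        else:
            self.taboo.add(agreed_label.strip().lower())

    def is_done(self):
        """Assumed reading of 'most players pass': over half of the rounds."""
        return self.shown > 0 and self.passed / self.shown > 0.5

img = ImageRecord()
img.record_round("dog")      # first pair agrees on "dog"
print(img.allowed("Dog"))    # later pairs may no longer use it
img.record_round(None)       # two later pairs pass
img.record_round(None)
print(img.is_done())
```

Growing the taboo list pushes later players towards more specific labels, and a high pass rate signals that little is left to say about the image.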
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning, and at quiet times, there won't always be players to pair with
  – In these cases, a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels, playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
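Because every hint comes from a fixed template, turning a hint into a fact is mostly pattern matching. The Python sketch below assumes the first blank stands for the secret word and the describer fills the second. The relation labels (`kind_of`, `used_for`, ...) and the function `hint_to_fact` are hypothetical names chosen for this example, not Verbosity's internal representation.

```python
import re

# (template text, assumed relation label) pairs for the templates listed above.
TEMPLATES = [
    (r"is a kind of", "kind_of"),
    (r"is used for", "used_for"),
    (r"is typically (?:near|in|on)", "typical_location"),
    (r"is the opposite of", "opposite_of"),
    (r"is related to", "related_to"),
]

def hint_to_fact(secret_word, hint):
    """Turn a filled-in hint such as '_ is used for cutting' into a triple."""
    for pattern, relation in TEMPLATES:
        m = re.fullmatch(r"_\s+" + pattern + r"\s+(.+)", hint.strip())
        if m:
            return (relation, secret_word, m.group(1))
    return None  # free text that fits no template is ignored

print(hint_to_fact("knife", "_ is used for cutting"))
print(hint_to_fact("milk", "_ is typically in the fridge"))
```

Restricting the describer to templates is what guarantees that each hint corresponds to one of the relations of interest.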
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat behavior of previous describer
  – Guesser: not so easy
RESULTS
• Only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system in
which each article belongs to at least one category ndash Polysemous concepts are disambiguated by
Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
wwwphrasedetectivescom
OPEN MIND COMMONSENSE
bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense
WHATrsquoS IN OPEN MIND COMMONSENSE CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52000 assertions)
IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)
EVENTS (38000 assertions)
PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)
AGENTS (104000 assertions)
CapableOf (CapableOf dentist pull tooth)
SPATIAL (36000 assertions)
LocationOf (LocationOf army in war)
TEMPORAL time amp sequence
CAUSAL (17000 assertions)
EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)
AFFECTIONAL (mood feeling emotions) (34000 assertions)
DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)
FUNCTIONAL (115000 assertions)
IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)
ASSOCIATION K-LINES (125 million assertions)
SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic
network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
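The taboo mechanism can be sketched as a per-image set of words that grows with every agreement. The class and method names below are hypothetical; the sketch only shows the bookkeeping described in the bullets above.

```python
class TabooTracker:
    """Keeps, for each image, the words earlier players already agreed on."""

    def __init__(self):
        self.taboo = {}  # image id -> set of taboo words

    def is_allowed(self, image_id, word):
        """A guess is allowed only if it is not yet taboo for this image."""
        return word.lower() not in self.taboo.get(image_id, set())

    def record_agreement(self, image_id, word):
        """An agreed-upon word becomes taboo for later players of the same image."""
        self.taboo.setdefault(image_id, set()).add(word.lower())

tracker = TabooTracker()
tracker.record_agreement("img-1", "dog")
print(tracker.is_allowed("img-1", "Dog"))    # False: already agreed on for img-1
print(tracker.is_allowed("img-1", "puppy"))  # True: a more specific label
print(tracker.is_allowed("img-2", "dog"))    # True: taboo lists are per image
```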
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
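Both criteria are simple counting rules, sketched below. The function names are illustrative, and reading “most players pass” as “more than half of the plays end in a pass” is an assumption; the slide does not give the exact threshold.

```python
from collections import Counter

def good_labels(label_events, n):
    """label_events holds one entry per player who produced a label.
    A label is kept when more than n players produced it."""
    counts = Counter(label_events)
    return sorted(label for label, count in counts.items() if count > n)

def is_done(passes, plays, pass_ratio=0.5):
    """Assumed reading of 'most players pass': pass rate above pass_ratio."""
    return plays > 0 and passes / plays > pass_ratio

print(good_labels(["dog", "dog", "dog", "cat", "cat"], n=2))
print(is_done(passes=8, plays=10))
```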
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
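Pre-recorded play can be sketched as replaying a stored list of guesses. The representation below, a recorded ‘hand’ as a list of (seconds since the image appeared, word) pairs, is an assumption made for this example; the slide does not say how hands are stored.

```python
def replayed_guesses(recorded_hand, elapsed_seconds):
    """Return the recorded partner's guesses that would have been typed so far."""
    return [word for offset, word in recorded_hand if offset <= elapsed_seconds]

# Toy recorded hand for one picture.
hand = [(2.0, "dog"), (5.5, "grass"), (9.0, "ball")]
print(replayed_guesses(hand, elapsed_seconds=6.0))
```

The live player's guesses are then matched against this growing list exactly as they would be against a human partner.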
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful
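The comparison between game labels and participant labels comes down to an overlap percentage, which can be sketched as follows. The function name and the toy label sets are made up for illustration and are not data from the study.

```python
def overlap_percentage(game_labels, participant_labels):
    """Percentage of distinct game labels that participants also produced."""
    game = set(game_labels)
    if not game:
        return 0.0
    return 100.0 * len(game & set(participant_labels)) / len(game)

# Toy data: 3 of the 4 game labels were also produced by participants.
print(overlap_percentage(["dog", "ball", "grass", "sky"],
                         ["dog", "grass", "sky", "tree"]))
```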
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
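Because every hint is a filled-in template, each hint can be stored directly as a relation between the secret word and the filler. The sketch below illustrates this; the relation names in `TEMPLATES` and both function names are hypothetical, chosen only for this example.

```python
# Hypothetical mapping from template text to a relation name.
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}

def render_hint(template, filler):
    """What the Guesser sees: the secret word is hidden behind a blank."""
    return f"_ {template} {filler}"

def hint_to_fact(secret_word, template, filler):
    """What the system can store once the hint has been given."""
    return (TEMPLATES[template], secret_word, filler)

print(render_hint("is a kind of", "liquid"))
print(hint_to_fact("milk", "is a kind of", "liquid"))
```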
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer, and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
  – Each article only describes a single concept
  – The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
  – Equivalent concepts are grouped together by redirected links
  – It contains a hierarchical categorization system, in which each article belongs to at least one category
  – Polysemous concepts are disambiguated by Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
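Infoboxes count as semi-structured data because each field is a `| key = value` line, so a few lines of code can recover attribute-value pairs. The sketch below is a minimal illustration, not a full wikitext parser: the function names are hypothetical, and the sample is a shortened version of the Poe infobox, laid out one field per line with commas restored.

```python
import re

SAMPLE = """| name = Edgar Allan Poe
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| birth_place = [[Boston, Massachusetts]], [[United States|US]]"""

# Matches [[Target]] and [[Target|display text]], capturing only the target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def parse_infobox(text):
    """Collect the '| key = value' lines of an infobox into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # skip anything that is not a field line
        key, value = line[1:].split("=", 1)
        fields[key.strip()] = value.strip()
    return fields

def link_targets(value):
    """Return the article titles that a field value links to."""
    return LINK.findall(value)

fields = parse_infobox(SAMPLE)
print(fields["name"])
print(link_targets(fields["birth_place"]))
```

The linked article titles are what make the data useful as a semantic network: a field such as movement connects the Poe article to other articles, while a field such as name is a plain attribute value.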
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
DBPEDIA
• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
The DBpedia Dataset
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data.
REPRESENTING EXTRACTED INFORMATION
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
dbpedia:native_name “Calgary”
dbpedia:altitude “1048”
dbpedia:population_city “988193”
dbpedia:population_metro “1079310”
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
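The mapping from infobox fields to RDF can be sketched as one (subject, property, object) triple per field. This is a simplified illustration, not DBpedia's extraction framework: the function name is hypothetical, and the rule that a whole-value `[[link]]` becomes a resource URI while everything else becomes a literal is an assumption made for this example.

```python
RESOURCE = "http://dbpedia.org/resource/"
PROPERTY = "http://dbpedia.org/property/"

def infobox_to_triples(subject, fields):
    """Turn infobox fields into (subject, property, object) triples."""
    triples = []
    for key, value in fields.items():
        if value.startswith("[[") and value.endswith("]]"):
            # A wiki link points to another article, so it becomes a resource URI.
            obj = RESOURCE + value[2:-2].replace(" ", "_")
        else:
            # Anything else is kept as a plain string literal.
            obj = f'"{value}"'
        triples.append((RESOURCE + subject, PROPERTY + key, obj))
    return triples

calgary = {
    "altitude": "1048",
    "mayor_name": "[[Dave Bronconnier]]",
}
for triple in infobox_to_triples("Calgary", calgary):
    print(triple)
```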
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all Sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
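One of the example questions, all films by Quentin Tarantino, might be written as the SPARQL query below, wrapped in Python so that the request URL can be built without contacting the server. The class `dbo:Film`, the property `dbo:director` and the Virtuoso-style `format` parameter are assumptions about the current DBpedia setup, which may differ from the dataset described on these slides.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:Film and dbo:director from the DBpedia ontology.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film a dbo:Film ;
        dbo:director dbr:Quentin_Tarantino .
}
"""

def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request URL for the SPARQL endpoint."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)

url = build_request_url(QUERY)
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) should return the variable bindings for `?film`; the sketch stops at building the URL so that it runs offline.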
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – CyCORP
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense
WHAT’S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
IsA (IsA "apple" "fruit"), PartOf (PartOf "CPU" "computer"), PropertyOf (PropertyOf "coffee" "wet"), MadeOf (MadeOf "bread" "flour"), DefinedAs (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope"), SubeventOf (SubeventOf "play sport" "score goal"), FirstSubeventOf (FirstSubeventOf "start fire" "light match"), LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
CapableOf (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
LocationOf (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
EffectOf (EffectOf "view video" "entertainment"), DesirousEffectOf (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
DesireOf (DesireOf "person" "not be depressed"), MotivationOf (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
UsedFor (UsedFor "fireplace" "burn wood"), CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
SuperThematicKLine (SuperThematicKLine "western civilization" "civilization"), ThematicKLine (ThematicKLine "wedding dress" "veil"), ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")
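Assertions of this form make up a labeled directed graph, which is what makes the resource a semantic network. The sketch below stores a handful of the example assertions as (relation, source, target) triples and looks up a concept's neighbours; the function names and the adjacency-list layout are assumptions for illustration, not ConceptNet's actual API.

```python
from collections import defaultdict

# A few of the example assertions listed above.
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("MadeOf", "bread", "flour"),
    ("UsedFor", "fireplace", "burn wood"),
    ("CapableOf", "dentist", "pull tooth"),
]

def build_network(assertions):
    """Index assertions by source concept: concept -> [(relation, target), ...]."""
    network = defaultdict(list)
    for relation, source, target in assertions:
        network[source].append((relation, target))
    return network

def related(network, concept, relation):
    """Return every target linked to the concept by the given relation."""
    return [target for rel, target in network.get(concept, []) if rel == relation]

network = build_network(ASSERTIONS)
print(related(network, "apple", "IsA"))
print(related(network, "fireplace", "UsedFor"))
```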
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
wwwphrasedetectivescom
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense knowledge
WHAT’S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
• IsA: (IsA "apple" "fruit")
• PartOf: (PartOf "CPU" "computer")
• PropertyOf: (PropertyOf "coffee" "wet")
• MadeOf: (MadeOf "bread" "flour")
• DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
• PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
• SubeventOf: (SubeventOf "play sport" "score goal")
• FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
• LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
• CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
• LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
• EffectOf: (EffectOf "view video" "entertainment")
• DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
• DesireOf: (DesireOf "person" "not be depressed")
• MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
• UsedFor: (UsedFor "fireplace" "burn wood")
• CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION: K-LINES (1.25 million assertions)
• SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
• ThematicKLine: (ThematicKLine "wedding dress" "veil")
• ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
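Assertions of this kind are binary relations between short phrases, so they can be held as simple triples and indexed by concept. The sketch below is one possible in-memory layout, not the ConceptNet implementation; the names `Assertion` and `index_by_concept` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    relation: str
    left: str
    right: str


# Examples copied from the relation table above.
ASSERTIONS = [
    Assertion("IsA", "apple", "fruit"),
    Assertion("PartOf", "CPU", "computer"),
    Assertion("MadeOf", "bread", "flour"),
    Assertion("CapableOf", "dentist", "pull tooth"),
    Assertion("UsedFor", "fireplace", "burn wood"),
]


def index_by_concept(assertions):
    """Map each concept to the assertions in which it is the left argument."""
    index = defaultdict(list)
    for a in assertions:
        index[a.left].append(a)
    return index


index = index_by_concept(ASSERTIONS)
bread_facts = [(a.relation, a.right) for a in index["bread"]]
```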
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
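A pattern-based heuristic of the kind mentioned above could look like the sketch below. It is an illustration for the lime sentence only: the regular expression and the function `extract` are assumptions made here, not the rules actually used to build ConceptNet.

```python
import re

# Assumed pattern: "A/An X is a/an [modifiers] Y".
PATTERN = re.compile(
    r"^an?\s+(?P<subj>\w+)\s+is\s+an?\s+(?P<mods>(?:\w+\s+)*)(?P<head>\w+)$",
    re.IGNORECASE,
)


def extract(sentence: str):
    """Return isa / property_of facts for sentences that fit the pattern."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if m is None:
        return []
    subj = m.group("subj").lower()
    facts = [("isa", subj, m.group("head").lower())]
    mods = m.group("mods").lower().split()
    if mods:
        # Modifiers before the head noun are treated as one property.
        facts.append(("property_of", subj, "_".join(mods)))
    return facts


facts = extract("A lime is a very sour fruit")
```

Sentences that do not fit the pattern simply yield no facts, which is the usual trade-off of such shallow heuristics.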
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
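The matching rule can be sketched as follows: each player's guesses are remembered, and the round ends as soon as one player types a string the other has already typed. The function name `first_match` and the case-insensitive comparison are assumptions made for this illustration.

```python
def first_match(guesses):
    """guesses: iterable of (player, word) pairs in the order typed,
    with player being 0 or 1. Returns the first word typed by both
    players, or None if they never agree."""
    seen = {0: set(), 1: set()}
    for player, word in guesses:
        word = word.strip().lower()  # assumed normalisation
        if word in seen[1 - player]:
            return word
        seen[player].add(word)
    return None


round_log = [(0, "dog"), (1, "puppy"), (0, "grass"), (1, "Dog"), (0, "puppy")]
agreed = first_match(round_log)
```

Here the partners agree on "dog" with the fourth guess, so the later "puppy" is never needed.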
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2 ½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
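The taboo, "good" and "done" rules can be written as three small checks. This is a sketch under stated assumptions: the function names are hypothetical, and "most players pass" is read here as "more than half", which the slides do not specify.

```python
from collections import Counter


def allowed(guess, taboo):
    """A guess is rejected if it is a taboo word for the image."""
    return guess.lower() not in taboo


def good_labels(producer_counts, n):
    """Labels produced by more than n players (n is a game parameter)."""
    return {label for label, count in producer_counts.items() if count > n}


def is_done(pass_count, shown_count, threshold=0.5):
    """Assumed reading of 'most players pass': pass rate above threshold."""
    return shown_count > 0 and pass_count / shown_count > threshold


taboo = {"dog", "grass"}
guesses = ["Dog", "ball", "grass", "leash"]
accepted = [g for g in guesses if allowed(g, taboo)]

counts = Counter({"dog": 5, "ball": 2, "leash": 1})
good = good_labels(counts, n=1)
```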
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning, and at quiet times, there won’t always be players to pair with
  – In these cases, a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced, and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
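Because every hint comes from a fixed template, each hint maps directly onto a relation between the secret word and the filler. The sketch below illustrates that idea; the relation names and the functions `render_hint` and `to_fact` are hypothetical, and only the "near" variant of the third template is included.

```python
# Template texts follow the slide; relation names are made up for this sketch.
TEMPLATES = {
    "kind_of": "{word} is a kind of {hint}",
    "used_for": "{word} is used for {hint}",
    "typically_near": "{word} is typically near {hint}",
    "opposite_of": "{word} is the opposite of {hint}",
    "related_to": "{word} is related to {hint}",
}


def render_hint(relation, filler):
    """Text shown to the Guesser: the secret word stays hidden as '_'."""
    return TEMPLATES[relation].format(word="_", hint=filler)


def to_fact(relation, secret_word, filler):
    """The commonsense fact implied by a filled-in template."""
    return (relation, secret_word, filler)


secret = "milk"
hints = [("kind_of", "drink"), ("used_for", "cereal")]
shown = [render_hint(rel, filler) for rel, filler in hints]
facts = [to_fact(rel, secret, filler) for rel, filler in hints]
```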
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848.
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]] [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]] [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
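Since an infobox is a list of `| key = value` fields, a first approximation of infobox extraction is a line-by-line parse. The sketch below assumes one field per line and only unwraps `[[...]]` links; it is not the DBpedia extraction code, and `parse_infobox` is a hypothetical name.

```python
import re

# A shortened copy of the Poe infobox, one field per line (assumed layout).
WIKITEXT = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]] [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| magnum_opus = The Raven
}}"""

# [[Target]] -> Target, [[Target|Label]] -> Label
LINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")


def parse_infobox(text):
    """Return a dict of field name -> value with wiki links unwrapped."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)
            fields[key.strip()] = LINK.sub(r"\1", value.strip())
    return fields


fields = parse_infobox(WIKITEXT)
```

Nested templates such as the birth date field would need extra handling that this sketch leaves out.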
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
DBPEDIA
• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
The DBpedia Dataset
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data.
REPRESENTING EXTRACTED INFORMATION
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
dbpedia:native_name "Calgary"
dbpedia:altitude "1048"
dbpedia:population_city "988193"
dbpedia:population_metro "1079310"
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
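The Calgary data above is a small labeled graph, which can be sketched as a list of (subject, property, value) triples with a pattern-matching lookup. The function `match` is a hypothetical helper written for this illustration and is not part of any RDF toolkit.

```python
CALGARY = "dbpedia:Calgary"

# Values copied from the Calgary example above.
TRIPLES = [
    (CALGARY, "native_name", "Calgary"),
    (CALGARY, "altitude", "1048"),
    (CALGARY, "population_city", "988193"),
    (CALGARY, "population_metro", "1079310"),
    (CALGARY, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (CALGARY, "governing_body", "dbpedia:Calgary_City_Council"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every argument that is not None."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


mayor = match(TRIPLES, s=CALGARY, p="mayor_name")
```

Leaving an argument as `None` plays the role of a variable in a query pattern, which is the basic idea behind SPARQL triple patterns.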
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
dbpedia:native_name "Calgary"
dbpedia:altitude "1048"
dbpedia:population_city "988193"
dbpedia:population_metro "1079310"
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
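The Calgary example can be read as a set of subject-predicate-object triples. The following is a minimal sketch in plain Python, without an RDF library; the tuple layout, the abbreviation dbpedia:Calgary for the resource URI, and the match helper (with None as a wildcard) are assumptions made for illustration.

```python
CALGARY = "dbpedia:Calgary"  # assumed shorthand for http://dbpedia.org/resource/Calgary

# (subject, predicate, object) triples taken from the slide
TRIPLES = [
    (CALGARY, "dbpedia:native_name", "Calgary"),
    (CALGARY, "dbpedia:altitude", "1048"),
    (CALGARY, "dbpedia:population_city", "988193"),
    (CALGARY, "dbpedia:population_metro", "1079310"),
    (CALGARY, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (CALGARY, "governing_body", "dbpedia:Calgary_City_Council"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every argument that is not None."""
    pattern = (subject, predicate, obj)
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


mayor = match(TRIPLES, subject=CALGARY, predicate="mayor_name")
print(mayor[0][2])  # dbpedia:Dave_Bronconnier

# Objects that are plain strings rather than links to other resources
literals = [t for t in TRIPLES if not t[2].startswith("dbpedia:")]
print(len(literals))  # 4
```

The distinction between literal objects (population, altitude) and objects that are themselves resources (the mayor, the city council) is what makes the data a graph rather than a flat table.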
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• The W3C SPARQL specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
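As an illustration of one of the example questions ("all films by Quentin Tarantino"), here is a sketch in Python that writes the question as a SPARQL query and encodes it as a GET request URL for the endpoint. The property dbo:director and the resource dbr:Quentin_Tarantino are assumptions about the vocabulary of the dataset, which may differ between DBpedia releases; the code only builds the URL and does not contact the server.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director, dbr:Quentin_Tarantino
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
LIMIT 20
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query in the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(decoded == QUERY.strip())  # True
```

The triple pattern in the WHERE clause has the same subject-predicate-object shape as the extracted data, with the variable ?film standing for the unknown subject.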
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
www.phrasedetectives.com
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
  – IsA (IsA "apple" "fruit")
  – PartOf (PartOf "CPU" "computer")
  – PropertyOf (PropertyOf "coffee" "wet")
  – MadeOf (MadeOf "bread" "flour")
  – DefinedAs (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  – PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope")
  – SubeventOf (SubeventOf "play sport" "score goal")
  – FirstSubeventOf (FirstSubeventOf "start fire" "light match")
  – LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  – CapableOf (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  – LocationOf (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  – EffectOf (EffectOf "view video" "entertainment")
  – DesirousEffectOf (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  – DesireOf (DesireOf "person" "not be depressed")
  – MotivationOf (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  – UsedFor (UsedFor "fireplace" "burn wood")
  – CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
  – SuperThematicKLine (SuperThematicKLine "western civilization" "civilization")
  – ThematicKLine (ThematicKLine "wedding dress" "veil")
  – ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
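To make the idea of a "simple heuristic" concrete, here is a sketch in Python that reproduces the lime example with a single regular expression. The pattern, the choice of the last word as head noun, and the function name extract are assumptions made for illustration; the heuristics actually used to build ConceptNet are not given on the slide and are likely to be richer.

```python
import re

# Sentences of the form "A <noun> is a <modifiers> <noun>"
PATTERN = re.compile(r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(.+)$", re.IGNORECASE)


def extract(sentence):
    """Return a list of (relation, arg1, arg2) tuples, or [] if the pattern fails."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if not m:
        return []
    subject = m.group(1).lower()
    words = m.group(2).lower().split()
    head, modifiers = words[-1], words[:-1]  # last word is taken as the head noun
    assertions = [("isa", subject, head)]
    if modifiers:
        assertions.append(("property_of", subject, "_".join(modifiers)))
    return assertions


print(extract("A lime is a very sour fruit"))
# [('isa', 'lime', 'fruit'), ('property_of', 'lime', 'very_sour')]
```

A heuristic this shallow will misread many sentences (for example, compound head nouns), which is one reason the extracted network needs checking.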
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – to train image search engines
  – to develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
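The matching rule can be sketched as follows in Python. The event format (a time-ordered list of player and typed string), the normalization step and the function names are assumptions made for illustration; the slides do not describe how the game implements the comparison.

```python
def normalize(text):
    """Lower-case and collapse whitespace so trivial variants still match."""
    return " ".join(text.lower().split())


def play_round(events):
    """events: time-ordered (player, typed_string) pairs, player is 'A' or 'B'.

    Return the first string that both players have typed, or None if the
    partners never agree on this image.
    """
    typed = {"A": set(), "B": set()}
    for player, text in events:
        label = normalize(text)
        other = "B" if player == "A" else "A"
        if label in typed[other]:
            return label  # agreement: this becomes the label for the image
        typed[player].add(label)
    return None


events = [("A", "car"), ("B", "Street"), ("A", "boy"), ("B", "hat"), ("B", "Car ")]
print(play_round(events))  # car
```

Note that the two players do not need to type the matching string at the same time: a guess matches if the partner has typed it at any earlier point in the round.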
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
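The bookkeeping behind taboo words, "good" labels and "done" images could look like the following Python sketch. The class name ImageRecord, the way passes are recorded, and the rule used in is_done (more than half of the last few pairs passed) are assumptions made for illustration; only the parameter N and the taboo rule come from the slides.

```python
from collections import Counter


class ImageRecord:
    """Labels collected for one image across many rounds of the game."""

    def __init__(self, n):
        self.n = n                 # the game parameter N
        self.produced = Counter()  # label -> number of players who typed it
        self.taboo = set()         # labels agreed on in earlier rounds
        self.passed = []           # one boolean per round: did the pair pass?

    def record_round(self, labels_a, labels_b, passed=False):
        """Store what two partners typed; return the labels they agreed on."""
        a = set(labels_a) - self.taboo  # taboo words are not allowed
        b = set(labels_b) - self.taboo
        self.produced.update(a)
        self.produced.update(b)
        agreed = a & b
        self.taboo |= agreed            # agreed words become taboo for later pairs
        self.passed.append(passed)
        return agreed

    def good_labels(self):
        """Labels produced by more than N players."""
        return {label for label, count in self.produced.items() if count > self.n}

    def is_done(self, window=4):
        """Assumed rule: done when more than half of the last `window` pairs passed."""
        recent = self.passed[-window:]
        return len(recent) == window and sum(recent) > window / 2


image = ImageRecord(n=2)
image.record_round({"dog", "grass"}, {"dog", "ball"})   # the pair agrees on "dog"
image.record_round({"dog", "ball"}, {"puppy", "ball"})  # "dog" is taboo now
print(sorted(image.taboo))        # ['ball', 'dog']
print(sorted(image.good_labels()))  # ['ball']
```

Because agreed words are excluded from later rounds, each new pair is pushed towards more specific labels, which is the stated purpose of the taboo list.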
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won't always be players to pair with
  – In these cases a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
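The link between a filled-in template and the fact that is collected can be sketched in Python as below. The relation names (KindOf, UsedFor and so on), the splitting of "near/in/on" into three templates, and the assumption that the Guesser sees the hint with the secret word blanked out are all choices made for illustration, not details taken from the game.

```python
# Template text -> relation name (relation names are hypothetical labels).
TEMPLATES = {
    "{} is a kind of {}": "KindOf",
    "{} is used for {}": "UsedFor",
    "{} is typically near {}": "Near",
    "{} is typically in {}": "In",
    "{} is typically on {}": "On",
    "{} is the opposite of {}": "OppositeOf",
    "{} is related to {}": "RelatedTo",
}


def hint_for_guesser(template, filler):
    """The hint as shown to the Guesser: the secret word stays hidden."""
    return template.format("___", filler)


def collected_fact(secret_word, template, filler):
    """The fact recorded once the secret word is known."""
    return (TEMPLATES[template], secret_word, filler)


template = "{} is a kind of {}"
print(hint_for_guesser(template, "computer"))          # ___ is a kind of computer
print(collected_fact("laptop", template, "computer"))  # ('KindOf', 'laptop', 'computer')
```

Because every hint comes from a fixed template, each one maps directly to a relation, so no parsing of free text is needed to turn game play into facts.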
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat behavior of previous describer
  – Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Name The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems, 2013.
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
Extracting Infobox Data (RDF Representation)
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
dbpedia:native_name "Calgary"
dbpedia:altitude "1048"
dbpedia:population_city "988193"
dbpedia:population_metro "1079310"
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
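The Calgary infobox data can be read as subject, predicate, object triples. The following is a minimal sketch in Python that stores them as plain tuples; it does not use any RDF library, the `dbpedia:` prefix on `mayor_name` and `governing_body` is assumed for uniformity, and the helper name `objects_of` is hypothetical.

```python
# Each fact is a (subject, predicate, object) triple.
# Prefixed names are kept as plain strings; no RDF library is assumed.
SUBJECT = "dbpedia:Calgary"

triples = [
    (SUBJECT, "dbpedia:native_name", "Calgary"),
    (SUBJECT, "dbpedia:altitude", "1048"),
    (SUBJECT, "dbpedia:population_city", "988193"),
    (SUBJECT, "dbpedia:population_metro", "1079310"),
    # Assumption: the slide shows these two predicates without a prefix.
    (SUBJECT, "dbpedia:mayor_name", "dbpedia:Dave_Bronconnier"),
    (SUBJECT, "dbpedia:governing_body", "dbpedia:Calgary_City_Council"),
]


def objects_of(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects_of(SUBJECT, "dbpedia:mayor_name"))
```

Literal values (the altitude) and references to other resources (the mayor) sit in the same object position, which is what lets the data form a graph.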
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
The DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
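As an illustration of what such a query might look like, here is a hedged sketch for "all films by Quentin Tarantino". The property `dbo:director` and the resource name `dbr:Quentin_Tarantino` are assumptions about the DBpedia vocabulary, not taken from the slides, and the helper `build_request_url` is hypothetical. The code only builds the request URL; it does not contact the endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on the slide

# Assumed vocabulary: dbo:director and dbr:Quentin_Tarantino.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The query is a graph pattern with one variable, `?film`; the endpoint would return every resource that can fill that slot.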
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis
WEB COLLABORATION PROJECTS
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
  IsA (IsA "apple" "fruit")
  PartOf (PartOf "CPU" "computer")
  PropertyOf (PropertyOf "coffee" "wet")
  MadeOf (MadeOf "bread" "flour")
  DefinedAs (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope")
  SubeventOf (SubeventOf "play sport" "score goal")
  FirstSubeventOf (FirstSubeventOf "start fire" "light match")
  LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  CapableOf (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  LocationOf (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  EffectOf (EffectOf "view video" "entertainment")
  DesirousEffectOf (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  DesireOf (DesireOf "person" "not be depressed")
  MotivationOf (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  IsUsedFor (UsedFor "fireplace" "burn wood")
  CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")
ASSOCIATION: K-LINES (1.25 million assertions)
  SuperThematicKLine (SuperThematicKLine "western civilization" "civilization")
  ThematicKLine (ThematicKLine "wedding dress" "veil")
  ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")
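Assertions of this kind are binary relations between two concepts, so they can be stored as labeled edges of a graph. The sketch below is only an illustration: the grouping of words into two arguments (for example "pull tooth" as one concept) is an assumption, and the `^-1` suffix used to mark an inverse edge and the helper name `neighbours` are hypothetical.

```python
from collections import defaultdict

# A few assertions from the table, written as (relation, left, right).
assertions = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("CapableOf", "dentist", "pull tooth"),
    ("UsedFor", "fireplace", "burn wood"),
    ("ConceptuallyRelatedTo", "bad breath", "mint"),
]

# Index every assertion under both of its concepts so that the
# network can be walked in either direction.
by_concept = defaultdict(list)
for relation, left, right in assertions:
    by_concept[left].append((relation, right))
    by_concept[right].append((relation + "^-1", left))  # hypothetical inverse marker


def neighbours(concept):
    """Return the (relation, other concept) pairs attached to `concept`."""
    return by_concept.get(concept, [])


print(neighbours("apple"))
print(neighbours("fruit"))
```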
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
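To make the idea of a "simple heuristic" concrete, here is a hedged sketch of one pattern that would produce the two facts above from the lime sentence. It is not the actual ConceptNet extraction code; the regular expression, the function name `extract` and the treatment of everything before the final noun as a property are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: "A/An/The <noun> is a/an [<modifier words>] <noun>"
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(?:(.+)\s+)?(\w+)$",
    re.IGNORECASE,
)


def extract(sentence):
    """Turn one simple sentence into a list of (relation, arg1, arg2) facts."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return []
    subject, modifier, head = match.groups()
    facts = [("isa", subject.lower(), head.lower())]
    if modifier:
        # Join multi-word modifiers with underscores, as on the slide.
        facts.append(("property_of", subject.lower(), modifier.lower().replace(" ", "_")))
    return facts


print(extract("A lime is a very sour fruit"))
```

A pattern like this covers only one sentence shape; a real extractor would need many such patterns and would still miss sentences that fit none of them.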
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
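The matching rule and the taboo list can be sketched together. This is an illustration under stated assumptions, not the game's actual implementation: the function names are hypothetical, guesses are compared case-insensitively, and the two players are assumed to type in strict alternation.

```python
from itertools import zip_longest


def first_agreement(guesses_a, guesses_b, taboo):
    """Return the first non-taboo label typed by both players, or None."""
    taboo = {word.lower() for word in taboo}
    seen_a, seen_b = set(), set()
    # Interleave the two streams of guesses (a simplification of real timing).
    for a, b in zip_longest(guesses_a, guesses_b):
        for word, mine, theirs in ((a, seen_a, seen_b), (b, seen_b, seen_a)):
            if word is None:
                continue
            word = word.strip().lower()
            if word in taboo:
                continue  # taboo words are not allowed as labels
            mine.add(word)
            if word in theirs:
                return word
    return None


def play_round(image_taboo, guesses_a, guesses_b):
    """Play one image; an agreed label becomes taboo for later players."""
    label = first_agreement(guesses_a, guesses_b, image_taboo)
    if label is not None:
        image_taboo.add(label)
    return label


taboo = set()
print(play_round(taboo, ["dog", "ball"], ["dog", "ball"]))
print(play_round(taboo, ["dog", "ball"], ["dog", "ball"]))
```

Because the first pair's agreed word is taboo for the second pair, the second round is forced past the obvious label to a more specific one.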
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
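Both criteria are simple counts. A minimal sketch, assuming that each entry in the input list is one production of a label for the image and that "most players" means more than half; the function names and that reading of "most" are assumptions, not details from the slides.

```python
from collections import Counter


def good_labels(produced_labels, n):
    """Return the labels produced more than `n` times for one image."""
    counts = Counter(produced_labels)
    return {label for label, count in counts.items() if count > n}


def is_done(passes, plays):
    """An image is 'done' when most players who saw it passed on it.

    Assumption: 'most' is read as strictly more than half.
    """
    return plays > 0 and passes / plays > 0.5


print(good_labels(["dog", "dog", "dog", "ball"], n=2))
print(is_done(passes=6, plays=10))
```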
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won't always be players to pair with
  – In these cases a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels or playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
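Because every hint is built from a fixed template, each hint maps directly onto a relation once the secret word is known. The sketch below illustrates this; the relation names on the right-hand side and both function names are hypothetical, chosen only to echo the ConceptNet-style relations seen earlier.

```python
# Template text (the middle part of the hint) mapped to a relation name.
# The relation names are assumptions made for this illustration.
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near/in/on": "LocationOf",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def make_hint(template, filler):
    """What the Guesser sees: the secret word stays hidden as '_'."""
    return f"_ {template} {filler}"


def hint_to_fact(secret_word, template, filler):
    """What the system stores: the hint with the secret word filled in."""
    return (TEMPLATES[template], secret_word, filler)


print(make_hint("is a kind of", "fruit"))
print(hint_to_fact("lime", "is a kind of", "fruit"))
```

The Describer only ever types the filler, so the relation type of every collected fact is known in advance and needs no parsing.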
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – The Describer can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
wwwphrasedetectivescom
OPEN MIND COMMONSENSE
bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense
WHATrsquoS IN OPEN MIND COMMONSENSE CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52000 assertions)
IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)
EVENTS (38000 assertions)
PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)
AGENTS (104000 assertions)
CapableOf (CapableOf dentist pull tooth)
SPATIAL (36000 assertions)
LocationOf (LocationOf army in war)
TEMPORAL time amp sequence
CAUSAL (17000 assertions)
EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)
AFFECTIONAL (mood feeling emotions) (34000 assertions)
DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)
FUNCTIONAL (115000 assertions)
IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)
ASSOCIATION K-LINES (125 million assertions)
SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
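The lime example can be reproduced with a single pattern. This is a minimal sketch of what a 'simple heuristic' might look like, not the rule set actually used to build ConceptNet: the regular expression and the function name `extract_assertions` are assumptions made for illustration, and the pattern only covers sentences of the form "A X is a (modifiers) Y".

```python
import re

# Hypothetical pattern: "A/An/The <concept> is a/an [<modifiers>] <category>"
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(?:(.+?)\s+)?(\w+)$",
    re.IGNORECASE,
)


def extract_assertions(sentence):
    """Turn one template-like sentence into isa/property_of assertions."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return []  # sentence does not fit this one pattern
    concept, modifiers, category = match.groups()
    assertions = [f"isa({concept.lower()}, {category.lower()})"]
    if modifiers:
        # Join multi-word modifiers with underscores, as in very_sour.
        prop = "_".join(modifiers.lower().split())
        assertions.append(f"property_of({concept.lower()}, {prop})")
    return assertions


print(extract_assertions("A lime is a very sour fruit"))
```

A real extractor would need many such patterns, and some way of handling sentences that match none of them.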
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence the ESP game
• If any of the strings typed by one player matches a string typed by the other player, they score points
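The matching rule can be sketched as follows. This is an illustration under stated assumptions, not the game's actual implementation: the event format, the `normalize` step (lower-casing and whitespace trimming) and the function name `play_round` are all hypothetical.

```python
def normalize(guess):
    """Lower-case and collapse whitespace so 'Dog ' and 'dog' compare equal (assumed rule)."""
    return " ".join(guess.lower().split())


def play_round(events):
    """Return the first label both players have typed, or None.

    events: (player, guess) pairs in typing order, with player "A" or "B".
    """
    seen = {"A": set(), "B": set()}
    for player, guess in events:
        label = normalize(guess)
        other = "B" if player == "A" else "A"
        if label in seen[other]:
            return label  # the partners agree: this label scores
        seen[player].add(label)
    return None


events = [("A", "dog"), ("B", "puppy"), ("A", "grass"), ("B", "Dog ")]
print(play_round(events))
```

Neither player sees the other's strings; agreement is only detected on the server side, which is what makes the matched label a plausible description of the image.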
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
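The bookkeeping behind taboo words and good labels might look like the sketch below. It is one reading of the two rules as stated on the slides, not the ESP game's real code: the class `ImageRecord`, its method names, and the decision to count every typed (non-taboo) label as 'produced' are assumptions.

```python
from collections import Counter


class ImageRecord:
    """Per-image label bookkeeping (hypothetical structure)."""

    def __init__(self, n):
        self.n = n                 # the game parameter N
        self.produced = Counter()  # how many times each label was typed
        self.taboo = set()         # labels already agreed on for this image

    def record_guess(self, label):
        """Count a typed label; taboo labels are rejected and not counted."""
        if label in self.taboo:
            return False
        self.produced[label] += 1
        return True

    def record_agreement(self, label):
        """A label agreed on by a pair becomes taboo for later players."""
        self.taboo.add(label)

    def good_labels(self):
        """Labels produced by more than N players."""
        return {label for label, count in self.produced.items() if count > self.n}


record = ImageRecord(n=1)
for guess in ["dog", "grass", "dog"]:
    record.record_guess(guess)
record.record_agreement("dog")
print(record.good_labels(), record.taboo)
```

Deciding when an image is 'done' would additionally require counting how often players pass on it, which is left out here.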
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning, and at quiet times, there won't always be players to pair with
  – In these cases, a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th, 2003 and December 10th, 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1,000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations/properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
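Filling a template yields both a hint for the Guesser and a candidate fact about the secret word. The sketch below is an assumption-laden illustration, not Verbosity's implementation: the function `fill_template`, the tuple representation of a fact, and the splitting of "near/in/on" into three separate relations are all hypothetical.

```python
# Relations taken from the templates above; near/in/on split into three (assumption).
TEMPLATES = (
    "is a kind of",
    "is used for",
    "is typically near",
    "is typically in",
    "is typically on",
    "is the opposite of",
    "is related to",
)


def fill_template(secret_word, relation, filler):
    """Return (fact, hint): the collected fact and the hint shown to the Guesser."""
    if relation not in TEMPLATES:
        raise ValueError(f"unknown template: {relation!r}")
    fact = (secret_word, relation, filler)
    hint = f"_ {relation} {filler}"  # the secret word stays hidden
    return fact, hint


fact, hint = fill_template("milk", "is a kind of", "drink")
print(hint)
print(fact)
```

Restricting the Describer to a fixed set of templates is what makes each hint directly usable as a structured fact.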
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Six raters were asked whether 200 facts collected using Verbosity are 'true'
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
The DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all Sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
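As an illustration, the 'tennis players from Moscow' question might be phrased as the query below. This is a sketch only: the class and property names (`dbo:TennisPlayer`, `dbo:birthPlace`, `dbr:Moscow`) are assumed to match the DBpedia vocabulary and do not come from the slides, and the helper `build_request_url` is hypothetical. The code only builds the request URL; it does not contact the server.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:TennisPlayer, dbo:birthPlace and dbr:Moscow.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?player WHERE {
  ?player a dbo:TennisPlayer ;
          dbo:birthPlace dbr:Moscow .
}
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(QUERY)
print(url)
```

The resulting URL can be fetched with any HTTP client; the result format is normally negotiated through the Accept header of the request.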
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh, 2004)
THINGS (52,000 assertions)
  - IsA (IsA apple fruit)
  - PartOf (PartOf CPU computer)
  - PropertyOf (PropertyOf coffee wet)
  - MadeOf (MadeOf bread flour)
  - DefinedAs (DefinedAs meat flesh of animal)
EVENTS (38,000 assertions)
  - PrerequisiteEventOf (PrerequisiteEventOf read letter open envelope)
  - SubeventOf (SubeventOf play sport score goal)
  - FirstSubeventOf (FirstSubeventOf start fire light match)
  - LastSubeventOf (LastSubeventOf attend classical concert applaud)
AGENTS (104,000 assertions)
  - CapableOf (CapableOf dentist pull tooth)
SPATIAL (36,000 assertions)
  - LocationOf (LocationOf army in war)
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  - EffectOf (EffectOf view video entertainment)
  - DesirousEffectOf (DesirousEffectOf sweat take shower)
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  - DesireOf (DesireOf person not be depressed)
  - MotivationOf (MotivationOf play game compete)
FUNCTIONAL (115,000 assertions)
  - UsedFor (UsedFor fireplace burn wood)
  - CapableOfReceivingAction (CapableOfReceivingAction drink serve)
ASSOCIATION K-LINES (1.25 million assertions)
  - SuperThematicKLine (SuperThematicKLine western civilization civilization)
  - ThematicKLine (ThematicKLine wedding dress veil)
  - ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)
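One way to picture these assertions is as (relation, concept, concept) triples indexed by concept. The following is a minimal sketch only: the `Assertion` type and `index_by_concept` helper are hypothetical names chosen for illustration, not part of ConceptNet or its tools.

```python
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    """One assertion written as (relation, left concept, right concept)."""
    relation: str
    left: str
    right: str


# A handful of the example assertions listed on the slide.
ASSERTIONS = [
    Assertion("IsA", "apple", "fruit"),
    Assertion("PartOf", "CPU", "computer"),
    Assertion("MadeOf", "bread", "flour"),
    Assertion("CapableOf", "dentist", "pull tooth"),
    Assertion("UsedFor", "fireplace", "burn wood"),
]


def index_by_concept(assertions):
    """Map each concept to the assertions it takes part in (either side)."""
    index = defaultdict(list)
    for assertion in assertions:
        index[assertion.left].append(assertion)
        index[assertion.right].append(assertion)
    return index


INDEX = index_by_concept(ASSERTIONS)
print(INDEX["apple"])
```

Looking up a concept then returns every edge that touches it, which is the basic operation of a semantic network.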
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al., 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
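The lime example can be reproduced with a single pattern-matching rule. This is only an illustrative sketch: the regular expression and the `extract` function are assumptions made for this example and are not the heuristics actually used to build ConceptNet.

```python
import re

# Hypothetical rule: "A/An/The X is a/an [modifier words] Y".
PATTERN = re.compile(
    r"(?:an?|the)\s+(\w+)\s+is\s+an?\s+(?:(.+)\s+)?(\w+)",
    re.IGNORECASE,
)


def extract(sentence):
    """Turn one simple sentence into a list of (relation, concept, value) facts."""
    match = PATTERN.fullmatch(sentence.strip().rstrip("."))
    if match is None:
        return []  # the sentence does not fit this one rule
    concept = match.group(1).lower()
    modifier = match.group(2)
    category = match.group(3).lower()
    facts = [("isa", concept, category)]
    if modifier:
        # Join multi-word modifiers with underscores, as on the slide.
        facts.append(("property_of", concept, "_".join(modifier.lower().split())))
    return facts


print(extract("A lime is a very sour fruit"))
```

A real extractor would need many such rules, plus normalisation of plurals and determiners; a sentence that fits no rule simply yields no facts here.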
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  - ESP
  - Verbosity
  - TagATune
• Other games
  - Peekaboom
  - Phetch
ESP
• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  - To train image search engines
  - To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  - Hence the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
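The matching rule can be sketched as a loop over the strings the two players type, in the order they arrive. This is a simplified illustration under stated assumptions: the `play_round` function, the "A"/"B" player tags and the whitespace and case normalisation are all hypothetical, not the game's actual implementation.

```python
def normalise(text):
    """Lower-case and collapse whitespace so trivial variants still match."""
    return " ".join(text.lower().split())


def play_round(events, taboo=()):
    """Return the first label typed by both players, or None if they never agree.

    events: iterable of (player, text) pairs in typing order, player is "A" or "B".
    taboo:  labels that may not be used for this image.
    """
    blocked = {normalise(word) for word in taboo}
    seen = {"A": set(), "B": set()}
    for player, text in events:
        label = normalise(text)
        if not label or label in blocked:
            continue  # empty or taboo guesses are ignored
        other = "B" if player == "A" else "A"
        if label in seen[other]:
            return label  # agreement: both players typed this string
        seen[player].add(label)
    return None


EVENTS = [("A", "dog"), ("B", "puppy"), ("A", "Grass"), ("B", "grass ")]
print(play_round(EVENTS))
```

Note that the players never see each other's guesses: agreement is detected only on the server side, which is what makes a matched label likely to describe the image.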
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
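Both criteria reduce to simple counting. The sketch below is an assumption-laden illustration: the function names are hypothetical, and "most players pass" is read here as "more than half of the plays ended in a pass", which the slide does not specify.

```python
from collections import Counter


def good_labels(produced_labels, n):
    """Labels produced more than n times across games for one image."""
    counts = Counter(produced_labels)
    return {label for label, count in counts.items() if count > n}


def is_done(passes, plays, threshold=0.5):
    """An image is 'done' when the share of plays that ended in a pass exceeds threshold.

    threshold=0.5 is an assumed reading of 'most players pass on it'.
    """
    return plays > 0 and passes / plays > threshold


print(good_labels(["dog", "dog", "dog", "grass"], n=2))
print(is_done(passes=6, plays=10))
```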
IMPLEMENTATION
• Pre-recorded game play
  - Especially at the beginning and at quiet times, there won't always be players to pair with
  - In these cases a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  - Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  - A number of mechanisms are in place against those cases
• Selecting images
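Pre-recorded play can be sketched as replaying the timestamped guesses of an earlier game. This is a minimal illustration, assuming a recorded hand is stored as (seconds since the image appeared, label) pairs; the `RecordedPartner` class is a hypothetical name, not the game's actual code.

```python
from bisect import bisect_right


class RecordedPartner:
    """Replays the guesses of an earlier game on the same picture."""

    def __init__(self, recorded):
        # recorded: iterable of (offset_seconds, label) pairs from the earlier game
        self._events = sorted(recorded)
        self._offsets = [offset for offset, _ in self._events]

    def labels_until(self, elapsed):
        """Labels the recorded player had typed after `elapsed` seconds."""
        cut = bisect_right(self._offsets, elapsed)
        return [label for _, label in self._events[:cut]]


partner = RecordedPartner([(5.0, "grass"), (1.5, "dog")])
print(partner.labels_until(2.0))
```

The live player's guesses are then matched against `labels_until(elapsed)` exactly as they would be against a live partner's guesses.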
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  - 13,630 players
  - 1.2 million labels for 293,760 images
  - 80% of players played more than once
• By 2008
  - 200,000 players
  - 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  - Playing with a partner
  - Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  - choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  - 15 participants, 20 images among the 1000 with more than 5 labels
  - 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  - 15 participants, 20 images
  - 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• ... or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  - Players have to guess a word
  - One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations/properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
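Because each hint comes from a fixed template, a hint can be stored directly as a structured fact about the secret word. The sketch below illustrates this; the template identifiers and relation names (KindOf, TypicallyNear, and so on) are hypothetical labels invented for this example, not Verbosity's internal names.

```python
# Template id -> (sentence frame, relation name). Relation names are assumed.
TEMPLATES = {
    "kind_of": ("{} is a kind of {}", "KindOf"),
    "used_for": ("{} is used for {}", "UsedFor"),
    "near": ("{} is typically near {}", "TypicallyNear"),
    "in": ("{} is typically in {}", "TypicallyIn"),
    "on": ("{} is typically on {}", "TypicallyOn"),
    "opposite_of": ("{} is the opposite of {}", "OppositeOf"),
    "related_to": ("{} is related to {}", "RelatedTo"),
}


def make_hint(template_id, filler):
    """Text shown to the Guesser: the secret word stays hidden as '_'."""
    frame, _ = TEMPLATES[template_id]
    return frame.format("_", filler)


def to_fact(template_id, secret_word, filler):
    """Structured fact recorded for a hint about the secret word."""
    _, relation = TEMPLATES[template_id]
    return (relation, secret_word, filler)


print(make_hint("used_for", "drinking"))
print(to_fact("used_for", "cup", "drinking"))
```

A deployed system would presumably keep only facts from rounds in which the Guesser found the word, since a successful guess is evidence that the hint was sensible.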
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  - Describer: can just repeat the behavior of a previous describer
  - Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  - Ask six raters whether 200 facts collected using Verbosity are 'true'
  - Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  - Name The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
  - Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
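The two tasks can be combined into a single judgement per markable. The following is a rough sketch under a simple assumed scoring rule (one point per annotation or agreement, minus one per disagreement); it is not the aggregation method actually used by Phrase Detectives, and the function names and markable identifiers are hypothetical.

```python
from collections import defaultdict

NOT_ANAPHORIC = "NOT_ANAPHORIC"  # hypothetical tag for "this markable has no antecedent"


def score_interpretations(annotations, validations):
    """Score each candidate antecedent for one markable.

    annotations: antecedent ids chosen by players in annotation mode.
    validations: (antecedent id, agrees) pairs from validation mode.
    """
    scores = defaultdict(int)
    for antecedent in annotations:
        scores[antecedent] += 1
    for antecedent, agrees in validations:
        scores[antecedent] += 1 if agrees else -1
    return dict(scores)


def best_interpretation(annotations, validations):
    """Highest-scoring interpretation, or None if nobody has judged the markable."""
    scores = score_interpretations(annotations, validations)
    if not scores:
        return None
    return max(scores, key=scores.get)


ANNOTATIONS = ["m1", "m1", "m2"]
VALIDATIONS = [("m2", False), ("m1", True)]
print(best_interpretation(ANNOTATIONS, VALIDATIONS))
```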
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic
network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels or playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
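The pre-recorded game play idea can be illustrated with a small pairing routine. Everything here is hypothetical (the function name, the data layout, the choice of the first stored hand): it only shows the fallback from a live partner to a recorded one for the same picture.

```python
from typing import Dict, List, Tuple

RecordedHand = List[str]  # the strings one player typed in an earlier game


def find_partner(
    waiting_players: List[str],
    recorded_hands: Dict[str, List[RecordedHand]],
    image_id: str,
) -> Tuple[str, object]:
    """Prefer a live partner; otherwise replay a recorded hand for this image."""
    if waiting_players:
        return ("live", waiting_players.pop(0))
    hands = recorded_hands.get(image_id, [])
    if hands:
        return ("recorded", hands[0])
    return ("none", None)


# Nobody is online, but an earlier game on the same picture was recorded.
recordings = {"img-1": [["dog", "grass", "ball"]]}
print(find_partner([], recordings, "img-1"))
```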
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful
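The comparison between game labels and participant labels boils down to an overlap percentage. The sketch below shows that computation for one image; the function name and the example labels are made up for illustration and are not data from the study.

```python
from typing import Iterable


def percent_also_produced(game_labels: Iterable[str], participant_labels: Iterable[str]) -> float:
    """Percentage of game labels that participants also produced for the image."""
    game = set(game_labels)
    if not game:
        return 0.0
    shared = game & set(participant_labels)
    return 100.0 * len(shared) / len(game)


# Invented labels for a single image.
game = ["dog", "grass", "ball", "park"]
participants = ["dog", "ball", "grass", "tree"]
print(percent_also_produced(game, participants))
```

Averaging this figure over all evaluated images gives a number of the kind reported on the slide.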
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
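To make the role of the templates concrete, here is a small sketch of how a filled-in template could be turned both into a hint for the Guesser and into a stored fact. The relation names, the expansion of the near/in/on template into three entries, and the assumption that the first slot is the secret word are all illustrative choices, not details taken from Verbosity itself.

```python
from typing import Dict, Tuple

# Template strings follow the slide; the dictionary keys are hypothetical names.
TEMPLATES: Dict[str, str] = {
    "kind_of": "{word} is a kind of {filler}",
    "used_for": "{word} is used for {filler}",
    "typically_near": "{word} is typically near {filler}",
    "typically_in": "{word} is typically in {filler}",
    "typically_on": "{word} is typically on {filler}",
    "opposite_of": "{word} is the opposite of {filler}",
    "related_to": "{word} is related to {filler}",
}


def make_hint(relation: str, filler: str) -> str:
    """Hint shown to the Guesser: the secret word stays blanked out."""
    return TEMPLATES[relation].format(word="_", filler=filler)


def make_fact(relation: str, secret_word: str, filler: str) -> Tuple[str, str, str]:
    """Fact kept once the round is over: (relation, secret word, filler)."""
    return (relation, secret_word, filler)


print(make_hint("kind_of", "fruit"))
print(make_fact("kind_of", "lime", "fruit"))
```

Because every hint comes from a fixed template, each one maps directly to a relation without any language processing.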
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish. Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67, 2008
• M. Poesio, J. Chamberlain, U. Kruschwitz, L. Robaldo & L. Ducceschi. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems, 2013
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
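The lime example can be reproduced with one hand-written pattern. This is a toy sketch only: the regular expression and the function name are assumptions made for illustration, and ConceptNet's actual extraction heuristics are not shown on the slides.

```python
import re
from typing import List, Tuple

# Matches sentences of the form "A <noun> is a [modifiers] <noun>".
IS_A = re.compile(
    r"^(?:an?|the)\s+(?P<subject>\w+)\s+is\s+an?\s+(?P<modifiers>(?:\w+\s+)*?)(?P<head>\w+)$",
    re.IGNORECASE,
)


def extract_assertions(sentence: str) -> List[Tuple[str, str, str]]:
    """Turn one simple sentence into isa / property_of triples (empty if no match)."""
    match = IS_A.match(sentence.strip().rstrip("."))
    if match is None:
        return []
    subject = match.group("subject").lower()
    head = match.group("head").lower()
    triples = [("isa", subject, head)]
    modifiers = match.group("modifiers").strip().lower()
    if modifiers:
        # Words before the head noun are kept together as a single property.
        triples.append(("property_of", subject, modifiers.replace(" ", "_")))
    return triples


for relation, left, right in extract_assertions("A lime is a very sour fruit"):
    print(f"{relation}({left}, {right})")
```

A single pattern like this covers only one sentence shape; it is meant to show what "simple heuristics" over free-text assertions can look like, not to be a usable extractor.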
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
bull hellip or the game approach to collecting commonsense knowledge
bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)
THE GAME
bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the
word
bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected
bull The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _
EMULATION
bull As in ESP game pre-recorded games are used when a player cannot be paired with another player
bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous
describerndash Guesser not so easy
RESULTS
bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY
bull Qualityndash Ask six raters whether 200 facts collected using
Verbosity are lsquotruersquondash Around 85 success
PHRASE DETECTIVES
wwwphrasedetectivesorg
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic
network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
bull hellip or the game approach to collecting commonsense knowledge
bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)
THE GAME
bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the
word
bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected
bull The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _
EMULATION
bull As in ESP game pre-recorded games are used when a player cannot be paired with another player
bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous
describerndash Guesser not so easy
RESULTS
bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY
bull Qualityndash Ask six raters whether 200 facts collected using
Verbosity are lsquotruersquondash Around 85 success
PHRASE DETECTIVES
wwwphrasedetectivesorg
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
bull hellip or the game approach to collecting commonsense knowledge
bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)
THE GAME
bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the
word
bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected
bull The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _
EMULATION
bull As in ESP game pre-recorded games are used when a player cannot be paired with another player
bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous
describerndash Guesser not so easy
RESULTS
bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY
bull Qualityndash Ask six raters whether 200 facts collected using
Verbosity are lsquotruersquondash Around 85 success
PHRASE DETECTIVES
wwwphrasedetectivesorg
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
bull hellip or the game approach to collecting commonsense knowledge
bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)
THE GAME
bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the
word
bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected
bull The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _
EMULATION
bull As in ESP game pre-recorded games are used when a player cannot be paired with another player
bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous
describerndash Guesser not so easy
RESULTS
bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY
bull Qualityndash Ask six raters whether 200 facts collected using
Verbosity are lsquotruersquondash Around 85 success
PHRASE DETECTIVES
wwwphrasedetectivesorg
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
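Because every hint is produced by filling in a template, each hint maps directly onto a relation between the secret word and the filler. A minimal sketch of that mapping follows; the relation names (`IsA`, `UsedFor`, ...) and both function names are hypothetical and are not taken from Verbosity itself.

```python
# Template text -> relation name (relation names are illustrative assumptions).
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near": "Near",
    "is typically in": "In",
    "is typically on": "On",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def render_hint(template: str, filler: str) -> str:
    """What the Guesser sees: the secret word is replaced by a blank."""
    return f"_ {template} {filler}"


def hint_to_fact(secret_word: str, template: str, filler: str):
    """Turn a filled-in template into a (subject, relation, object) triple."""
    relation = TEMPLATES[template]  # KeyError for a template outside the fixed set
    return (secret_word.strip().lower(), relation, filler.strip().lower())
```

For the secret word "laptop", the hint "_ is a kind of computer" would yield the triple ("laptop", "IsA", "computer").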
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
– The Describer can just repeat the behavior of a previous describer
– Emulating the Guesser is not so easy
RESULTS
• Only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
– Ask six raters whether 200 facts collected using Verbosity are ‘true’
– Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
– Find The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
– Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAPs do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
– ESP
– Verbosity
– TagATune
• Other games
– Peekaboom
– Phetch
ESP
• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
– To train image search engines
– To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image and type that description
– Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
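The matching rule can be sketched as follows: guesses from the two partners arrive interleaved, and the round ends as soon as one player types a string the other has already typed. The event format, the case-insensitive comparison and the function name `first_match` are assumptions made for illustration.

```python
def first_match(events):
    """Return the first string typed by both partners, or None.

    events: iterable of (player, text) in typing order, with player "A" or "B".
    """
    typed = {"A": set(), "B": set()}
    for player, text in events:
        word = text.strip().lower()
        other = "B" if player == "A" else "A"
        if word in typed[other]:
            return word  # the partner typed this earlier: agreement
        typed[player].add(word)
    return None
```

Note that the partners do not need to type the matching string at the same time; it is enough that both have typed it at some point while the image is shown.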
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2 ½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images, they score bonus points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
bull hellip or the game approach to collecting commonsense knowledge
bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)
THE GAME
bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the
word
bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected
bull The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – The Describer can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
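The interplay of agreement, taboo words, good labels and "done" images can be sketched in a few lines. This is a simplified illustration, not the ESP game's real code: the names `first_match` and `ImageRecord` are invented, timing is ignored, a round with no agreed label is counted as a pass, and "most players pass" is assumed to mean more than half of the rounds in which the image was shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def first_match(guesses_a: List[str], guesses_b: List[str], taboo: Set[str]) -> Optional[str]:
    """Return the first non-taboo label typed by both players, or None."""
    allowed_b = {g.lower() for g in guesses_b} - taboo
    for guess in guesses_a:
        if guess.lower() in allowed_b:
            return guess.lower()
    return None


class ImageRecord:
    """Labels, taboo words and pass statistics for one image."""

    def __init__(self, n: int) -> None:
        self.n = n  # the game parameter N from the slide
        self.producers: Dict[str, Set[str]] = defaultdict(set)  # label -> player ids
        self.taboo: Set[str] = set()
        self.times_shown = 0
        self.times_passed = 0

    def play(self, player_a: str, guesses_a: List[str],
             player_b: str, guesses_b: List[str]) -> Optional[str]:
        """Play one round; an agreed label becomes taboo for later rounds."""
        self.times_shown += 1
        label = first_match(guesses_a, guesses_b, self.taboo)
        if label is None:
            self.times_passed += 1  # assumption: no agreement counts as a pass
            return None
        self.producers[label].update({player_a, player_b})
        self.taboo.add(label)
        return label

    def good_labels(self) -> Set[str]:
        """Labels produced by more than N distinct players."""
        return {label for label, players in self.producers.items() if len(players) > self.n}

    def is_done(self) -> bool:
        """Assumed reading of 'most players pass': over half of all rounds were passes."""
        return self.times_shown > 0 and self.times_passed * 2 > self.times_shown
```

Because every agreed label is added to the taboo set, later pairs are pushed towards more specific words, which is the effect the taboo mechanism is meant to produce.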
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times there won’t always be players to pair with
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATION
• Pre-recorded game play
– Especially at the beginning and at quiet times there won’t always be players to pair with
– In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
– Players could cheat in a number of ways, including agreeing on labels and playing against themselves
– A number of mechanisms are in place against those cases
• Selecting images
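The pre-recorded game play described above might look roughly like the following sketch. It assumes that a recorded hand is stored as timestamped guesses and replayed at the original pace; the slides do not say how recordings are stored, so `replay_match` and its data layout are hypothetical.

```python
def replay_match(recorded_hand, live_guesses, taboo=()):
    """Find the first agreement between a live player and a recorded hand.

    recorded_hand: list of (seconds, label) pairs from an earlier game
                   on the same picture.
    live_guesses:  list of (seconds, label) pairs from the current player.
    Returns (time_of_agreement, label), or None if the two never agree.
    """
    taboo = set(taboo)
    # Earliest moment at which the recorded partner typed each label.
    first_recorded = {}
    for t, label in recorded_hand:
        first_recorded[label] = min(t, first_recorded.get(label, t))
    best = None
    for t, label in live_guesses:
        if label in taboo or label not in first_recorded:
            continue
        # Agreement happens once both sides have entered the label.
        agreed_at = max(t, first_recorded[label])
        if best is None or agreed_at < best[0]:
            best = (agreed_at, label)
    return best

recorded = [(2.0, "tree"), (5.0, "dog")]
live = [(1.0, "dog"), (3.0, "tree")]
print(replay_match(recorded, live))
```

From the live player's point of view the recorded partner behaves like a real one, which is why the same picture has to be used.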
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
– 13,630 players
– 1.2 million labels for 293,760 images
– 80% of players played more than once
• By 2008
– 200,000 players
– 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
– Playing with a partner
– Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
– Choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
– 15 participants, 20 images among the 1000 with more than 5 labels
– 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
– 15 participants, 20 images
– 85% of words rated useful
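The second evaluation above (how many game labels were also produced by experiment participants) boils down to a set overlap. A minimal sketch follows, assuming labels are pooled over all images before taking the ratio; the slides do not say whether the 83% figure was pooled or averaged per image, and the sample data is invented.

```python
def label_overlap(game_labels, participant_labels):
    """Fraction of game labels that participants also produced.

    Both arguments map an image id to a collection of labels.
    Counts are pooled over all images (an assumption of this sketch).
    """
    matched = total = 0
    for image, labels in game_labels.items():
        produced = set(participant_labels.get(image, ()))
        matched += len(set(labels) & produced)
        total += len(set(labels))
    return matched / total if total else 0.0

# Invented example data for two images.
game = {"img1": {"dog", "grass", "ball"}, "img2": {"car"}}
participants = {"img1": {"dog", "ball", "park"}, "img2": {"car", "road"}}
print(label_overlap(game, participants))
```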
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations/properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
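Because each hint is produced by filling a fixed template, turning a hint into a structured fact is a simple lookup. The sketch below assumes the first slot is the secret word and the second is what the Describer types; the relation names (`IsA`, `UsedFor`, and so on) are made up for illustration and are not Verbosity's own vocabulary.

```python
# Hypothetical mapping from template text to a relation name.
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near": "Near",
    "is typically in": "In",
    "is typically on": "On",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}

def hint_to_fact(secret_word, template, filler):
    """Turn a filled template into a (subject, relation, object) triple."""
    relation = TEMPLATES[template]  # raises KeyError for an unknown template
    return (secret_word, relation, filler.strip().lower())

print(hint_to_fact("milk", "is a kind of", " Liquid "))
```

Restricting the Describer to these templates is what guarantees that every collected hint maps onto a relation of interest.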
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
– Describer can just repeat behavior of previous describer
– Guesser not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
– Ask six raters whether 200 facts collected using Verbosity are ‘true’
– Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
– Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
– Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
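One way to combine the output of the two tasks is to score every proposed antecedent: each annotation counts as a vote for it, each agreement adds one, and each disagreement subtracts one. This scoring rule and the names below are assumptions made for illustration; the slides do not specify how Phrase Detectives aggregates judgements.

```python
from collections import defaultdict

def best_interpretation(annotations, validations):
    """Pick the most supported antecedent for one markable.

    annotations: antecedent ids chosen in annotation mode
                 (use "non-anaphoric" when the player found no antecedent).
    validations: (antecedent_id, agrees) pairs from validation mode.
    Returns (antecedent_id, score), or None if there are no judgements.
    """
    score = defaultdict(int)
    for antecedent in annotations:
        score[antecedent] += 1
    for antecedent, agrees in validations:
        score[antecedent] += 1 if agrees else -1
    if not score:
        return None
    best = max(score, key=score.get)
    return best, score[best]

annotations = ["m1", "m1", "m2"]
validations = [("m1", True), ("m2", False), ("m2", False)]
print(best_interpretation(annotations, validations))
```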
www.phrasedetectives.com
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish. Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67, 2008
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems, 2013
PRODUCING A DESCRIPTION
TEMPLATES
bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _
EMULATION
bull As in ESP game pre-recorded games are used when a player cannot be paired with another player
bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous
describerndash Guesser not so easy
RESULTS
bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY
bull Qualityndash Ask six raters whether 200 facts collected using
Verbosity are lsquotruersquondash Around 85 success
PHRASE DETECTIVES
wwwphrasedetectivesorg
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
TEMPLATES
bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _
EMULATION
bull As in ESP game pre-recorded games are used when a player cannot be paired with another player
bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous
describerndash Guesser not so easy
RESULTS
bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY
bull Qualityndash Ask six raters whether 200 facts collected using
Verbosity are lsquotruersquondash Around 85 success
PHRASE DETECTIVES
wwwphrasedetectivesorg
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
EMULATION
bull As in ESP game pre-recorded games are used when a player cannot be paired with another player
bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous
describerndash Guesser not so easy
RESULTS
bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY
bull Qualityndash Ask six raters whether 200 facts collected using
Verbosity are lsquotruersquondash Around 85 success
PHRASE DETECTIVES
wwwphrasedetectivesorg
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
RESULTS
bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY
bull Qualityndash Ask six raters whether 200 facts collected using
Verbosity are lsquotruersquondash Around 85 success
PHRASE DETECTIVES
wwwphrasedetectivesorg
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
PHRASE DETECTIVES
wwwphrasedetectivesorg
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems