INTRODUCTION TO ARTIFICIAL INTELLIGENCE
Massimo Poesio
LECTURE 10: Knowledge and The Social Web
‘CYC convinced the AI community that creating a commonsense knowledge base by hand is impossible’ (Massimo, Lecture 1)
That may depend on how many people you put on to it.
THE SOCIAL WEB
• Increasingly, the Web is becoming not just a way to facilitate information exchange or commercial transactions, but also a tool to facilitate socialization (Facebook, LinkedIn, etc.)
• It is also a place where information can be collectively created
SOCIAL CREATION OF KNOWLEDGE
WIKIPEDIA
• Wikipedia is a free, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation
• Wikipedia's articles have been written collaboratively by volunteers around the world
• Almost all of its articles can be edited by anyone who can access the Wikipedia website
‘The free encyclopedia that anyone can edit’
– http://en.wikipedia.org/wiki/Wikipedia
WIKIPEDIA
• Wikipedia is
1. domain independent – it has a large coverage
2. up-to-date – to process current information
3. multilingual – to process information in many languages
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
• Other languages
• Other wiki pages
• To the web
• Redirects
• Disambiguates
Encyclopedic knowledge in coreference resolution
[The FCC] took [three specific actions] regarding [AT&T]. By a 4-0 vote, it allowed AT&T to continue offering special discount packages to big customers, called Tariff 12, rejecting appeals by AT&T competitors that the discounts were illegal. ...
[The agency] said that because MCI's offer had expired, AT&T couldn't continue to offer its discount plan.
Why Wikipedia may help address the encyclopedic knowledge problem
http://en.wikipedia.org/wiki/FCC
The Federal Communications Commission (FCC) is an independent United States government agency, created, directed, and empowered by Congressional statute (see 47 U.S.C. § 151 and 47 U.S.C. § 154).
Another interesting scenario
A fresh mandate for [Mr Ahmadinejad] would, say his critics, consecrate the “revolution within a revolution” he has been trying to effect since his surprise electoral triumph in 2005. Best known to outsiders for his bellicose grandstanding, [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative, Mr Khamenei, greatly to expand the powers of the presidency.
Source: ‘It could make a big difference’, The Economist, Mar 19th 2009
Why Wikipedia may help address the encyclopedic knowledge problem
Wikipedia as Ontology
• Unlike other standard ontologies, such as WordNet and MeSH, Wikipedia itself is not a structured thesaurus
• However, it is more...
– Comprehensive: it contains 12 million articles (2.8 million in the English Wikipedia)
– Accurate: a study by Giles (2005) found Wikipedia can compete with Encyclopædia Britannica in accuracy
– Up to date: current and emerging concepts are absorbed in a timely manner
Giles, J. 2005. Internet encyclopaedias go head to head. Nature 438: 900–901
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
– Each article describes only a single concept
– The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
(Example: the Wikipedia article that describes the concept ‘Artificial intelligence’)
– Equivalent concepts are grouped together by redirect links
(Example: ‘AI’ is redirected to its equivalent concept, ‘Artificial Intelligence’)
– It contains a hierarchical categorization system in which each article belongs to at least one category
(Example: the concept ‘Artificial Intelligence’ belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society)
– Polysemous concepts are disambiguated by Disambiguation Pages
(Example: the different meanings that ‘Artificial intelligence’ may refer to are listed in its disambiguation page)
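A minimal sketch of how the redirect and category structures could be used together, assuming they have already been loaded into plain dictionaries. The dictionaries here are hand-filled from the slide examples only, and the names `REDIRECTS`, `CATEGORIES`, `resolve` and `categories_of` are hypothetical.

```python
# Toy model of two Wikipedia structures: redirects and category membership.
# The data is hand-filled from the slide examples; a real system would
# load it from a Wikipedia dump (not shown here).
REDIRECTS = {"AI": "Artificial intelligence"}
CATEGORIES = {
    "Artificial intelligence": [
        "Artificial intelligence",
        "Cybernetics",
        "Formal sciences",
        "Technology in society",
    ],
}


def resolve(title):
    """Follow redirect links until a non-redirect title is reached."""
    seen = set()
    while title in REDIRECTS and title not in seen:
        seen.add(title)  # guards against redirect loops
        title = REDIRECTS[title]
    return title


def categories_of(title):
    """Categories of the concept a title refers to, after redirects."""
    return CATEGORIES.get(resolve(title), [])


print(resolve("AI"))
print(categories_of("AI"))
```

Resolving redirects first means that equivalent surface forms such as ‘AI’ end up sharing one concept entry, which is the thesaurus-like behaviour described above.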
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
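To see why infoboxes count as semi-structured data, here is a rough sketch of pulling attribute-value pairs out of infobox wikitext. It assumes one `| key = value` field per line, which real infoboxes do not guarantee. The names `parse_infobox` and `link_targets` are hypothetical and this is not how DBpedia's extractors are implemented.

```python
import re

# A shortened copy of the Poe infobox, one field per line (an assumption).
SAMPLE = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""

# "| key = value" at the start of a line.
FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.*)$")
# "[[Target]]" or "[[Target|shown text]]"; the group captures the target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def parse_infobox(wikitext):
    """Map each field name to its raw (unparsed) wikitext value."""
    fields = {}
    for line in wikitext.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def link_targets(value):
    """Page titles that a field value links to."""
    return LINK.findall(value)


info = parse_infobox(SAMPLE)
print(info["name"])
print(link_targets(info["birth_place"]))
```

Values that are wiki links point at other articles, so they can become edges between concepts, while plain strings stay as literal attribute values.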
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
DBPEDIA
• 1,600,000 concepts
• including
– 58,000 persons
– 70,000 places
– 35,000 music albums
– 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
The DBpedia Dataset
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. In the Developers Guide to Semantic Web Toolkits you can find a development toolkit in your preferred programming language to process DBpedia data.
REPRESENTING EXTRACTED INFORMATION
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
dbpedia:native_name "Calgary"
dbpedia:altitude "1048"
dbpedia:population_city "988193"
dbpedia:population_metro "1079310"
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
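The Calgary example can be written down as subject-predicate-object triples. The sketch below stores them as plain Python tuples and adds a tiny pattern matcher; the helper name `match` is hypothetical and no RDF library is involved.

```python
CALGARY = "http://dbpedia.org/resource/Calgary"

# (subject, predicate, object) triples copied from the slide.
TRIPLES = [
    (CALGARY, "dbpedia:native_name", "Calgary"),
    (CALGARY, "dbpedia:altitude", "1048"),
    (CALGARY, "dbpedia:population_city", "988193"),
    (CALGARY, "dbpedia:population_metro", "1079310"),
    (CALGARY, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (CALGARY, "governing_body", "dbpedia:Calgary_City_Council"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


print(match(TRIPLES, p="dbpedia:altitude"))
```

Leaving a position as `None` plays the role of a variable, which is the same idea a SPARQL triple pattern expresses.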
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
– Give me all sitcoms that are set in NYC
– All tennis players from Moscow
– All films by Quentin Tarantino
– All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
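As an illustration, ‘all films by Quentin Tarantino’ might be phrased as the SPARQL query below. The class `dbo:Film` and the property `dbo:director` are taken from the present-day DBpedia ontology, so they are an assumption relative to the dataset version on the slides. The `format` request parameter is likewise an assumption about the endpoint. The sketch only builds the request URL and does not contact the server.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:Film and dbo:director (current DBpedia ontology).
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film a dbo:Film ;
        dbo:director dbr:Quentin_Tarantino .
}
"""


def request_url(query, endpoint=ENDPOINT):
    """Build a GET URL that carries the query as a URL-encoded parameter."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


print(request_url(QUERY))
```

The resulting URL could then be fetched with any HTTP client; the other example questions follow the same shape with different classes and properties.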
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
– Other initiatives: Citizen Science, Cognition and Language Laboratory, ...
• This has been taken advantage of in AI
– Open Mind Commonsense (Singh) (collecting facts)
– Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
www.phrasedetectives.com
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
IsA (IsA "apple" "fruit"), PartOf (PartOf "CPU" "computer"), PropertyOf (PropertyOf "coffee" "wet"), MadeOf (MadeOf "bread" "flour"), DefinedAs (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope"), SubeventOf (SubeventOf "play sport" "score goal"), FirstSubeventOf (FirstSubeventOf "start fire" "light match"), LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
CapableOf (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
LocationOf (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
EffectOf (EffectOf "view video" "entertainment"), DesirousEffectOf (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
DesireOf (DesireOf "person" "not be depressed"), MotivationOf (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
IsUsedFor (UsedFor "fireplace" "burn wood"), CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")
ASSOCIATION: K-LINES (1.25 million assertions)
SuperThematicKLine (SuperThematicKLine "western civilization" "civilization"), ThematicKLine (ThematicKLine "wedding dress" "veil"), ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
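To give a feel for what a ‘simple heuristic’ could look like, here is a sketch of one pattern that covers the lime example. The regular expression and the function name `extract` are invented for illustration and are not the rules ConceptNet actually uses.

```python
import re

# One hand-written pattern: "A/An/The X is a/an [(very) ADJ] Y".
PATTERN = re.compile(
    r"^(?:an?|the)\s+(?P<concept>\w+)\s+is\s+an?\s+"
    r"(?:(?P<prop>(?:very\s+)?\w+)\s+)?(?P<kind>\w+)$",
    re.IGNORECASE,
)


def extract(sentence):
    """Return relation strings for a sentence, or [] if the pattern fails."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []
    concept = match.group("concept").lower()
    kind = match.group("kind").lower()
    facts = [f"isa({concept}, {kind})"]
    if match.group("prop"):
        # Join a multi-word property with underscores, e.g. very_sour.
        prop = "_".join(match.group("prop").lower().split())
        facts.append(f"property_of({concept}, {prop})")
    return facts


print(extract("A lime is a very sour fruit"))
```

Sentences that do not fit the pattern simply yield nothing, so a pattern set of this kind favours precision over coverage.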
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
– ESP
– Verbosity
– TagATune
• Other games
– Peekaboom
– Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
– To train image search engines
– To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
– Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
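The matching rule can be sketched as follows, assuming the server sees the guesses as a stream of (player, word) events in the order they were typed. The function name `first_agreement` and the player ids "A" and "B" are hypothetical; the real game's implementation is not described in the slides.

```python
def first_agreement(events):
    """Return the first word typed by both partners, or None.

    `events` is an iterable of (player, word) pairs in typing order,
    where player is "A" or "B".
    """
    typed = {"A": set(), "B": set()}
    for player, word in events:
        word = word.strip().lower()
        other = "B" if player == "A" else "A"
        if word in typed[other]:
            return word  # the partner typed this string earlier
        typed[player].add(word)
    return None


round_events = [
    ("A", "dog"), ("B", "puppy"), ("A", "grass"), ("B", "dog"), ("A", "puppy"),
]
print(first_agreement(round_events))
```

Note that the two strings do not have to be typed at the same moment: a match counts as soon as one player types anything the partner has already entered.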
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly, and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images, they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
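The bookkeeping behind taboo words, good labels and finished images can be sketched with a few small functions. All names are hypothetical, and reading ‘most players pass’ as ‘more than half of the showings ended in a pass’ is an assumption made only for this example.

```python
from collections import Counter


def good_labels(produced, n):
    """Labels produced by more than n players (n is the game parameter N)."""
    return {label for label, count in produced.items() if count > n}


def add_taboo(taboo, agreed_word):
    """Return the taboo set extended with a newly agreed word."""
    return taboo | {agreed_word.lower()}


def is_allowed(word, taboo):
    """A guess is allowed only if it is not on the image's taboo list."""
    return word.lower() not in taboo


def is_done(passes, times_shown):
    """Assumed reading of 'most players pass': more than half of showings."""
    return times_shown > 0 and passes > times_shown / 2


produced = Counter({"dog": 5, "grass": 2, "leash": 1})
print(good_labels(produced, 2))
taboo = add_taboo(set(), "Dog")
print(is_allowed("dog", taboo), is_allowed("puppy", taboo))
```

Because each agreed word is then blocked for later players, successive rounds on the same image are pushed toward more specific labels.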
IMPLEMENTATION
• Pre-recorded game play
– Especially at the beginning and at quiet times, there won't always be players to pair with
– In these cases, a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
– Players could cheat in a number of ways, including agreeing on labels or playing against themselves
– A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
– 13,630 players
– 1.2 million labels for 293,760 images
– 80% of players played more than once
• By 2008
– 200,000 players
– 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
– Playing with a partner
– Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
– choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
– 15 participants, 20 images among the 1000 with more than 5 labels
– 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
– 15 participants, 20 images
– 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• ... or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
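A sketch of how a filled-in template could be turned into a commonsense fact. The relation labels such as `kind_of`, and the functions `make_hint` and `hint_to_fact`, are made up for this example. It assumes the first slot of each template stands for the secret word and the Describer fills only the second.

```python
# Template text -> relation label (the labels are hypothetical).
RELATIONS = {
    "_ is a kind of _": "kind_of",
    "_ is used for _": "used_for",
    "_ is typically near _": "typically_near",
    "_ is typically in _": "typically_in",
    "_ is typically on _": "typically_on",
    "_ is the opposite of _": "opposite_of",
    "_ is related to _": "related_to",
}


def make_hint(template, filler):
    """Fill the second slot; the first stays blank (it is the secret word)."""
    if template not in RELATIONS:
        raise ValueError(f"unknown template: {template!r}")
    head, _, _ = template.rpartition("_")
    return head + filler


def hint_to_fact(secret_word, template, filler):
    """Once the word is known, the hint becomes a (word, relation, value) fact."""
    return (secret_word, RELATIONS[template], filler)


print(make_hint("_ is a kind of _", "fruit"))
print(hint_to_fact("lime", "_ is a kind of _", "fruit"))
```

Fixing the templates in advance is what makes the collected hints directly usable as typed relations rather than free text.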
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
– Describer: can just repeat the behavior of a previous describer
– Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
– Ask six raters whether 200 facts collected using Verbosity are ‘true’
– Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
– Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
– Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer, and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
`CYC convinced the AI community that creating a commonsense knowledge
base by hand is impossiblersquo(Massimo Lecture 1)
That may depend on how many people you put on to it
THE SOCIAL WEB
bull Increasingly the Web is becoming not just a way to facilitate information exchange or commercial transactions but also a tool to facilitate socialization (Facebook LinkedIn etc)
bull Also where information can be collectively created
SOCIAL CREATION OF KNOWLEDGE
WIKIPEDIA
bullWikipedia is a free multilingual encyclopedia project supported by the non-profit Wikimedia FoundationbullWikipedias articles have been written collaboratively by volunteers around the worldbullAlmost all of its articles can be edited by anyone who can access the Wikipedia website
The free encyclopedia that anyone can edit
----httpenwikipediaorgwikiWikipeida
WIKIPEDIA
bull Wikipedia is
1 domain independentndash it has a large coverage
2 up-to-datendash to process current information
3 multilingualndash to process information in many languages
bullTitle
bullAbstract
bullInfoboxes
bullGeo-coordinates
bullCategories
bullImages
bullLinks
bullOther languages
bullOther wiki pages
bullTo the web
bullRedirects
bullDisambiguates
Encyclopedic knowledge in coreference resolution
[The FCC] took [three specific actions] regarding [ATampT] By a 4-0 vote it allowed ATampT to continue offering special discount packages to big customers called Tariff 12 rejecting appeals by ATampT competitors that the discounts were illegal hellip
[The agency] said that because MCIs offer had expired ATampT couldnt continue to offer its discount plan
Why Wikipedia may help addressing the encyclopedic knowledge problem
httpenwikipediaorgwikiFCC
The Federal Communications Commission (FCC) is an independent United States government agency created directed and empowered by Congressional statute (see 47 USC sect 151 and 47 USC sect 154)
Another interesting scenario
A fresh mandate for [Mr Ahmadinejad] would say his critics consecrate the ldquorevolution within a revolutionrdquo he has been trying to effect since his surprise electoral triumph in 2005 Best known to outsiders for his bellicose grandstanding [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative Mr Khamenei greatly to expand the powers of the presidency
Source It could make a big difference The Economist Mar 19th 2009
Why Wikipedia may help addressing the encyclopedic knowledge problem
Wikipedia as Ontology
bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus
bull However it is morehellipndash Comprehensive it contains 12 million articles (28
million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can
compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are
absorbed timely
Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurus
Wikipedia Article that describes the Concept Artificial intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected links
AI is redirected to its equivalent concept Artificial Intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system
in which each article belongs to at least one category
The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system in
which each article belongs to at least one category ndash Polysemous concepts are disambiguated by
Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
wwwphrasedetectivescom
OPEN MIND COMMONSENSE
bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense
WHATrsquoS IN OPEN MIND COMMONSENSE CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52000 assertions)
IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)
EVENTS (38000 assertions)
PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)
AGENTS (104000 assertions)
CapableOf (CapableOf dentist pull tooth)
SPATIAL (36000 assertions)
LocationOf (LocationOf army in war)
TEMPORAL time amp sequence
CAUSAL (17000 assertions)
EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)
AFFECTIONAL (mood feeling emotions) (34000 assertions)
DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)
FUNCTIONAL (115000 assertions)
IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)
ASSOCIATION K-LINES (125 million assertions)
SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic
network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
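One way to picture how filled-in templates turn into facts is sketched below. The relation keys, the function names and the triple format are assumptions made for this illustration; only the template wordings come from the slide.

```python
# Template wordings from the slide; the dictionary keys are hypothetical names.
TEMPLATES = {
    "kind_of": "{} is a kind of {}",
    "used_for": "{} is used for {}",
    "typically_near": "{} is typically near {}",
    "typically_in": "{} is typically in {}",
    "typically_on": "{} is typically on {}",
    "opposite_of": "{} is the opposite of {}",
    "related_to": "{} is related to {}",
}


def make_hint(relation, filler):
    """Hint shown to the Guesser: the secret word stays hidden behind a blank."""
    return TEMPLATES[relation].format("_", filler)


def make_fact(relation, secret_word, filler):
    """Fact recorded from the same hint, as a (relation, word, filler) triple."""
    return (relation, secret_word, filler)


hint = make_hint("kind_of", "fruit")
fact = make_fact("kind_of", "lime", "fruit")
```

Restricting the Describer to a fixed set of templates means every hint already has the shape of a relation between the secret word and the filler.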
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.
SOCIAL CREATION OF KNOWLEDGE
WIKIPEDIA
bullWikipedia is a free multilingual encyclopedia project supported by the non-profit Wikimedia FoundationbullWikipedias articles have been written collaboratively by volunteers around the worldbullAlmost all of its articles can be edited by anyone who can access the Wikipedia website
The free encyclopedia that anyone can edit
----httpenwikipediaorgwikiWikipeida
WIKIPEDIA
bull Wikipedia is
1 domain independentndash it has a large coverage
2 up-to-datendash to process current information
3 multilingualndash to process information in many languages
bullTitle
bullAbstract
bullInfoboxes
bullGeo-coordinates
bullCategories
bullImages
bullLinks
bullOther languages
bullOther wiki pages
bullTo the web
bullRedirects
bullDisambiguates
Encyclopedic knowledge in coreference resolution
[The FCC] took [three specific actions] regarding [ATampT] By a 4-0 vote it allowed ATampT to continue offering special discount packages to big customers called Tariff 12 rejecting appeals by ATampT competitors that the discounts were illegal hellip
[The agency] said that because MCIs offer had expired ATampT couldnt continue to offer its discount plan
Why Wikipedia may help addressing the encyclopedic knowledge problem
httpenwikipediaorgwikiFCC
The Federal Communications Commission (FCC) is an independent United States government agency created directed and empowered by Congressional statute (see 47 USC sect 151 and 47 USC sect 154)
Another interesting scenario
A fresh mandate for [Mr Ahmadinejad] would say his critics consecrate the ldquorevolution within a revolutionrdquo he has been trying to effect since his surprise electoral triumph in 2005 Best known to outsiders for his bellicose grandstanding [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative Mr Khamenei greatly to expand the powers of the presidency
Source It could make a big difference The Economist Mar 19th 2009
Why Wikipedia may help addressing the encyclopedic knowledge problem
Wikipedia as Ontology
bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus
bull However it is morehellipndash Comprehensive it contains 12 million articles (28
million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can
compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are
absorbed timely
Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurus
Wikipedia Article that describes the Concept Artificial intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected links
AI is redirected to its equivalent concept Artificial Intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system
in which each article belongs to at least one category
The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system in
which each article belongs to at least one category ndash Polysemous concepts are disambiguated by
Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
wwwphrasedetectivescom
OPEN MIND COMMONSENSE
bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense
WHATrsquoS IN OPEN MIND COMMONSENSE CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
  – IsA (IsA "apple" "fruit")
  – PartOf (PartOf "CPU" "computer")
  – PropertyOf (PropertyOf "coffee" "wet")
  – MadeOf (MadeOf "bread" "flour")
  – DefinedAs (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  – PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope")
  – SubeventOf (SubeventOf "play sport" "score goal")
  – FirstSubeventOf (FirstSubeventOf "start fire" "light match")
  – LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  – CapableOf (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  – LocationOf (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  – EffectOf (EffectOf "view video" "entertainment")
  – DesirousEffectOf (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  – DesireOf (DesireOf "person" "not be depressed")
  – MotivationOf (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  – UsedFor (UsedFor "fireplace" "burn wood")
  – CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")
ASSOCIATION: K-LINES (1.25 million assertions)
  – SuperThematicKLine (SuperThematicKLine "western civilization" "civilization")
  – ThematicKLine (ThematicKLine "wedding dress" "veil")
  – ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")
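To make the shape of these assertions concrete, here is a minimal sketch that stores a handful of the examples above as (relation, first argument, second argument) triples and indexes them by first argument. The function name `build_index` and the choice of a dictionary index are illustrative assumptions, not part of ConceptNet itself.

```python
from collections import defaultdict

# A few assertions taken from the examples above, as
# (relation, first argument, second argument) triples.
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("MadeOf", "bread", "flour"),
    ("CapableOf", "dentist", "pull tooth"),
    ("UsedFor", "fireplace", "burn wood"),
    ("EffectOf", "view video", "entertainment"),
]


def build_index(assertions):
    """Group assertions by their first argument for quick lookup."""
    index = defaultdict(list)
    for relation, arg1, arg2 in assertions:
        index[arg1].append((relation, arg2))
    return index


index = build_index(ASSERTIONS)
print(index["apple"])    # everything asserted about "apple"
print(index["dentist"])  # everything asserted about "dentist"
```

Seen this way, the collection is a labeled directed graph: concepts are nodes and each relation type labels an edge.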
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
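The lime example can be reproduced with a single pattern-matching rule. The sketch below is only an illustration of what a "simple heuristic" might look like: the regular expression and the function name `extract` are invented for this example and are not the rules actually used to build ConceptNet.

```python
import re

# Hypothetical pattern for sentences of the form
# "A <concept> is a <modifier words> <kind>".
PATTERN = re.compile(
    r"^An? (?P<concept>\w+) is an? (?P<modifier>[\w ]+) (?P<kind>\w+)$"
)


def extract(sentence):
    """Return (relation, arg1, arg2) triples for a matching sentence, else []."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return []
    concept = match.group("concept").lower()
    kind = match.group("kind").lower()
    # Multi-word modifiers are joined with underscores, as on the slide.
    modifier = match.group("modifier").lower().replace(" ", "_")
    return [("isa", concept, kind), ("property_of", concept, modifier)]


facts = extract("A lime is a very sour fruit")
print(facts)
```

A real extraction pipeline would need many such patterns, plus normalization of the arguments; sentences that fit no pattern simply yield nothing here.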
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
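The matching rule in the last bullet can be sketched in a few lines. This is a simplified illustration that ignores timing (in the live game the two players type concurrently); the function name `first_agreement` and the case-insensitive comparison are assumptions made for the example.

```python
def first_agreement(guesses_a, guesses_b):
    """Return the first string typed by player A that player B also typed.

    Comparison ignores case and surrounding whitespace (an assumption).
    Returns None when the two players never agree.
    """
    typed_by_b = {guess.strip().lower() for guess in guesses_b}
    for guess in guesses_a:
        normalized = guess.strip().lower()
        if normalized in typed_by_b:
            return normalized
    return None


label = first_agreement(["dog", "grass", "ball"], ["puppy", "Ball", "grass"])
print(label)
```

Note that neither player has to type the matching string at the same moment or in the same position; any overlap between the two lists counts as agreement.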
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
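The two criteria above can be written down as small predicates. The sketch below is illustrative only: the function names are made up, and reading "most players pass" as "more than half of the plays ended in a pass" (`pass_ratio=0.5`) is an assumption, since the slide gives no exact threshold.

```python
from collections import Counter


def good_labels(produced_labels, n):
    """Return the labels produced more than n times, in alphabetical order."""
    counts = Counter(produced_labels)
    return sorted(label for label, count in counts.items() if count > n)


def is_done(pass_count, play_count, pass_ratio=0.5):
    """Treat an image as done when the share of passes exceeds pass_ratio.

    pass_ratio=0.5 is an assumed reading of "most players pass on it".
    """
    return play_count > 0 and pass_count / play_count > pass_ratio


labels = good_labels(["dog", "dog", "grass", "dog", "grass", "ball"], n=1)
print(labels)
print(is_done(pass_count=7, play_count=10))
```

Raising N trades coverage for reliability: fewer labels qualify, but each has been confirmed by more independent players.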
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won't always be players to pair with
  – In these cases a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels or playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th, 2003 and December 10th, 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
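One way to picture how a filled-in template serves both the game and the knowledge base is sketched below. The template keys, the function name `make_hint`, and the milk/liquid example are all invented for illustration; the slides do not say how Verbosity stores its data internally.

```python
# Template strings from the slide; the dictionary keys are made-up identifiers.
TEMPLATES = {
    "kind_of": "{word} is a kind of {hint}",
    "used_for": "{word} is used for {hint}",
    "near_in_on": "{word} is typically near/in/on {hint}",
    "opposite_of": "{word} is the opposite of {hint}",
    "related_to": "{word} is related to {hint}",
}


def make_hint(template_key, secret_word, hint):
    """Return what the Guesser sees and the fact recorded behind the scenes."""
    # The secret word is hidden from the Guesser, so it is shown as a blank.
    shown_to_guesser = TEMPLATES[template_key].format(word="_", hint=hint)
    fact = (template_key, secret_word, hint)
    return shown_to_guesser, fact


shown, fact = make_hint("kind_of", "milk", "liquid")
print(shown)
print(fact)
```

Because every hint is forced through a fixed template, each one maps directly onto a relation type, which is the property the slide attributes to template use.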
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube (2012). Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence.
• C. Havasi, J. Pustejovsky, R. Speer & H. Lieberman (2009). Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems.
• L. von Ahn & L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, 51(8), 58–67.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi (2013). Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.
WIKIPEDIA
bull Wikipedia is
1 domain independentndash it has a large coverage
2 up-to-datendash to process current information
3 multilingualndash to process information in many languages
bullTitle
bullAbstract
bullInfoboxes
bullGeo-coordinates
bullCategories
bullImages
bullLinks
bullOther languages
bullOther wiki pages
bullTo the web
bullRedirects
bullDisambiguates
Encyclopedic knowledge in coreference resolution
[The FCC] took [three specific actions] regarding [ATampT] By a 4-0 vote it allowed ATampT to continue offering special discount packages to big customers called Tariff 12 rejecting appeals by ATampT competitors that the discounts were illegal hellip
[The agency] said that because MCIs offer had expired ATampT couldnt continue to offer its discount plan
Why Wikipedia may help addressing the encyclopedic knowledge problem
httpenwikipediaorgwikiFCC
The Federal Communications Commission (FCC) is an independent United States government agency created directed and empowered by Congressional statute (see 47 USC sect 151 and 47 USC sect 154)
Another interesting scenario
A fresh mandate for [Mr Ahmadinejad] would say his critics consecrate the ldquorevolution within a revolutionrdquo he has been trying to effect since his surprise electoral triumph in 2005 Best known to outsiders for his bellicose grandstanding [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative Mr Khamenei greatly to expand the powers of the presidency
Source It could make a big difference The Economist Mar 19th 2009
Why Wikipedia may help addressing the encyclopedic knowledge problem
Wikipedia as Ontology
bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus
bull However it is morehellipndash Comprehensive it contains 12 million articles (28
million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can
compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are
absorbed timely
Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurus
Wikipedia Article that describes the Concept Artificial intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected links
AI is redirected to its equivalent concept Artificial Intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system
in which each article belongs to at least one category
The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system in
which each article belongs to at least one category ndash Polysemous concepts are disambiguated by
Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
Extracting Infobox Data (RDF Representation)
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
  dbpedia:native_name "Calgary"
  dbpedia:altitude "1048"
  dbpedia:population_city "988193"
  dbpedia:population_metro "1079310"
  mayor_name → dbpedia:Dave_Bronconnier
  governing_body → dbpedia:Calgary_City_Council
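The Calgary example can be read as a small set of subject/predicate/object triples. The sketch below writes them as plain Python tuples to make the data model concrete. The full property namespace (`PROP`) is an assumption, since the slide only shows the `dbpedia:` prefix, and the helpers `objects` and `is_resource` are hypothetical names.

```python
# Namespaces written out in full. RES matches the resource URL on the slide;
# PROP is an assumed namespace for infobox properties.
RES = "http://dbpedia.org/resource/"
PROP = "http://dbpedia.org/property/"

calgary = RES + "Calgary"

# Each triple is (subject, predicate, object). Objects are either literal
# strings or the URIs of other resources.
triples = [
    (calgary, PROP + "native_name", "Calgary"),
    (calgary, PROP + "altitude", "1048"),
    (calgary, PROP + "population_city", "988193"),
    (calgary, PROP + "population_metro", "1079310"),
    (calgary, PROP + "mayor_name", RES + "Dave_Bronconnier"),
    (calgary, PROP + "governing_body", RES + "Calgary_City_Council"),
]


def objects(subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def is_resource(value):
    """True when the object is a link to another node, not a literal."""
    return value.startswith("http://")


print(objects(calgary, PROP + "altitude"))
print(is_resource(objects(calgary, PROP + "mayor_name")[0]))
```

Objects that are resources (the mayor, the city council) are what turn the extracted data into a graph: they can in turn be the subject of further triples.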
SPARQL
• SPARQL is a query language for RDF
  - RDF is a directed, labeled graph data format for representing information in the Web
  - This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
The DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  - Give me all sitcoms that are set in NYC
  - All tennis players from Moscow
  - All films by Quentin Tarantino
  - All German musicians that were born in Berlin in the 19th century
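As an illustration, "all films by Quentin Tarantino" might be phrased as the query below. This is a sketch only: the class `dbo:Film`, the property `dbo:director` and the resource name are assumptions about how the current DBpedia ontology models films, not something stated on the slides. The Python code just builds the request URL (the SPARQL protocol passes the query text in a `query` parameter) and does not contact the server.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:Film, dbo:director, dbr:Quentin_Tarantino.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film a dbo:Film ;
        dbo:director dbr:Quentin_Tarantino .
}
"""


def request_url(endpoint, query):
    """Build a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = request_url(ENDPOINT, QUERY)
print(url)
```

The query is a graph pattern: any node `?film` that has both edges (its type and its director) is returned.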
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  - Other initiatives: Citizen Science, Cognition and Language Laboratory, ...
• This has been taken advantage of in AI
  - Open Mind Commonsense (Singh) (collecting facts)
  - Semantic Wikis
www.phrasedetectives.com
WEB COLLABORATION PROJECTS
• Open Mind Common Sense - Singh
• Crater mapping (results) - Kanefsky
• Learner, Learner2, 1001 Paraphrases - Chklovski
• FACTory - CyCORP
• Hot or Not - 8 Days
• ESP, Phetch, Verbosity, Peekaboom - von Ahn
• Galaxy Zoo - Oxford University
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
  IsA: (IsA "apple" "fruit")
  PartOf: (PartOf "CPU" "computer")
  PropertyOf: (PropertyOf "coffee" "wet")
  MadeOf: (MadeOf "bread" "flour")
  DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  SubeventOf: (SubeventOf "play sport" "score goal")
  FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  EffectOf: (EffectOf "view video" "entertainment")
  DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  DesireOf: (DesireOf "person" "not be depressed")
  MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  UsedFor: (UsedFor "fireplace" "burn wood")
  CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
  SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  ThematicKLine: (ThematicKLine "wedding dress" "veil")
  ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
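To give a feel for what a "simple heuristic" could look like, here is a hedged Python sketch that maps sentences of the form "A X is a (modifiers) Y" to the two relations of the lime example. The regular expression and the function `extract` are invented for illustration; they are not the patterns actually used to build ConceptNet.

```python
import re

# Hypothetical pattern: article + concept + "is a/an" + optional modifiers + head noun.
PATTERN = re.compile(
    r"^(?:a|an|the)\s+(\w+)\s+is\s+(?:a|an)\s+(?:(.+?)\s+)?(\w+)$",
    re.IGNORECASE,
)


def extract(sentence):
    """Return a list of (relation, concept, value) tuples, or [] if no match."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return []
    concept, modifiers, head = match.groups()
    relations = [("isa", concept.lower(), head.lower())]
    if modifiers:
        # Join multi-word modifiers with "_" as in the slide: very_sour.
        value = "_".join(modifiers.lower().split())
        relations.append(("property_of", concept.lower(), value))
    return relations


print(extract("A lime is a very sour fruit"))
```

A single pattern like this only covers one sentence shape; sentences that do not fit are simply skipped rather than guessed at.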
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  - ESP
  - Verbosity
  - TagATune
• Other games
  - Peekaboom
  - Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  - To train image search engines
  - To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  - Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
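The matching rule and the taboo mechanism can be sketched in a few lines of Python. This is a simplified, hypothetical model (the real game runs in real time, and the function and variable names here are made up): two lists of typed strings are compared, taboo words are ignored, and each agreed label becomes taboo for later pairs shown the same image.

```python
def normalise(label):
    """Lower-case a label and collapse whitespace so that 'Ball ' matches 'ball'."""
    return " ".join(label.lower().split())


def first_agreement(guesses_a, guesses_b, taboo=()):
    """Return the first string typed by A that B also typed, skipping taboo words."""
    banned = {normalise(word) for word in taboo}
    typed_by_b = {normalise(g) for g in guesses_b} - banned
    for guess in guesses_a:
        guess = normalise(guess)
        if guess in typed_by_b:
            return guess
    return None


# Taboo list per image, filled by the game itself (assumed storage format).
taboo_by_image = {}


def play_round(image, guesses_a, guesses_b):
    """Score one pair on one image and record the agreed label as taboo."""
    taboo = taboo_by_image.setdefault(image, set())
    label = first_agreement(guesses_a, guesses_b, taboo)
    if label is not None:
        taboo.add(label)
    return label


a = ["dog", "grass", "ball"]
b = ["puppy", "Ball", "grass"]
print(play_round("img1", a, b))  # first pair to see the image
print(play_round("img1", a, b))  # a later pair typing the same strings
```

Because the obvious label is excluded the second time round, later pairs are pushed towards more specific descriptions of the same image.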
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
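The "good label" criterion is a simple threshold on the number of distinct players. A minimal sketch, assuming a hypothetical log of (player, label) pairs for one image:

```python
def good_labels(productions, n):
    """Return the labels produced by more than n distinct players.

    productions: iterable of (player_id, label) pairs for a single image
    (an assumed log format, used here only for illustration).
    """
    players_by_label = {}
    for player, label in productions:
        players_by_label.setdefault(label, set()).add(player)
    return sorted(
        label for label, players in players_by_label.items() if len(players) > n
    )


log = [
    ("p1", "dog"), ("p2", "dog"), ("p3", "dog"),
    ("p1", "grass"), ("p2", "grass"),
    ("p1", "dog"),  # the same player repeating a label counts only once
]
print(good_labels(log, 2))
```

Raising N trades coverage for reliability: fewer labels survive, but each surviving label has been confirmed independently by more people.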
IMPLEMENTATION
• Pre-recorded game play
  - Especially at the beginning and at quiet times, there won't always be players to pair with
  - In these cases, a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  - Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  - A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  - 13,630 players
  - 1.2 million labels for 293,760 images
  - 80% of players played more than once
• By 2008
  - 200,000 players
  - 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  - Playing with a partner
  - Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  - choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  - 15 participants, 20 images among the 1000 with more than 5 labels
  - 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  - 15 participants, 20 images
  - 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• ... or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  - Players have to guess a word
  - One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
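Each template fixes the relation in advance, so a filled-in hint can be turned directly into a fact about the secret word. The sketch below assumes that the first blank stands for the secret word and the second for the Describer's filler; the relation names on the right-hand side are hypothetical labels, not Verbosity's own.

```python
# Template text -> relation name (the relation names are illustrative assumptions).
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near": "TypicallyNear",
    "is typically in": "TypicallyIn",
    "is typically on": "TypicallyOn",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def render_hint(template, filler):
    """What the Guesser sees: the secret word stays blanked out."""
    if template not in TEMPLATES:
        raise ValueError("unknown template: " + template)
    return "_ " + template + " " + filler


def hint_to_fact(secret_word, template, filler):
    """The fact collected as a side effect of the hint."""
    return (TEMPLATES[template], secret_word, filler)


print(render_hint("is a kind of", "drink"))
print(hint_to_fact("milk", "is a kind of", "drink"))
```

Restricting hints to templates is what makes the collected data usable: free-text hints would first have to be parsed, whereas here the relation is known as soon as the template is chosen.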
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  - Describer: can just repeat the behavior of a previous describer
  - Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  - Ask six raters whether 200 facts collected using Verbosity are 'true'
  - Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  - Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  - Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube (2012). Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence.
• C. Havasi, J. Pustejovsky, R. Speer & H. Lieberman (2009). Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems.
• L. von Ahn & L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi (2013). Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.
Encyclopedic knowledge in coreference resolution
[The FCC] took [three specific actions] regarding [ATampT] By a 4-0 vote it allowed ATampT to continue offering special discount packages to big customers called Tariff 12 rejecting appeals by ATampT competitors that the discounts were illegal hellip
[The agency] said that because MCIs offer had expired ATampT couldnt continue to offer its discount plan
Why Wikipedia may help addressing the encyclopedic knowledge problem
httpenwikipediaorgwikiFCC
The Federal Communications Commission (FCC) is an independent United States government agency created directed and empowered by Congressional statute (see 47 USC sect 151 and 47 USC sect 154)
Another interesting scenario
A fresh mandate for [Mr Ahmadinejad] would say his critics consecrate the ldquorevolution within a revolutionrdquo he has been trying to effect since his surprise electoral triumph in 2005 Best known to outsiders for his bellicose grandstanding [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative Mr Khamenei greatly to expand the powers of the presidency
Source It could make a big difference The Economist Mar 19th 2009
Why Wikipedia may help addressing the encyclopedic knowledge problem
Wikipedia as Ontology
bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus
bull However it is morehellipndash Comprehensive it contains 12 million articles (28
million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can
compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are
absorbed timely
Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
– Each article describes only a single concept
– The title of the article is a short, well-formed phrase, like a term in a traditional thesaurus
Wikipedia article that describes the concept Artificial intelligence
– Equivalent concepts are grouped together by redirect links
AI is redirected to its equivalent concept, Artificial Intelligence
– It contains a hierarchical categorization system in which each article belongs to at least one category
The concept Artificial Intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences, and Technology in society
– Polysemous concepts are disambiguated by disambiguation pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
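Because an infobox is semi-structured, its fields can be pulled out with very little machinery. The following is a minimal sketch in Python, assuming each field sits on its own line in the form `| key = value`. The helper names `parse_infobox` and `strip_links` and the shortened sample are illustrative only; real Wikipedia markup (nested templates, multi-line values) would need a proper wikitext parser.

```python
import re

# Shortened, illustrative sample in the one-field-per-line layout assumed here.
SAMPLE = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
}}"""

FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.*)$")
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")

def parse_infobox(wikitext):
    """Return a dict mapping field names to raw values (hypothetical helper)."""
    fields = {}
    for line in wikitext.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields

def strip_links(value):
    """Replace [[target|label]] or [[target]] with the visible text."""
    return LINK.sub(r"\1", value)

info = parse_infobox(SAMPLE)
print(info["name"])                      # Edgar Allan Poe
print(strip_links(info["birth_place"]))  # Boston, Massachusetts, US
```

The raw values keep their wiki markup on purpose, so that a later step can decide how to treat links and nested templates such as `{{birth date|...}}`.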
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
DBPEDIA
• 1,600,000 concepts
• including
– 58,000 persons
– 70,000 places
– 35,000 music albums
– 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
The DBpedia Dataset
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At the Developers Guide to Semantic Web Toolkits you can find a development toolkit in your preferred programming language to process DBpedia data.
REPRESENTING EXTRACTED INFORMATION
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
dbpedia:native_name "Calgary"
dbpedia:altitude "1048"
dbpedia:population_city "988193"
dbpedia:population_metro "1079310"
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
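The Calgary example can be read as a set of subject-predicate-object triples. Here is a minimal sketch in plain Python, assuming triples are stored as tuples and that `None` acts as a wildcard when matching. The `match` helper and the `dbpedia:` prefix on `mayor_name` and `governing_body` are illustrative choices, not part of DBpedia itself.

```python
# Each RDF statement is a (subject, predicate, object) triple.
S = "dbpedia:Calgary"  # shorthand for http://dbpedia.org/resource/Calgary

TRIPLES = [
    (S, "dbpedia:native_name", "Calgary"),
    (S, "dbpedia:altitude", "1048"),
    (S, "dbpedia:population_city", "988193"),
    (S, "dbpedia:population_metro", "1079310"),
    (S, "dbpedia:mayor_name", "dbpedia:Dave_Bronconnier"),
    (S, "dbpedia:governing_body", "dbpedia:Calgary_City_Council"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None position (hypothetical helper)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Who is the mayor of Calgary according to the extracted data?
print(match(TRIPLES, s=S, p="dbpedia:mayor_name"))
```

Literal values (such as the altitude) and links to other resources (such as the mayor) sit in the same object position, which is what turns the data into a graph.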
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
– Give me all sitcoms that are set in NYC
– All tennis players from Moscow
– All films by Quentin Tarantino
– All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
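As an illustration, the request "All films by Quentin Tarantino" might be written as the SPARQL query below and encoded as an HTTP GET request. This is only a sketch under stated assumptions: the property `dbo:director` and the resource `dbr:Quentin_Tarantino` follow the naming of present-day DBpedia and may differ from the dataset version described in these slides, and the `format` parameter is a convenience of the Virtuoso server rather than part of the SPARQL protocol. The code only builds the URL; it does not contact the endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director and dbr:Quentin_Tarantino (see the note above).
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""

def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query in the 'query' parameter of a GET request."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)

url = build_request_url(QUERY)
print(url)
# To run the query, fetch `url` (for example with urllib.request.urlopen)
# and decode the JSON body; that step needs network access and is omitted here.
```

The triple pattern `?film dbo:director dbr:Quentin_Tarantino` matches every resource that has Tarantino as the value of its director property, which is the graph-matching idea behind all four example queries.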
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
– Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
– Open Mind Commonsense (Singh) (collecting facts)
– Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
www.phrasedetectives.com
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh, 2004)
THINGS (52,000 assertions)
IsA (IsA "apple" "fruit"); PartOf (PartOf "CPU" "computer"); PropertyOf (PropertyOf "coffee" "wet"); MadeOf (MadeOf "bread" "flour"); DefinedAs (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope"); SubeventOf (SubeventOf "play sport" "score goal"); FirstSubeventOf (FirstSubeventOf "start fire" "light match"); LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
CapableOf (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
LocationOf (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
EffectOf (EffectOf "view video" "entertainment"); DesirousEffectOf (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
DesireOf (DesireOf "person" "not be depressed"); MotivationOf (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
IsUsedFor (UsedFor "fireplace" "burn wood"); CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
SuperThematicKLine (SuperThematicKLine "western civilization" "civilization"); ThematicKLine (ThematicKLine "wedding dress" "veil"); ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al., 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
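The lime example can be reproduced with a single pattern. The sketch below is one possible "simple heuristic", assuming sentences of the shape "A X is a [modifier] Y". The regular expression and the `extract` helper are illustrative and are not the actual extraction rules used to build ConceptNet.

```python
import re

# One illustrative pattern: "A <x> is a [<modifier>] <y>".
PATTERN = re.compile(
    r"^(?:a|an|the) (?P<x>\w+) is (?:a|an) (?:(?P<mod>.+) )?(?P<y>\w+)$"
)

def extract(sentence):
    """Turn a simple copular sentence into relation tuples (hypothetical helper)."""
    text = sentence.strip().rstrip(".").lower()
    m = PATTERN.match(text)
    if not m:
        return []
    # The head noun gives the isa relation.
    facts = [("isa", m.group("x"), m.group("y"))]
    # Any modifier before the head noun becomes a property.
    if m.group("mod"):
        facts.append(("property_of", m.group("x"), m.group("mod").replace(" ", "_")))
    return facts

print(extract("A lime is a very sour fruit"))
# [('isa', 'lime', 'fruit'), ('property_of', 'lime', 'very_sour')]
```

A sentence without a modifier, such as "An apple is a fruit", yields only the `isa` tuple, and sentences that do not fit the pattern yield nothing.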
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
– ESP
– Verbosity
– TagATune
• Other games
– Peekaboom
– Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
– To train image search engines
– To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
– Hence, the ESP game
• If any of the strings typed by one player matches a string typed by the other player, they score points
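The matching rule can be sketched in a few lines of Python. This is an illustration only: the function name `agreed_label` is hypothetical, and comparing strings after trimming and lowercasing is an assumption, since the slides do not say how typed strings are normalised.

```python
def agreed_label(typed_a, typed_b):
    """Return the first string typed by player A that player B also typed, or None."""
    seen_by_b = {s.strip().lower() for s in typed_b}
    for s in typed_a:
        label = s.strip().lower()
        if label in seen_by_b:
            return label
    return None

# Neither player sees the other's guesses; they only learn that they agreed.
print(agreed_label(["dog", "grass", "ball"], ["puppy", "Ball", "park"]))  # ball
print(agreed_label(["dog"], ["cat"]))                                     # None
```

Because the partners cannot communicate, the only way to score is to type what a stranger would also type, which is what pushes the agreed label towards a generally accepted description of the image.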
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly, and monthly scores are shown
SCORES
THE CHALLENGE TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images, they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and are not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
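The taboo list, the "good label" rule, and the "done" rule can each be written as a small function. The sketch below is only an illustration: the helper names are hypothetical, and reading "most players pass" as "more than half of the plays ended in a pass" is an assumption, since the slides give no exact threshold.

```python
def allowed_guesses(guesses, taboo):
    """Drop guesses that are taboo for this image (case-insensitive)."""
    banned = {w.lower() for w in taboo}
    return [g for g in guesses if g.lower() not in banned]

def good_labels(producers, n):
    """producers maps each label to the number of players who produced it."""
    return sorted(label for label, count in producers.items() if count > n)

def is_done(passes, plays):
    """Assumed reading of 'most players pass': more than half of the plays."""
    return plays > 0 and passes / plays > 0.5

taboo = {"dog", "grass"}
print(allowed_guesses(["Dog", "ball", "park"], taboo))     # ['ball', 'park']
print(good_labels({"dog": 5, "ball": 2, "park": 3}, n=2))  # ['dog', 'park']
print(is_done(passes=7, plays=10))                         # True
```

Raising N trades quantity for reliability: fewer labels qualify as good, but each one has been produced by more players.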
IMPLEMENTATION
• Pre-recorded game play
– Especially at the beginning and at quiet times, there won't always be players to pair with
– In these cases, a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
– Players could cheat in a number of ways, including agreeing on labels or playing against themselves
– A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th, 2003 and December 10th, 2003
– 13,630 players
– 1.2 million labels for 293,760 images
– 80% of players played more than once
• By 2008
– 200,000 players
– 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
– Playing with a partner
– Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
– choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
– 15 participants, 20 images among the 1000 with more than 5 labels
– 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
– 15 participants, 20 images
– 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations/properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
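One way to picture how a filled-in template becomes a commonsense fact is sketched below. The relation keys (`kind_of`, `used_for`, and so on) and the `make_hint` helper are hypothetical names chosen for illustration, and masking the secret word as "It" in the hint is an assumption; the slides only list the templates themselves.

```python
# Template text per relation; the relation keys are hypothetical labels.
TEMPLATES = {
    "kind_of": "{} is a kind of {}",
    "used_for": "{} is used for {}",
    "typically_near": "{} is typically near {}",
    "typically_in": "{} is typically in {}",
    "typically_on": "{} is typically on {}",
    "opposite_of": "{} is the opposite of {}",
    "related_to": "{} is related to {}",
}

def make_hint(relation, secret_word, filler):
    """Return the hint shown to the Guesser and the fact it encodes."""
    hint = TEMPLATES[relation].format("It", filler)  # secret word stays hidden
    fact = (relation, secret_word, filler)           # candidate commonsense fact
    return hint, fact

hint, fact = make_hint("kind_of", "lime", "fruit")
print(hint)  # It is a kind of fruit
print(fact)  # ('kind_of', 'lime', 'fruit')
```

Because every hint comes from a fixed template, each one maps directly onto a relation, with no need to parse free text.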
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
– Describer: can just repeat the behavior of a previous describer
– Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
– Six raters were asked whether 200 facts collected using Verbosity are 'true'
– Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
– Find The Culprit (Annotation): the user must identify the closest antecedent of a markable if it is anaphoric
– Detectives Conference (Validation): the user must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer, and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
Why Wikipedia may help addressing the encyclopedic knowledge problem
Wikipedia as Ontology
bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus
bull However it is morehellipndash Comprehensive it contains 12 million articles (28
million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can
compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are
absorbed timely
Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurus
Wikipedia Article that describes the Concept Artificial intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected links
AI is redirected to its equivalent concept Artificial Intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system
in which each article belongs to at least one category
The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system in
which each article belongs to at least one category ndash Polysemous concepts are disambiguated by
Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
wwwphrasedetectivescom
OPEN MIND COMMONSENSE
bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense
WHATrsquoS IN OPEN MIND COMMONSENSE CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52000 assertions)
IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)
EVENTS (38000 assertions)
PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)
AGENTS (104000 assertions)
CapableOf (CapableOf dentist pull tooth)
SPATIAL (36000 assertions)
LocationOf (LocationOf army in war)
TEMPORAL time amp sequence
CAUSAL (17000 assertions)
EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)
AFFECTIONAL (mood feeling emotions) (34000 assertions)
DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)
FUNCTIONAL (115000 assertions)
IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)
ASSOCIATION K-LINES (125 million assertions)
SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic
network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
bull hellip or the game approach to collecting commonsense knowledge
bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)
THE GAME
bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the
word
bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected
bull The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _
EMULATION
bull As in ESP game pre-recorded games are used when a player cannot be paired with another player
bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous
describerndash Guesser not so easy
RESULTS
bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY
bull Qualityndash Ask six raters whether 200 facts collected using
Verbosity are lsquotruersquondash Around 85 success
PHRASE DETECTIVES
wwwphrasedetectivesorg
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
- INTRODUCTION TO ARTIFICIAL INTELLIGENCE
- `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
- THE SOCIAL WEB
- SOCIAL CREATION OF KNOWLEDGE
- WIKIPEDIA
- Slide 7
- Slide 8
- Encyclopedic knowledge in coreference resolution
- Why Wikipedia may help addressing the encyclopedic knowledge problem
- Another interesting scenario
- Slide 13
- Wikipedia as Ontology
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
- Slide 21
- The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
- SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
- Wikipedia category network
- Deriving a taxonomy from Wikipedia (AAAI 2007)
- Slide 26
- INFOBOXES
- Slide 29
- Slide 30
- Slide 31
- SPARQL
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- OPEN MIND COMMONSENSE
- WHATrsquoS IN OPEN MIND COMMONSENSE CAR
- Slide 39
- OPEN MIND COMMONSENSE ADDING KNOWLEDGE
- OMCS ADDING KNOWLEDGE 2
- OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
- Slide 43
- FROM OPENMIND COMMONSENSE TO CONCEPT NET
- Slide 45
- CONCEPT NET
- FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
- GAMES WITH A PURPOSE
- GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
- EXAMPLES OF GWAP
- ESP
- ESP the game
- ESP THE GAME
- THE TASK
- SCORING BY MATCHING
- THE CHALLENGE SCORES
- SCORES
- THE CHALLENGE TIMING
- TABOO WORDS
- Slide 61
- PASSING
- GOOD LABELS COMPLETING AN IMAGE
- IMPLEMENTATION
- SOME STATISTICS
- ANALYSIS
- QUALITY OF THE LABELS
- GOOGLE IMAGE LABELLER
- Slide 69
- RESULTS
- VERBOSITY
- THE GAME
- Slide 73
- TEMPLATES IN VERBOSITY
- GUESSING ATTRIBUTES
- PRODUCING A DESCRIPTION
- TEMPLATES
- EMULATION
- Slide 79
- PHRASE DETECTIVES
- Slide 81
- NAME THE CULPRIT
- READINGS
-
Another interesting scenario
A fresh mandate for [Mr Ahmadinejad] would say his critics consecrate the ldquorevolution within a revolutionrdquo he has been trying to effect since his surprise electoral triumph in 2005 Best known to outsiders for his bellicose grandstanding [the incumbent] is more familiar to Iranians as a radical and hyperactive populist who has used the tacit backing of his fellow conservative Mr Khamenei greatly to expand the powers of the presidency
Source It could make a big difference The Economist Mar 19th 2009
Why Wikipedia may help addressing the encyclopedic knowledge problem
Wikipedia as Ontology
bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus
bull However it is morehellipndash Comprehensive it contains 12 million articles (28
million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can
compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are
absorbed timely
Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
– Each article describes only a single concept
– The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
(Example: the Wikipedia article that describes the concept “Artificial intelligence”)
– Equivalent concepts are grouped together by redirect links
(Example: “AI” is redirected to its equivalent concept, “Artificial Intelligence”)
– It contains a hierarchical categorization system in which each article belongs to at least one category
(Example: the concept “Artificial Intelligence” belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences, and Technology in society)
– Polysemous concepts are disambiguated by disambiguation pages
(Example: the different meanings that “Artificial intelligence” may refer to are listed in its disambiguation page)
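The three structures listed on these slides (redirects, categories, disambiguation pages) can be pictured as simple lookup tables. The sketch below is only an illustration: the table contents are hand-written toy entries (the "Mercury" disambiguation entry in particular is a hypothetical example), and the function names `resolve` and `categories_of` are assumptions, not part of any Wikipedia API.

```python
# Toy lookup tables standing in for three Wikipedia structures.
# All entries are illustrative, not extracted from a real dump.
REDIRECTS = {"AI": "Artificial intelligence"}
CATEGORIES = {
    "Artificial intelligence": [
        "Artificial intelligence",
        "Cybernetics",
        "Formal sciences",
        "Technology in society",
    ],
}
DISAMBIGUATION = {
    "Mercury": ["Mercury (planet)", "Mercury (element)", "Mercury (mythology)"],
}


def resolve(title):
    """Return the list of candidate concepts that a title may denote."""
    if title in DISAMBIGUATION:
        return list(DISAMBIGUATION[title])
    # Follow redirects until a non-redirect title is reached (guarding against cycles).
    seen = set()
    while title in REDIRECTS and title not in seen:
        seen.add(title)
        title = REDIRECTS[title]
    return [title]


def categories_of(title):
    """Categories of the single concept a title resolves to; [] if ambiguous or unknown."""
    candidates = resolve(title)
    if len(candidates) != 1:
        return []
    return CATEGORIES.get(candidates[0], [])
```

With these toy tables, `resolve("AI")` yields the single concept "Artificial intelligence", while an ambiguous title yields several candidates that a later step (for instance a coreference resolver) would have to choose between.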
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848.
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
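To see why infoboxes count as semi-structured data, here is a minimal sketch of pulling attribute/value pairs out of infobox wikitext. It assumes one field per line in the form `| name = value`; real infoboxes can span lines and nest templates, which this does not handle. The names `parse_infobox`, `link_targets` and `SAMPLE` are hypothetical, and this is not how DBpedia's extractor is implemented.

```python
import re

# A shortened copy of the Poe infobox, used only to exercise the functions below.
SAMPLE = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| magnum_opus = The Raven
}}"""


def parse_infobox(wikitext):
    """Parse one-field-per-line infobox wikitext into a dict of raw attribute values."""
    fields = {}
    for line in wikitext.splitlines():
        # Field lines look like "| name = value"; the template header and footer are skipped.
        match = re.match(r"^\|\s*([^=|]+?)\s*=\s*(.*)$", line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def link_targets(value):
    """Return the page titles linked from a value ([[Target]] or [[Target|label]])."""
    return [link.split("|")[0] for link in re.findall(r"\[\[([^\]]+)\]\]", value)]
```

The attribute names become candidate properties and the link targets become candidate related concepts, which is the kind of information a structured extraction can build on.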
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
DBPEDIA
• 1,600,000 concepts
• including
– 58,000 persons
– 70,000 places
– 35,000 music albums
– 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
The DBpedia Dataset
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits, for a range of programming languages, that can be used to process DBpedia data.
REPRESENTING EXTRACTED INFORMATION
Wikipedia page: http://en.wikipedia.org/wiki/Calgary
DBpedia resource: http://dbpedia.org/resource/Calgary
dbpedia:native_name “Calgary”
dbpedia:altitude “1048”
dbpedia:population_city “988193”
dbpedia:population_metro “1079310”
mayor_name → dbpedia:Dave_Bronconnier
governing_body → dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
– Give me all sitcoms that are set in NYC
– All tennis players from Moscow
– All films by Quentin Tarantino
– All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
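The core idea behind both RDF and SPARQL can be shown without any RDF library: data is a set of (subject, predicate, object) triples, and a query is a triple pattern in which some positions are variables. The sketch below is a toy in-memory version using the Calgary triples from the earlier slide; the function name `match_pattern` and the `?name` variable convention are assumptions made for illustration, and this is not the SPARQL engine behind the DBpedia endpoint.

```python
# Toy triple store; the triples mirror the Calgary example from the slides.
TRIPLES = [
    ("dbpedia:Calgary", "dbpedia:native_name", "Calgary"),
    ("dbpedia:Calgary", "dbpedia:altitude", "1048"),
    ("dbpedia:Calgary", "dbpedia:population_city", "988193"),
    ("dbpedia:Calgary", "dbpedia:mayor_name", "dbpedia:Dave_Bronconnier"),
    ("dbpedia:Calgary", "dbpedia:governing_body", "dbpedia:Calgary_City_Council"),
]


def match_pattern(pattern, triples=TRIPLES):
    """Match one triple pattern against the store.

    Terms starting with '?' are variables; every other term must match exactly.
    Returns one dict of variable bindings per matching triple.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in the pattern must bind to the same value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results
```

For example, `match_pattern(("dbpedia:Calgary", "dbpedia:mayor_name", "?mayor"))` plays the role of a one-pattern SELECT query asking for Calgary's mayor. The example queries on the slide would need several such patterns joined on shared variables.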
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
– Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
– Open Mind Commonsense (Singh) (collecting facts)
– Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
www.phrasedetectives.com
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense knowledge
WHAT’S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
IsA (IsA "apple" "fruit"); PartOf (PartOf "CPU" "computer"); PropertyOf (PropertyOf "coffee" "wet"); MadeOf (MadeOf "bread" "flour"); DefinedAs (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope"); SubeventOf (SubeventOf "play sport" "score goal"); FirstSubeventOf (FirstSubeventOf "start fire" "light match"); LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
CapableOf (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
LocationOf (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
EffectOf (EffectOf "view video" "entertainment"); DesirousEffectOf (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
DesireOf (DesireOf "person" "not be depressed"); MotivationOf (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
UsedFor (UsedFor "fireplace" "burn wood"); CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")
ASSOCIATION: K-LINES (1.25 million assertions)
SuperThematicKLine (SuperThematicKLine "western civilization" "civilization"); ThematicKLine (ThematicKLine "wedding dress" "veil"); ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
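A “simple heuristic” of the kind mentioned above could be a single surface pattern over the sentence. The sketch below handles only sentences shaped like “A X is a MODIFIER Y”; the regular expression and the function name `extract_assertions` are assumptions for illustration and are not the actual ConceptNet extraction rules.

```python
import re

# Hypothetical pattern: "<article> <subject> is a/an <modifier words> <head noun>".
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(.+?)\s+(\w+)\.?$",
    re.IGNORECASE,
)


def extract_assertions(sentence):
    """Return (relation, arg1, arg2) tuples, or [] if the sentence does not fit the pattern.

    Sentences without a modifier (e.g. "A lime is a fruit") are not covered by this sketch.
    """
    match = PATTERN.match(sentence.strip())
    if not match:
        return []
    subject, modifier, head = (part.lower() for part in match.groups())
    return [
        ("isa", subject, head),
        ("property_of", subject, modifier.replace(" ", "_")),
    ]
```

Applied to “A lime is a very sour fruit”, this yields the two assertions shown on the slide. A single pattern like this would misfire on many sentences (for example, when the last word is not the head noun), which is one reason heuristically extracted assertions need checking.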
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
– ESP
– Verbosity
– TagATune
• Other games
– Peekaboom
– Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
– To train image search engines
– To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
– Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images, they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
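The matching rule and the taboo mechanism fit in a few lines. The sketch below is a simplification under stated assumptions: the real game runs in real time, whereas here each player's guesses are given as a finished list; labels are compared case-insensitively; and the function names `first_match` and `play_round` are hypothetical.

```python
def first_match(guesses_a, guesses_b, taboo=()):
    """Return the first label typed by both players (in player A's order), ignoring taboo words."""
    taboo_set = {word.lower() for word in taboo}
    typed_by_b = {guess.strip().lower() for guess in guesses_b} - taboo_set
    for guess in guesses_a:
        label = guess.strip().lower()
        if label not in taboo_set and label in typed_by_b:
            return label
    return None  # no agreement: the pair would eventually pass on the image


def play_round(image_id, guesses_a, guesses_b, taboo_by_image):
    """Play one image; an agreed label becomes taboo for later pairs shown the same image."""
    label = first_match(guesses_a, guesses_b, taboo_by_image.get(image_id, ()))
    if label is not None:
        taboo_by_image.setdefault(image_id, []).append(label)
    return label
```

Because every agreed label is added to the image's taboo list, a second pair shown the same image cannot score with “dog” again and is pushed towards more specific labels.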
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
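Both criteria reduce to simple counts. In the sketch below, the log format (pairs of player id and label), the function names, and the reading of “most players pass” as a pass rate above a threshold are all assumptions. The slides do not give the actual values used by the game.

```python
def good_labels(label_events, n):
    """label_events: iterable of (player_id, label) pairs for one image.

    A label is 'good' when more than n distinct players produced it.
    """
    players_per_label = {}
    for player, label in label_events:
        players_per_label.setdefault(label, set()).add(player)
    return sorted(
        label for label, players in players_per_label.items() if len(players) > n
    )


def is_done(passes, plays, threshold=0.5):
    """Treat an image as 'done' when the share of plays ending in a pass exceeds threshold."""
    return plays > 0 and passes / plays > threshold
```

Counting distinct players rather than raw occurrences means that one player typing the same label repeatedly cannot make it “good” alone.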
IMPLEMENTATION
• Pre-recorded game play
– Especially at the beginning, and at quiet times, there won’t always be players to pair with
– In these cases, a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
– Players could cheat in a number of ways, including agreeing on labels or playing against themselves
– A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
– 13,630 players
– 1.2 million labels for 293,760 images
– 80% of players played more than once
• By 2008
– 200,000 players
– 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
– Playing with a partner
– Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
– choose 10 labels among those produced, and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
– 15 participants, 20 images among the 1,000 with more than 5 labels
– 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
– 15 participants, 20 images
– 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
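Templates make the collected hints machine-readable: since the hint text is fixed apart from the filler, a hint can be turned back into a fact about the secret word by string matching alone. The sketch below assumes the first blank stands for the secret word and that near/in/on are three separate templates; `make_hint` and `hint_to_fact` are hypothetical names, not part of the Verbosity implementation.

```python
# Template bodies taken from the slide; "near/in/on" is expanded into three entries (assumption).
TEMPLATES = [
    "is a kind of",
    "is used for",
    "is typically near",
    "is typically in",
    "is typically on",
    "is the opposite of",
    "is related to",
]


def make_hint(template, filler):
    """Describer side: fill the second blank; the first blank hides the secret word."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return f"_ {template} {filler}"


def hint_to_fact(secret_word, hint):
    """Collection side: recover (word, template, filler) from a hint, or None if no template fits."""
    for template in TEMPLATES:
        prefix = f"_ {template} "
        if hint.startswith(prefix):
            return (secret_word, template, hint[len(prefix):])
    return None
```

For the secret word “lime”, the hint “_ is a kind of fruit” is recovered as the fact (lime, is a kind of, fruit), with no natural-language parsing needed.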
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
– Describer: can just repeat the behavior of a previous describer
– Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
– Ask six raters whether 200 facts collected using Verbosity are ‘true’
– Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
– Find The Culprit (Annotation): the user must identify the closest antecedent of a markable, if it is anaphoric
– Detectives Conference (Validation): the user must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer, and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58–67.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.
Wikipedia as Ontology
bull Unlike other standard ontologies such as WordNet and Mesh Wikipedia itself is not a structured thesaurus
bull However it is morehellipndash Comprehensive it contains 12 million articles (28
million in the English Wikipedia) ndash Accurate A study by Giles (2005) found Wikipedia can
compete with Encyclopaeligdia Britannica in accuracyndash Up to date Current and emerging concepts are
absorbed timely
Giles J 2005 Internet encyclopaedias go head to head Nature 438 900ndash901
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
  – Each article describes only a single concept
    (example: the Wikipedia article that describes the concept Artificial intelligence)
  – The title of the article is a short, well-formed phrase, like a term in a traditional thesaurus
  – Equivalent concepts are grouped together by redirect links
    (example: AI is redirected to its equivalent concept, Artificial intelligence)
  – It contains a hierarchical categorization system in which each article belongs to at least one category
    (example: the concept Artificial intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences and Technology in society)
  – Polysemous concepts are disambiguated by disambiguation pages
    (example: the different meanings that Artificial intelligence may refer to are listed in its disambiguation page)
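The structural features listed above (redirects, categories, disambiguation pages) can be pictured as three lookup tables. The following is a minimal sketch, not how Wikipedia itself is stored: the class `MiniWiki`, its field names and the "Mercury" entries are made up for illustration, while the AI redirect and the four categories come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class MiniWiki:
    """Toy in-memory stand-in for the Wikipedia structures on the slides."""
    redirects: dict = field(default_factory=dict)       # alias title -> canonical title
    categories: dict = field(default_factory=dict)      # article title -> list of categories
    disambiguation: dict = field(default_factory=dict)  # ambiguous title -> candidate titles

    def resolve(self, title):
        """Return the list of concepts a title may denote."""
        if title in self.disambiguation:
            return list(self.disambiguation[title])
        seen = set()
        # Follow redirect links, guarding against accidental cycles.
        while title in self.redirects and title not in seen:
            seen.add(title)
            title = self.redirects[title]
        return [title]


wiki = MiniWiki(
    redirects={"AI": "Artificial intelligence"},
    categories={
        "Artificial intelligence": [
            "Artificial intelligence",
            "Cybernetics",
            "Formal sciences",
            "Technology in society",
        ]
    },
    # Hypothetical disambiguation entry, for illustration only.
    disambiguation={"Mercury": ["Mercury (planet)", "Mercury (element)"]},
)

concept = wiki.resolve("AI")[0]
print(concept, "->", wiki.categories[concept])
```

A redirect gives exactly one concept, while a disambiguation page gives several candidates that a later step has to choose between.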
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
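To see why infoboxes count as semi-structured data, here is a rough sketch of turning infobox wikitext into attribute/value pairs. It assumes the usual MediaWiki template syntax (double braces, one `| field = value` per entry); the function names are hypothetical, and this is not the extractor DBpedia actually uses.

```python
def split_top_level(text, sep="|"):
    """Split on sep, ignoring separators nested inside {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif text[i] == sep and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_infobox(wikitext):
    """Return (template name, {field: raw value}) for one infobox."""
    body = wikitext.strip()
    if body.startswith("{{") and body.endswith("}}"):
        body = body[2:-2]
    head, *rest = split_top_level(body)
    fields = {}
    for part in rest:
        key, eq, value = part.partition("=")
        if eq:  # skip fragments that are not "field = value"
            fields[key.strip()] = value.strip()
    return head.strip(), fields


SAMPLE = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| magnum_opus = The Raven
}}"""

template, fields = parse_infobox(SAMPLE)
print(template, fields["name"], fields["birth_date"])
```

The values are still raw wikitext (links and nested templates), so a real pipeline would need a further cleaning step before they become usable attributes.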
DBPEDIA
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
The DBpedia Dataset
• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
REPRESENTING EXTRACTED INFORMATION
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits for processing DBpedia data in your preferred programming language.
Extracting Infobox Data (RDF Representation)
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
  dbpedia:native_name "Calgary"
  dbpedia:altitude "1048"
  dbpedia:population_city "988193"
  dbpedia:population_metro "1079310"
  mayor_name dbpedia:Dave_Bronconnier
  governing_body dbpedia:Calgary_City_Council
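The Calgary example can be read as a set of (subject, predicate, object) triples. The sketch below stores them as plain Python tuples and answers simple pattern queries; the helper `match` is hypothetical, and a real application would use an RDF toolkit rather than a list.

```python
# Triples are (subject, predicate, object); values are copied from the Calgary slide.
CALGARY = "http://dbpedia.org/resource/Calgary"

TRIPLES = [
    (CALGARY, "dbpedia:native_name", "Calgary"),
    (CALGARY, "dbpedia:altitude", "1048"),
    (CALGARY, "dbpedia:population_city", "988193"),
    (CALGARY, "dbpedia:population_metro", "1079310"),
    (CALGARY, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (CALGARY, "governing_body", "dbpedia:Calgary_City_Council"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


print(match(TRIPLES, p="dbpedia:altitude"))
```

Matching a pattern with wildcards against a set of triples is the basic operation that a query language for RDF builds on.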
SPARQL
• SPARQL is a query language for RDF
  – RDF is a directed, labeled graph data format for representing information in the Web
  – This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
The DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
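As an illustration of one of the questions above, here is a sketch of how "all films by Quentin Tarantino" might be phrased in SPARQL and turned into a request URL. The property `dbo:director` and the resource name `dbr:Quentin_Tarantino` are assumptions about the current DBpedia vocabulary and may differ from the dataset version described on the slides; the `format` parameter is a convenience of the Virtuoso server, not part of SPARQL itself. The code only builds the URL and sends nothing over the network.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director links a film to its director.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""


def build_request_url(endpoint, query):
    """Encode the query as an HTTP GET URL for a SPARQL endpoint."""
    params = {
        "query": query.strip(),
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The single triple pattern in the WHERE clause has a variable in the subject position, so the endpoint returns every resource that stands in the director relation to the given person.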
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, ...
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis
www.phrasedetectives.com
WEB COLLABORATION PROJECTS
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
  IsA: (IsA "apple" "fruit")
  PartOf: (PartOf "CPU" "computer")
  PropertyOf: (PropertyOf "coffee" "wet")
  MadeOf: (MadeOf "bread" "flour")
  DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  SubeventOf: (SubeventOf "play sport" "score goal")
  FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  EffectOf: (EffectOf "view video" "entertainment")
  DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  DesireOf: (DesireOf "person" "not be depressed")
  MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  UsedFor: (UsedFor "fireplace" "burn wood")
  CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION, K-LINES (1.25 million assertions)
  SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  ThematicKLine: (ThematicKLine "wedding dress" "veil")
  ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
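The lime example suggests what a "simple heuristic" might look like. The sketch below uses a single hand-written pattern ("A X is a [modifier] Y") and is only an illustration: the regular expression and the function name are assumptions, not the rules ConceptNet actually applies.

```python
import re

# One hypothetical pattern: "A/An/The X is a/an [modifier] Y".
PATTERN = re.compile(
    r"^(?:an?|the)\s+(?P<concept>\w+)\s+is\s+an?\s+"
    r"(?:(?P<modifier>[\w\s]+?)\s+)?(?P<kind>\w+)$",
    re.IGNORECASE,
)


def extract_assertions(sentence):
    """Map one simple English sentence to a list of (relation, arg1, arg2)."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if not m:
        return []
    concept = m.group("concept").lower()
    assertions = [("isa", concept, m.group("kind").lower())]
    if m.group("modifier"):
        # Join multi-word modifiers with underscores, as in very_sour.
        prop = "_".join(m.group("modifier").lower().split())
        assertions.append(("property_of", concept, prop))
    return assertions


print(extract_assertions("A lime is a very sour fruit"))
```

A real extractor would need many such patterns, one family per relation type, plus normalisation of the extracted phrases.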
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence the ESP game
• If any of the strings typed by one player matches a string typed by the other player, they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
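The interplay of matching, taboo words and the "good label" threshold can be sketched in a few lines. This is one reading of the slides, not the ESP game's actual implementation: the class `ImageRecord`, its fields and the example guesses are invented, and timing, passing and scoring are ignored.

```python
from collections import Counter


def normalize(label):
    """Compare labels case-insensitively and without surrounding spaces."""
    return label.strip().lower()


class ImageRecord:
    """Label bookkeeping for a single image (hypothetical structure)."""

    def __init__(self, n):
        self.n = n                  # game parameter N from the slide
        self.producers = Counter()  # label -> number of players who typed it
        self.taboo = set()          # labels already agreed upon for this image

    def play_round(self, guesses_a, guesses_b):
        """Return the first label typed by both partners, or None."""
        # Taboo words are not allowed, so they are dropped before matching.
        a = [g for g in map(normalize, guesses_a) if g not in self.taboo]
        b = [g for g in map(normalize, guesses_b) if g not in self.taboo]
        for label in set(a):
            self.producers[label] += 1
        for label in set(b):
            self.producers[label] += 1
        b_set = set(b)
        agreed = next((g for g in a if g in b_set), None)
        if agreed is not None:
            self.taboo.add(agreed)  # later pairs must find a new label
        return agreed

    def good_labels(self):
        """Labels produced by more than N players."""
        return sorted(l for l, c in self.producers.items() if c > self.n)


record = ImageRecord(n=1)
first = record.play_round(["Dog", "grass"], ["puppy", "dog"])
second = record.play_round(["dog", "puppy"], ["puppy", "ball"])
print(first, second, record.good_labels())
```

In the second round "dog" is already taboo, so the pair is pushed towards a different label, which is how the game accumulates several labels per image.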
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won't always be players to pair with
  – In these cases a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• ... or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
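Because each hint is a filled-in template, every hint can be read back as a fact about the secret word. A minimal sketch follows; the relation names on the right-hand side and the helper functions are assumptions made for illustration, not Verbosity's internal vocabulary.

```python
# Template text (from the slide) -> relation name (assumed for illustration).
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near/in/on": "LocationOf",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def render_hint(template, filler):
    """Text shown to the Guesser: the secret word stays hidden as a blank."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return f"_ {template} {filler}"


def hint_to_fact(secret_word, template, filler):
    """Fact recorded by the game once the secret word is known."""
    return (TEMPLATES[template], secret_word, filler)


hint = render_hint("is a kind of", "liquid")
fact = hint_to_fact("milk", "is a kind of", "liquid")
print(hint, fact)
```

Restricting the Describer to a fixed set of templates is what makes the collected hints directly usable as typed relations, with no parsing of free text.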
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
Wikipedia Article that describes the Concept Artificial intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected links
AI is redirected to its equivalent concept Artificial Intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system
in which each article belongs to at least one category
The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system in
which each article belongs to at least one category ndash Polysemous concepts are disambiguated by
Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
• 1,600,000 concepts
• including
  - 58,000 persons
  - 70,000 places
  - 35,000 music albums
  - 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
The DBpedia Dataset
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At the Developers Guide to Semantic Web Toolkits you can find a development toolkit in your preferred programming language to process DBpedia data.
REPRESENTING EXTRACTED INFORMATION
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
  dbpedia:native_name "Calgary"
  dbpedia:altitude "1048"
  dbpedia:population_city "988193"
  dbpedia:population_metro "1079310"
  mayor_name dbpedia:Dave_Bronconnier
  governing_body dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
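The Calgary example can be written down as subject-predicate-object triples. The sketch below uses plain Python tuples rather than an RDF library. The two namespace strings are an assumption about how the slide's `dbpedia:` prefix expands, and `match` is a hypothetical helper for simple pattern lookups.

```python
# Assumed expansions of the "dbpedia:" prefix used on the slide.
DBR = "http://dbpedia.org/resource/"
DBP = "http://dbpedia.org/property/"

CALGARY = DBR + "Calgary"

# Each triple is (subject, predicate, object); objects are literals or resource URIs.
TRIPLES = [
    (CALGARY, DBP + "native_name", "Calgary"),
    (CALGARY, DBP + "altitude", "1048"),
    (CALGARY, DBP + "population_city", "988193"),
    (CALGARY, DBP + "population_metro", "1079310"),
    (CALGARY, DBP + "mayor_name", DBR + "Dave_Bronconnier"),
    (CALGARY, DBP + "governing_body", DBR + "Calgary_City_Council"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every non-None component of the pattern."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]
```

Leaving a component as `None` plays the role of a variable, which is essentially what a SPARQL triple pattern does over the full dataset.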
SPARQL
• SPARQL is a query language for RDF
  - RDF is a directed, labeled graph data format for representing information in the Web
  - This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  - Give me all sitcoms that are set in NYC
  - All tennis players from Moscow
  - All films by Quentin Tarantino
  - All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
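As an illustration of the "all films by Quentin Tarantino" query, the sketch below builds a SPARQL query string and the corresponding GET request URL without sending it. The `dbo:director` property and the `format` parameter are assumptions about the current DBpedia ontology and the Virtuoso endpoint and may differ from the dataset version described in these slides. `build_request_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director links a film to its director.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""


def build_request_url(endpoint, query, result_format="application/sparql-results+json"):
    """Encode the query as a GET request URL; 'format' is an endpoint-specific assumption."""
    return endpoint + "?" + urlencode({"query": query, "format": result_format})


if __name__ == "__main__":
    # Printing rather than fetching keeps the sketch runnable offline.
    print(build_request_url(ENDPOINT, QUERY))
```

The resulting URL can be fetched with any HTTP client. The single triple pattern in the `WHERE` clause has the same shape as the Calgary triples, with `?film` as the unknown subject.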
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  - Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  - Open Mind Commonsense (Singh) (collecting facts)
  - Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
www.phrasedetectives.com
• Open Mind Common Sense - Singh
• Crater mapping (results) - Kanefsky
• Learner, Learner2, 1001 Paraphrases - Chklovski
• FACTory - Cycorp
• Hot or Not - 8 Days
• ESP, Phetch, Verbosity, Peekaboom - von Ahn
• Galaxy Zoo - Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh, 2004)
THINGS (52,000 assertions)
  IsA (IsA "apple" "fruit"), PartOf (PartOf "CPU" "computer"), PropertyOf (PropertyOf "coffee" "wet"), MadeOf (MadeOf "bread" "flour"), DefinedAs (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope"), SubeventOf (SubeventOf "play sport" "score goal"), FirstSubeventOf (FirstSubeventOf "start fire" "light match"), LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  CapableOf (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  LocationOf (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  EffectOf (EffectOf "view video" "entertainment"), DesirousEffectOf (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  DesireOf (DesireOf "person" "not be depressed"), MotivationOf (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  UsedFor (UsedFor "fireplace" "burn wood"), CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
  SuperThematicKLine (SuperThematicKLine "western civilization" "civilization"), ThematicKLine (ThematicKLine "wedding dress" "veil"), ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")
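Assertions of this form are naturally stored as labeled edges between concepts. The following sketch is not ConceptNet's actual implementation. It loads a handful of the example assertions from the table into an adjacency structure; the `SemanticNetwork` class and its `related` method are hypothetical names.

```python
from collections import defaultdict

# A few (relation, source concept, target concept) assertions from the table.
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("MadeOf", "bread", "flour"),
    ("CapableOf", "dentist", "pull tooth"),
    ("UsedFor", "fireplace", "burn wood"),
    ("EffectOf", "view video", "entertainment"),
]


class SemanticNetwork:
    """Concepts are nodes; each assertion is a directed edge labeled with its relation."""

    def __init__(self, assertions):
        self.outgoing = defaultdict(list)
        for relation, source, target in assertions:
            self.outgoing[source].append((relation, target))

    def related(self, concept, relation=None):
        """Targets reachable from a concept, optionally restricted to one relation type."""
        return [
            target
            for (rel, target) in self.outgoing.get(concept, [])
            if relation is None or rel == relation
        ]
```

For example, `SemanticNetwork(ASSERTIONS).related("apple", "IsA")` looks up what an apple is a kind of, while omitting the relation returns every neighbor of the concept.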
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
  isa(lime, fruit)
  property_of(lime, very_sour)
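To give a feel for what a "simple heuristic" could look like, here is a sketch that handles only sentences of the shape "A X is a (modifiers) Y". The single regular expression is an illustrative assumption and not the actual ConceptNet extraction rules. `extract_assertions` is a hypothetical function name.

```python
import re

# One hypothetical pattern: "A/An/The <subject> is a/an <modifiers...> <head noun>".
PATTERN = re.compile(
    r"^(?:a|an|the)\s+(?P<subject>\w+)\s+is\s+(?:a|an)\s+"
    r"(?P<modifiers>(?:\w+\s+)*)(?P<head>\w+)\.?$",
    re.IGNORECASE,
)


def extract_assertions(sentence):
    """Turn a matching sentence into isa / property_of tuples; return [] otherwise."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return []
    subject = match.group("subject").lower()
    head = match.group("head").lower()
    assertions = [("isa", subject, head)]
    modifiers = match.group("modifiers").lower().split()
    if modifiers:
        # Treat the whole modifier phrase as one property, joined with underscores.
        assertions.append(("property_of", subject, "_".join(modifiers)))
    return assertions
```

On the slide's sentence, the head noun "fruit" yields the `isa` assertion and the modifier phrase "very sour" yields the `property_of` assertion. A sentence that does not fit the pattern simply produces nothing, which is the price of such a shallow rule.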
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  - ESP
  - Verbosity
  - TagATune
• Other games
  - Peekaboom
  - Phetch
ESP
• The first GWAP developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  - To train image search engines
  - To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  - Hence, the ESP game
• If any of the strings typed by one player matches a string typed by the other player, they score points
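The matching rule can be sketched as follows. This is a minimal illustration that assumes guesses arrive as a time-ordered list of (player, text) pairs and that strings are compared after lowercasing and whitespace cleanup. Both choices, and the `first_match` name, are assumptions rather than details given in the slides.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def first_match(events):
    """Return the first label typed by both partners, or None if they never agree.

    `events` is a time-ordered list of (player, text) pairs with players "A" and "B".
    """
    typed = {"A": set(), "B": set()}
    for player, text in events:
        label = normalize(text)
        other = "B" if player == "A" else "A"
        if label in typed[other]:
            return label  # the partner typed this string earlier: agreement
        typed[player].add(label)
    return None
```

Neither player sees the other's guesses, so agreement on a string is reasonable evidence that it describes the image.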
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
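The bookkeeping behind taboo words, good labels and finished images could look roughly like the sketch below. The slides do not say exactly how production counts interact with the taboo list, so several choices here are assumptions: a label becomes taboo after one agreement, both agreeing players count as producers, and "most players pass" is read as more than half of the rounds. `ImageRecord` is a hypothetical class name.

```python
class ImageRecord:
    """Per-image state: who produced which label, the taboo list, and pass statistics."""

    def __init__(self, n):
        self.n = n              # the game parameter N from the slide
        self.produced_by = {}   # label -> set of player ids that produced it
        self.taboo = set()
        self.rounds = 0
        self.passes = 0

    def record_agreement(self, label, player_a, player_b):
        """Store an agreed label and make it taboo for later pairs (assumed policy)."""
        if label in self.taboo:
            raise ValueError("taboo words cannot be used as labels")
        self.produced_by.setdefault(label, set()).update({player_a, player_b})
        self.taboo.add(label)
        self.rounds += 1

    def record_pass(self):
        """A pair gave up on this image."""
        self.rounds += 1
        self.passes += 1

    def good_labels(self):
        """Labels produced by more than N distinct players."""
        return sorted(
            label for label, players in self.produced_by.items() if len(players) > self.n
        )

    def is_done(self):
        """Assumed reading of 'most players pass': more than half of all rounds."""
        return self.rounds > 0 and self.passes / self.rounds > 0.5
```

With `n=1`, a single agreement between two distinct players is enough for a label to count as good. Larger values of N would require a policy that lets the same label be produced again, which this sketch does not model.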
IMPLEMENTATION
• Pre-recorded game play
  - Especially at the beginning and at quiet times, there won't always be players to pair with
  - In these cases, a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  - Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  - A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th, 2003 and December 10th, 2003
  - 13,630 players
  - 1.2 million labels for 293,760 images
  - 80% of players played more than once
• By 2008
  - 200,000 players
  - 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  - Playing with a partner
  - Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  - choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  - 15 participants, 20 images among the 1000 with more than 5 labels
  - 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  - 15 participants, 20 images
  - 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  - Players have to guess a word
  - One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
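The point of the templates is that each filled-in hint already has the shape of a relation between the secret word and the filler. The sketch below makes that explicit. The relation names, the way the secret word is masked in the hint, and the helper names `hint_text` and `to_fact` are all assumptions made for illustration.

```python
# Template sentences keyed by hypothetical relation names.
TEMPLATES = {
    "kind_of": "{word} is a kind of {filler}",
    "used_for": "{word} is used for {filler}",
    "typically_near": "{word} is typically near {filler}",
    "typically_in": "{word} is typically in {filler}",
    "typically_on": "{word} is typically on {filler}",
    "opposite_of": "{word} is the opposite of {filler}",
    "related_to": "{word} is related to {filler}",
}


def hint_text(relation, filler):
    """The hint shown to the Guesser: the secret word is masked with an underscore."""
    return TEMPLATES[relation].format(word="_", filler=filler)


def to_fact(relation, secret_word, filler):
    """The commonsense fact recorded from the same template choice."""
    if relation not in TEMPLATES:
        raise KeyError(relation)
    return (relation, secret_word, filler)
```

Because the Describer can only choose among fixed templates, every hint maps directly onto one relation type, so no free-text parsing is needed to store the fact.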
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  - Describer: can just repeat the behavior of a previous describer
  - Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  - Ask six raters whether 200 facts collected using Verbosity are 'true'
  - Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  - Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  - Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.
AI is redirected to its equivalent concept Artificial Intelligence
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system
in which each article belongs to at least one category
The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system in
which each article belongs to at least one category ndash Polysemous concepts are disambiguated by
Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense knowledge
WHAT’S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh, 2004)
THINGS (52,000 assertions)
  IsA: (IsA "apple" "fruit")
  PartOf: (PartOf "CPU" "computer")
  PropertyOf: (PropertyOf "coffee" "wet")
  MadeOf: (MadeOf "bread" "flour")
  DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  SubeventOf: (SubeventOf "play sport" "score goal")
  FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  EffectOf: (EffectOf "view video" "entertainment")
  DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  DesireOf: (DesireOf "person" "not be depressed")
  MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  UsedFor: (UsedFor "fireplace" "burn wood")
  CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION: K-LINES (1.25 million assertions)
  SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  ThematicKLine: (ThematicKLine "wedding dress" "veil")
  ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al., 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
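The lecture does not give the actual extraction rules, so the following is only a sketch of what one "simple heuristic" could look like: a single hypothetical regular expression that handles sentences shaped like the lime example and nothing else.

```python
import re

# Hypothetical pattern for sentences of the form
# "A <noun> is a <modifier ...> <noun>". It covers only this one sentence
# shape; the real ConceptNet extraction rules are not given in the lecture.
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(.+)\s+(\w+)$", re.IGNORECASE
)


def extract_assertions(sentence):
    """Turn 'A lime is a very sour fruit' into isa / property_of assertions."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if m is None:
        return []
    concept, modifier, category = (g.lower() for g in m.groups())
    return [
        ("isa", concept, category),
        ("property_of", concept, modifier.replace(" ", "_")),
    ]


for relation, arg1, arg2 in extract_assertions("A lime is a very sour fruit"):
    print(f"{relation}({arg1}, {arg2})")
```

Sentences that do not fit the pattern simply yield no assertions, which is the usual trade-off of such heuristics: high precision on the covered shape, no coverage elsewhere.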
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate description of images to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
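The matching rule can be sketched as follows. The normalization step (trimming and lower-casing) is an assumption added so that trivial variants still match; the slides say only that the strings must match.

```python
def normalize(label):
    """Lower-case and trim a typed string so trivial variants still match."""
    return label.strip().lower()


def agreed_labels(guesses_a, guesses_b):
    """Return the labels typed by both partners, in the order player A typed them."""
    typed_by_b = {normalize(g) for g in guesses_b}
    agreed = []
    for guess in guesses_a:
        label = normalize(guess)
        if label in typed_by_b and label not in agreed:
            agreed.append(label)
    return agreed


player_a = ["car", "Road", "tree"]
player_b = ["sky", "road ", "car"]
print(agreed_labels(player_a, player_b))
```

Because neither player sees the other's guesses, the only way to score is to type what another person would independently type, which is what makes agreed strings plausible image labels.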
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
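A possible bookkeeping scheme for taboo words, good labels and finished images is sketched below. The class name, the default value of N, and the reading of "most players pass" as "more than half of all rounds" are assumptions made for illustration; the slides do not specify them.

```python
from collections import Counter


class ImageState:
    """Bookkeeping for one image; thresholds are hypothetical game parameters."""

    def __init__(self, n=1):
        self.n = n                  # a label is "good" when more than n players produced it
        self.producers = Counter()  # label -> number of players who typed it in an agreement
        self.taboo = set()          # labels already agreed on for this image
        self.rounds = 0
        self.passes = 0

    def record_agreement(self, label):
        """Both partners typed `label`; it becomes taboo for later pairs."""
        if label in self.taboo:
            raise ValueError(f"{label!r} is taboo for this image")
        self.rounds += 1
        self.producers[label] += 2  # an agreement means two players produced it
        self.taboo.add(label)

    def record_pass(self):
        """The pair gave up on this image."""
        self.rounds += 1
        self.passes += 1

    def good_labels(self):
        """Labels produced by more than n players."""
        return sorted(l for l, c in self.producers.items() if c > self.n)

    def is_done(self):
        """'Most players pass' is read here as: more than half of all rounds."""
        return self.rounds > 0 and self.passes / self.rounds > 0.5


state = ImageState(n=1)
state.record_agreement("car")
state.record_agreement("road")
for _ in range(3):
    state.record_pass()
print(state.good_labels(), sorted(state.taboo), state.is_done())
```

As the taboo set grows, the remaining acceptable labels get harder to guess, so the pass rate rises until the image is retired.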
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning, and at quiet times, there won’t always be players to pair with
  – In these cases, a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
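Pairing a live player with a recorded 'hand' can be sketched as replaying timestamped guesses. The data layout, lists of (seconds since start, label) pairs, is an assumption; the slides do not say how sessions are stored.

```python
def replay_match(recorded, live):
    """Find the first agreement between a recorded session and a live player.

    recorded / live: lists of (seconds_since_start, label) pairs.
    Returns (time, label) for the first agreement, or None if there is none.
    """
    events = sorted(
        [(t, "recorded", label) for t, label in recorded]
        + [(t, "live", label) for t, label in live]
    )
    seen = {"recorded": set(), "live": set()}
    for t, who, label in events:
        seen[who].add(label)
        other = "live" if who == "recorded" else "recorded"
        if label in seen[other]:
            return (t, label)
    return None


recorded_hand = [(2.0, "dog"), (5.0, "grass"), (9.0, "ball")]
live_player = [(3.0, "puppy"), (6.5, "ball"), (8.0, "grass")]
print(replay_match(recorded_hand, live_player))
```

In this example the live player types "ball" before the recording does, but the first agreement is on "grass" at 8.0 seconds, because that is the first moment at which both sides have typed the same string.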
SOME STATISTICS
• In the 4 months between August 9th, 2003 and December 10th, 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful
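The overlap measure in the second evaluation (the share of game labels that experiment participants also produced) can be computed as below. The labels are toy data invented for the example, not figures from the study.

```python
def overlap_percentage(game_labels, participant_labels):
    """Percentage of game labels that participants also produced."""
    game = set(game_labels)
    if not game:
        return 0.0
    shared = game & set(participant_labels)
    return 100.0 * len(shared) / len(game)


# Toy data for one image (hypothetical, for illustration only).
game = ["dog", "grass", "ball", "park"]
participants = ["dog", "ball", "grass", "tree", "sky"]
print(overlap_percentage(game, participants))
```

Note that the measure is asymmetric: extra labels produced only by participants ("tree", "sky") do not lower the score.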
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
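A filled-in template can be turned back into a structured fact once the secret word is known. The sketch below assumes that the first blank always stands for the secret word; the relation names (IsA, UsedFor, LocationOf, OppositeOf, RelatedTo) are hypothetical labels chosen for the example, not Verbosity's own.

```python
import re

# One pattern per template; "_" stands for the hidden secret word.
# Relation names are hypothetical labels, not taken from Verbosity itself.
TEMPLATES = [
    (re.compile(r"^_ is a kind of (.+)$"), "IsA"),
    (re.compile(r"^_ is used for (.+)$"), "UsedFor"),
    (re.compile(r"^_ is typically (?:near|in|on) (.+)$"), "LocationOf"),
    (re.compile(r"^_ is the opposite of (.+)$"), "OppositeOf"),
    (re.compile(r"^_ is related to (.+)$"), "RelatedTo"),
]


def hint_to_fact(secret_word, hint):
    """Map a describer's hint to a (relation, secret_word, value) fact, or None."""
    for pattern, relation in TEMPLATES:
        m = pattern.match(hint.strip())
        if m:
            return (relation, secret_word, m.group(1).strip())
    return None


print(hint_to_fact("milk", "_ is a kind of beverage"))
print(hint_to_fact("milk", "_ is typically in a fridge"))
```

Constraining the describer to templates is what makes the collected hints machine-readable: free-text hints would need the kind of heuristic parsing used for Open Mind Commonsense sentences.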
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat behavior of previous describer
  – Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.
The concept Artificial Intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
  – Each article only describes a single concept
  – The title of the article is a short and well-formed phrase like a term in a traditional thesaurus
  – Equivalent concepts are grouped together by redirected links
  – It contains a hierarchical categorization system, in which each article belongs to at least one category
  – Polysemous concepts are disambiguated by Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848.
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]] [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]] [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
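Because infoboxes are semi-structured, attribute–value pairs can be pulled out with very little machinery. The sketch below is a simplified line-based parser written for this example; it is not DBpedia's extractor. It assumes one `| key = value` field per line, unwraps `[[...]]` links, and makes no attempt to interpret nested templates such as `{{birth date|...}}`.

```python
import re

# Shortened copy of the Poe infobox, one field per line (assumed layout).
INFOBOX = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]] [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""

# [[Target]] -> Target, [[Target|Label]] -> Label
LINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")


def parse_infobox(wikitext):
    """Return a dict of infobox fields with wiki-link markup removed."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue
        key, value = line[1:].split("=", 1)
        fields[key.strip()] = LINK.sub(r"\1", value).strip()
    return fields


for key, value in parse_infobox(INFOBOX).items():
    print(f"{key}: {value}")
```

Each resulting key–value pair corresponds naturally to one attribute of the article's subject, which is what makes infoboxes a convenient source of structured data.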
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
DBPEDIA
• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
The DBpedia Dataset
The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
Wikipedia as Ontology
bull Moreover Wikipedia has a well-formed structurendash Each article only describes a single conceptndash The title of the article is a short and well-formed
phrase like a term in a traditional thesaurusndash Equivalent concepts are grouped together by
redirected linksndash It contains a hierarchical categorization system in
which each article belongs to at least one category ndash Polysemous concepts are disambiguated by
Disambiguation Pages
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
wwwphrasedetectivescom
OPEN MIND COMMONSENSE
bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense
WHATrsquoS IN OPEN MIND COMMONSENSE CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52000 assertions)
IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)
EVENTS (38000 assertions)
PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)
AGENTS (104000 assertions)
CapableOf (CapableOf dentist pull tooth)
SPATIAL (36000 assertions)
LocationOf (LocationOf army in war)
TEMPORAL time amp sequence
CAUSAL (17000 assertions)
EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)
AFFECTIONAL (mood feeling emotions) (34000 assertions)
DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)
FUNCTIONAL (115000 assertions)
IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)
ASSOCIATION K-LINES (125 million assertions)
SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic
network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images, they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
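How the taboo list grows out of earlier agreements can be sketched as a small data structure. The class and method names below are hypothetical, and the sketch assumes that every agreed label becomes taboo immediately.

```python
from collections import defaultdict


class TabooList:
    """Per-image taboo words, grown from labels earlier pairs agreed on."""

    def __init__(self):
        self._taboo = defaultdict(set)  # image id -> set of taboo labels

    def taboo_for(self, image_id):
        """Words shown to the players as forbidden for this image."""
        return set(self._taboo[image_id])

    def is_allowed(self, image_id, guess):
        return guess.strip().lower() not in self._taboo[image_id]

    def record_agreement(self, image_id, label):
        # Once a pair has agreed on a label, later pairs may not use it,
        # which pushes them towards more specific descriptions.
        self._taboo[image_id].add(label.strip().lower())
```

Because the list is kept per image, a word that is taboo for one picture can still be used for another.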
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
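Both criteria reduce to simple counting. The sketch below is illustrative only: the function names are invented, and `pass_ratio=0.5` is one hypothetical reading of "most players".

```python
from collections import Counter


def good_labels(produced_labels, n):
    """Labels produced more than n times (n is the game parameter N).

    produced_labels holds one entry per player who produced the label.
    """
    counts = Counter(produced_labels)
    return {label for label, count in counts.items() if count > n}


def is_done(passes, plays, pass_ratio=0.5):
    """Treat an image as done when most players shown it chose to pass.

    pass_ratio is an assumed threshold, not a value from the game.
    """
    return plays > 0 and passes / plays > pass_ratio
```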
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful
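A figure such as "83% of game labels also produced by participants" is an overlap rate. The sketch below shows one way to compute such a rate, pooling labels over all images; whether the original study pooled or averaged per image is not stated here, so the pooling, the function name and the data layout are assumptions.

```python
def overlap_rate(game_labels, participant_labels):
    """Fraction of game labels that participants also produced.

    Both arguments map an image id to a set of labels.
    """
    total = matched = 0
    for image_id, labels in game_labels.items():
        reference = participant_labels.get(image_id, set())
        total += len(labels)
        matched += len(labels & reference)  # labels found in both sets
    return matched / total if total else 0.0
```

With the toy data `{"i1": {"dog", "grass", "ball"}, "i2": {"car"}}` for the game and `{"i1": {"dog", "ball", "tree"}, "i2": {"car", "road"}}` for the participants, three of the four game labels are matched.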
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or, the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
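Because every hint is produced by filling in a template, each hint can be stored directly as a typed fact about the secret word. The sketch below illustrates this in Python; only the template wording comes from the slide, while the relation names and function names are hypothetical labels chosen for the example.

```python
# Template text -> relation name. The relation names are assumptions made
# for this sketch; only the template wording comes from the slide.
TEMPLATES = {
    "{word} is a kind of {hint}": "IsA",
    "{word} is used for {hint}": "UsedFor",
    "{word} is typically near/in/on {hint}": "LocationOf",
    "{word} is the opposite of {hint}": "OppositeOf",
    "{word} is related to {hint}": "ConceptuallyRelatedTo",
}


def make_hint(template, hint):
    """What the Guesser sees: the secret word stays hidden as a blank."""
    return template.format(word="_", hint=hint)


def to_fact(template, secret_word, hint):
    """The fact recorded for the secret word once the hint has been given."""
    return (TEMPLATES[template], secret_word, hint)
```

For example, with the secret word "fireplace", the Describer filling "burn wood" into the second template shows the Guesser "_ is used for burn wood" and yields the fact `("UsedFor", "fireplace", "burn wood")`.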
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat behavior of previous describer
  – Guesser: not so easy
RESULTS
• Only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish. Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67, 2008
• M. Poesio, J. Chamberlain, U. Kruschwitz, L. Robaldo & L. Ducceschi. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems, 2013
Wikipedia as Ontology
• Moreover, Wikipedia has a well-formed structure
  – Each article only describes a single concept
  – The title of the article is a short and well-formed phrase, like a term in a traditional thesaurus
  – Equivalent concepts are grouped together by redirect links
  – It contains a hierarchical categorization system, in which each article belongs to at least one category
  – Polysemous concepts are disambiguated by Disambiguation Pages
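These structural properties are what make Wikipedia usable as a lookup resource. The toy model below sketches how redirects and disambiguation pages can be combined to map a surface string to candidate concepts; the three dictionaries hold a handful of hand-picked illustrative entries, not data extracted from Wikipedia, and the function name `resolve` is hypothetical.

```python
# Toy data for illustration only.
REDIRECTS = {"AI": "Artificial intelligence"}  # alias -> canonical title
DISAMBIGUATION = {  # ambiguous title -> candidate articles
    "Mercury": ["Mercury (planet)", "Mercury (element)", "Mercury (mythology)"],
}
CATEGORIES = {  # article -> categories it belongs to
    "Artificial intelligence": ["Cybernetics", "Formal sciences"],
}


def resolve(title):
    """Return the list of articles a title may denote."""
    title = REDIRECTS.get(title, title)  # follow a redirect, if any
    return DISAMBIGUATION.get(title, [title])  # expand ambiguous titles


def categories_of(title):
    """Categories of every article the title may denote."""
    return {article: CATEGORIES.get(article, []) for article in resolve(title)}
```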
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848.
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
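Because an infobox is a list of `key = value` fields, it can be read with very little code. The sketch below is a simplification, not a real wikitext parser: it assumes one field per line, ignores nested templates such as the birth date, and the names `parse_infobox`, `SAMPLE` and `LINK` are invented for the example.

```python
import re

SAMPLE = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""

# Matches [[target]] and [[target|label]], capturing the visible text.
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def parse_infobox(wikitext):
    """Map field names to values, assuming one '| key = value' per line."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, _, value = line[1:].partition("=")
            # Replace wiki links by the text a reader would see.
            fields[key.strip()] = LINK.sub(r"\1", value.strip())
    return fields


print(parse_infobox(SAMPLE)["birth_place"])
```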
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
DBPEDIA
• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
The DBpedia Dataset
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data.
REPRESENTING EXTRACTED INFORMATION
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
  dbpedia:native_name “Calgary”
  dbpedia:altitude “1048”
  dbpedia:population_city “988193”
  dbpedia:population_metro “1079310”
  mayor_name dbpedia:Dave_Bronconnier
  governing_body dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
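The mapping from infobox fields to RDF can be sketched as follows. This is a simplified illustration, not the DBpedia extraction framework: the function names are invented, the caller decides which fields point at other resources, and literals are written without escaping or datatypes.

```python
DBPEDIA = "http://dbpedia.org/resource/"
PROPERTY = "http://dbpedia.org/property/"


def infobox_to_triples(page_title, fields, resource_fields=()):
    """Turn infobox fields into (subject, predicate, object) triples.

    Fields named in resource_fields point at other resources; all others
    become plain string literals.
    """
    subject = "<" + DBPEDIA + page_title + ">"
    triples = []
    for key, value in fields.items():
        if key in resource_fields:
            obj = "<" + DBPEDIA + value.replace(" ", "_") + ">"
        else:
            obj = '"' + value + '"'  # no escaping: sketch only
        triples.append((subject, "<" + PROPERTY + key + ">", obj))
    return triples


def to_ntriples(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


calgary = {"altitude": "1048", "mayor_name": "Dave Bronconnier"}
print(to_ntriples(infobox_to_triples("Calgary", calgary, {"mayor_name"})))
```

Every triple shares the page's resource URI as its subject, which is what lets facts from different infobox fields be joined in a query.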
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• The W3C SPARQL specification defines the syntax and semantics of the query language
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all Sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
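One of the example questions, all films by Quentin Tarantino, could be phrased roughly as below. The query is a sketch: it assumes the DBpedia ontology terms `dbo:Film` and `dbo:director` and the resource name `dbr:Quentin_Tarantino`, which may not match the vocabulary of the dataset version described on these slides. The Python helper only builds the request URL and does not contact the endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:Film, dbo:director, dbr:Quentin_Tarantino.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film a dbo:Film ;
        dbo:director dbr:Quentin_Tarantino .
}
"""


def build_request_url(query, endpoint=ENDPOINT):
    """URL for a GET request that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


print(build_request_url(QUERY))
```

The other example questions follow the same shape: a graph pattern over a type and one or more properties, with a filter added for constraints such as a birth date range.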
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – CyCORP
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense knowledge
WHAT’S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
  IsA: (IsA "apple" "fruit")
  PartOf: (PartOf "CPU" "computer")
  PropertyOf: (PropertyOf "coffee" "wet")
  MadeOf: (MadeOf "bread" "flour")
  DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  SubeventOf: (SubeventOf "play sport" "score goal")
  FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  EffectOf: (EffectOf "view video" "entertainment")
  DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  DesireOf: (DesireOf "person" "not be depressed")
  MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  UsedFor: (UsedFor "fireplace" "burn wood")
  CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
  SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  ThematicKLine: (ThematicKLine "wedding dress" "veil")
  ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
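Assertions of this shape form a labeled graph: each one is an edge from the first concept to the second, labeled with the relation. The sketch below reads such assertions into an adjacency list. It assumes a notation in which both arguments are enclosed in double quotes, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# (Relation "first concept" "second concept"), with quoted arguments assumed.
ASSERTION = re.compile(r'\((\w+)\s+"([^"]+)"\s+"([^"]+)"\)')


def parse_assertions(text):
    """Return a list of (relation, left, right) tuples found in text."""
    return [match.groups() for match in ASSERTION.finditer(text)]


def build_network(assertions):
    """Adjacency list: concept -> list of (relation, other concept)."""
    network = defaultdict(list)
    for relation, left, right in assertions:
        network[left].append((relation, right))
    return network


facts = parse_assertions('(IsA "apple" "fruit") (UsedFor "fireplace" "burn wood")')
print(build_network(facts)["apple"])
```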
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic
network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
bull hellip or the game approach to collecting commonsense knowledge
bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)
THE GAME
bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the
word
bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected
bull The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _
EMULATION
bull As in ESP game pre-recorded games are used when a player cannot be paired with another player
bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous
describerndash Guesser not so easy
RESULTS
bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY
bull Qualityndash Ask six raters whether 200 facts collected using
Verbosity are lsquotruersquondash Around 85 success
PHRASE DETECTIVES
wwwphrasedetectivesorg
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
- INTRODUCTION TO ARTIFICIAL INTELLIGENCE
- `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
- THE SOCIAL WEB
- SOCIAL CREATION OF KNOWLEDGE
- WIKIPEDIA
- Slide 7
- Slide 8
- Encyclopedic knowledge in coreference resolution
- Why Wikipedia may help addressing the encyclopedic knowledge problem
- Another interesting scenario
- Slide 13
- Wikipedia as Ontology
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
- Slide 21
- The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
- SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
- Wikipedia category network
- Deriving a taxonomy from Wikipedia (AAAI 2007)
- Slide 26
- INFOBOXES
- Slide 29
- Slide 30
- Slide 31
- SPARQL
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- OPEN MIND COMMONSENSE
- WHATrsquoS IN OPEN MIND COMMONSENSE CAR
- Slide 39
- OPEN MIND COMMONSENSE ADDING KNOWLEDGE
- OMCS ADDING KNOWLEDGE 2
- OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
- Slide 43
- FROM OPENMIND COMMONSENSE TO CONCEPT NET
- Slide 45
- CONCEPT NET
- FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
- GAMES WITH A PURPOSE
- GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
- EXAMPLES OF GWAP
- ESP
- ESP the game
- ESP THE GAME
- THE TASK
- SCORING BY MATCHING
- THE CHALLENGE SCORES
- SCORES
- THE CHALLENGE TIMING
- TABOO WORDS
- Slide 61
- PASSING
- GOOD LABELS COMPLETING AN IMAGE
- IMPLEMENTATION
- SOME STATISTICS
- ANALYSIS
- QUALITY OF THE LABELS
- GOOGLE IMAGE LABELLER
- Slide 69
- RESULTS
- VERBOSITY
- THE GAME
- Slide 73
- TEMPLATES IN VERBOSITY
- GUESSING ATTRIBUTES
- PRODUCING A DESCRIPTION
- TEMPLATES
- EMULATION
- Slide 79
- PHRASE DETECTIVES
- Slide 81
- NAME THE CULPRIT
- READINGS
-
The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
bull Taxonomic information category structurebull Attributes infobox text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
wwwphrasedetectivescom
OPEN MIND COMMONSENSE
bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense
WHATrsquoS IN OPEN MIND COMMONSENSE CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52000 assertions)
IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)
EVENTS (38000 assertions)
PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)
AGENTS (104000 assertions)
CapableOf (CapableOf dentist pull tooth)
SPATIAL (36000 assertions)
LocationOf (LocationOf army in war)
TEMPORAL time amp sequence
CAUSAL (17000 assertions)
EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)
AFFECTIONAL (mood feeling emotions) (34000 assertions)
DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)
FUNCTIONAL (115000 assertions)
IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)
ASSOCIATION K-LINES (125 million assertions)
SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic
network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
bull hellip or the game approach to collecting commonsense knowledge
bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)
THE GAME
bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the
word
bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected
bull The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _
EMULATION
bull As in ESP game pre-recorded games are used when a player cannot be paired with another player
bull The asymmetry of the game causes a problem not encountered in ESP gamendash Describer can just repeat behavior of previous
describerndash Guesser not so easy
RESULTS
bull Only published results Irsquom aware of predate the actual release of the game so I donrsquot know about the QUANTITY
bull Qualityndash Ask six raters whether 200 facts collected using
Verbosity are lsquotruersquondash Around 85 success
PHRASE DETECTIVES
wwwphrasedetectivesorg
bull 2 tasks
ndash Find The Culprit (Annotation)User must identify the closest antecedent of a markable if it is anaphoric
ndash Detectives Conference (Validation)User must agreedisagree with a coreference relation entered by another user
wwwphrasedetectivescom
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
bull V Nastaseamp M Strube Transforming Wikipedia into a large scale multilingual concept network Artificial Intelligence 2012
bull C Havasi J Pustejovsky R Speer and H Lieberman Digital Intuition Applying Common Sense Using Dimensionality Reduction IEEE Intelligent Systems 2009
bull L von Ahn and L Dabbish (2008) Designing games with a purpose Communications of the ACM v 51 n8 58-67
bull Poesio Chamberlain Kruschwitz Robaldo amp Ducceschi 2013 Phrase Detectives Utilizing Collective Intelligence for Internet-Scale Language Resource Creation ACM Transactions on Intelligent Interactive Systems
SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
• Taxonomic information: category structure
• Attributes: infobox, text
Wikipedia category network
Deriving a taxonomy from Wikipedia (AAAI 2007)
• Start with the category tree
• Induce a subsumption hierarchy
INFOBOXES
• Collaborative content
• Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in 1848
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]], [[United States|US]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]
}}
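To make "semi-structured" concrete, here is a minimal sketch of how the attribute/value pairs of an infobox could be read into a dictionary. The abbreviated infobox text, the function names and the regular expression are assumptions made for illustration; real infobox extraction (as done by DBpedia) has to handle nested templates, multi-line values and many other cases that this sketch ignores.

```python
import re

# Abbreviated, hypothetical infobox text modelled on the Edgar Allan Poe example.
INFOBOX = """{{Infobox Writer
| name = Edgar Allan Poe
| birth_place = [[Boston, Massachusetts]], [[United States|US]]
| movement = [[Romanticism]], [[Dark romanticism]]
| magnum_opus = The Raven
}}"""

# Matches [[Target]] and [[Target|display text]], capturing only the target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def parse_infobox(text):
    """Return {attribute: raw value} for every '| key = value' line."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def link_targets(value):
    """Return the page titles that a value links to via [[...]] markup."""
    return LINK.findall(value)


fields = parse_infobox(INFOBOX)
print(fields["name"])
print(link_targets(fields["birth_place"]))
```

The attribute names become candidate properties and the link targets become candidate entities, which is the intuition behind turning infoboxes into structured data.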
DBPEDIA
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
The DBpedia Dataset
• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
REPRESENTING EXTRACTED INFORMATION
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits lists development toolkits for processing DBpedia data in your preferred programming language.
Extracting Infobox Data (RDF Representation)
Source page: http://en.wikipedia.org/wiki/Calgary
Resource: http://dbpedia.org/resource/Calgary
  dbpedia:native_name "Calgary"
  dbpedia:altitude "1048"
  dbpedia:population_city "988193"
  dbpedia:population_metro "1079310"
  mayor_name dbpedia:Dave_Bronconnier
  governing_body dbpedia:Calgary_City_Council
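The Calgary example can be written down as subject/predicate/object triples. The sketch below uses plain Python tuples as a stand-in for an RDF library; the helper names are hypothetical and the property names are copied as they appear in the example, so this is an illustration of the data model rather than of DBpedia's actual extraction code.

```python
# Triples from the Calgary example; plain tuples stand in for an RDF library.
SUBJECT = "http://dbpedia.org/resource/Calgary"

TRIPLES = [
    (SUBJECT, "dbpedia:native_name", "Calgary"),
    (SUBJECT, "dbpedia:altitude", "1048"),
    (SUBJECT, "dbpedia:population_city", "988193"),
    (SUBJECT, "dbpedia:population_metro", "1079310"),
    (SUBJECT, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (SUBJECT, "governing_body", "dbpedia:Calgary_City_Council"),
]


def objects(triples, subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def is_resource(obj):
    """Assumed convention: objects naming another resource carry a prefix or URI."""
    return obj.startswith("dbpedia:") or obj.startswith("http://")


print(objects(TRIPLES, SUBJECT, "dbpedia:altitude"))
print([o for _, _, o in TRIPLES if is_resource(o)])
```

Literal values (altitude, population) end the graph, while resource-valued objects (the mayor, the city council) are nodes that can have triples of their own, which is what makes the result a directed labeled graph.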
SPARQL
• SPARQL is a query language for RDF
  – RDF is a directed, labeled graph data format for representing information in the Web
  – This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
The DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
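As an illustration, the query "all films by Quentin Tarantino" might be phrased as below. This is a sketch under assumptions: the property `dbo:director`, the resource name `dbr:Quentin_Tarantino` and the `format` request parameter are not given on the slides and should be checked against the live endpoint. The code only builds the request URL and does not contact the server.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:director links a film to its director.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Quentin_Tarantino .
}
"""


def request_url(query, endpoint=ENDPOINT):
    """Build a GET URL carrying the query in the 'query' parameter."""
    params = {
        "query": query,
        # Assumed, server-specific way of asking for JSON results.
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)


url = request_url(QUERY)
print(url)
```

The WHERE clause is a graph pattern: any resource `?film` for which a matching triple exists in the dataset is returned.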
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis
www.phrasedetectives.com
WEB COLLABORATION PROJECTS
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – CyCORP
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense knowledge
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh, 2004)
THINGS (52,000 assertions)
• IsA: (IsA "apple" "fruit")
• PartOf: (PartOf "CPU" "computer")
• PropertyOf: (PropertyOf "coffee" "wet")
• MadeOf: (MadeOf "bread" "flour")
• DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
• PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
• SubeventOf: (SubeventOf "play sport" "score goal")
• FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
• LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
• CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
• LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
• EffectOf: (EffectOf "view video" "entertainment")
• DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
• DesireOf: (DesireOf "person" "not be depressed")
• MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
• UsedFor: (UsedFor "fireplace" "burn wood")
• CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
• SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
• ThematicKLine: (ThematicKLine "wedding dress" "veil")
• ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
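Assertions of this kind form a labeled graph over concepts. The following minimal sketch stores a handful of the example assertions as (relation, concept, concept) tuples and indexes them by their first concept; the data structure and function name are illustrative assumptions, not ConceptNet's actual storage format or API.

```python
from collections import defaultdict

# A few of the example assertions, as (relation, left concept, right concept).
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("MadeOf", "bread", "flour"),
    ("CapableOf", "dentist", "pull tooth"),
    ("UsedFor", "fireplace", "burn wood"),
    ("EffectOf", "view video", "entertainment"),
]


def build_index(assertions):
    """Map each left-hand concept to its outgoing (relation, concept) edges."""
    index = defaultdict(list)
    for relation, left, right in assertions:
        index[left].append((relation, right))
    return index


index = build_index(ASSERTIONS)
print(index["dentist"])
print(index["apple"])
```

Looking up a concept then returns everything the network asserts about it, which is the basic operation behind using such a resource for inference.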
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al., 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
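The lime example suggests what a "simple heuristic" could look like. The sketch below handles a single, hypothetical sentence pattern ("A X is a [modifiers] Y") with a regular expression; the pattern and the function name are assumptions made for illustration and are not the rules actually used to build ConceptNet.

```python
import re

# One hypothetical pattern: "A/An/The X is a/an [modifiers] Y".
PATTERN = re.compile(r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(.+)$", re.IGNORECASE)


def extract(sentence):
    """Return isa/property_of facts for sentences matching the pattern."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []
    concept = match.group(1).lower()
    description = match.group(2).lower().split()
    # Treat the last word as the head noun and the rest as modifiers.
    head, modifiers = description[-1], description[:-1]
    facts = [("isa", concept, head)]
    if modifiers:
        facts.append(("property_of", concept, "_".join(modifiers)))
    return facts


print(extract("A lime is a very sour fruit"))
```

Sentences that do not fit the pattern yield no facts, which reflects how template-like contributions make this kind of shallow extraction feasible.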
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
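The matching rule can be sketched as follows: the two players' guesses arrive as one stream in typing order, and the round ends as soon as a string typed by one player has already been typed by the other. The normalisation step, the player identifiers and the function names are assumptions for illustration, not details of the deployed game.

```python
def normalise(label):
    """Lower-case and collapse whitespace so trivial variants still match."""
    return " ".join(label.lower().split())


def agreed_label(stream):
    """Return the first label typed by both players, or None.

    `stream` is an iterable of (player, label) pairs in typing order,
    where player is "A" or "B".
    """
    seen = {"A": set(), "B": set()}
    for player, raw in stream:
        label = normalise(raw)
        other = "B" if player == "A" else "A"
        if label in seen[other]:
            return label
        seen[player].add(label)
    return None


guesses = [("A", "dog"), ("B", "puppy"), ("A", "grass"), ("B", "Dog ")]
print(agreed_label(guesses))
```

Because neither player sees the other's guesses, the only way to match is to type something that genuinely describes the image, which is what makes the agreed string a plausible label.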
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
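The interaction between taboo words, "good" labels and "done" images can be sketched as per-image bookkeeping. Everything specific here is an assumption for illustration: the class and method names, counting a round without agreement as a pass, and reading "most players pass" as more than half of the rounds.

```python
from collections import Counter


class ImageState:
    """Illustrative bookkeeping for one image; thresholds are assumptions."""

    def __init__(self, n=1):
        self.n = n                 # a label is good if more than n players typed it
        self.produced = Counter()  # label -> number of players who typed it
        self.taboo = set()         # labels already agreed on for this image
        self.rounds = 0
        self.passes = 0

    def play_round(self, labels_a, labels_b):
        """Record one round; return the agreed label, or None for a pass."""
        self.rounds += 1
        allowed_a = set(labels_a) - self.taboo
        allowed_b = set(labels_b) - self.taboo
        for label in allowed_a:
            self.produced[label] += 1
        for label in allowed_b:
            self.produced[label] += 1
        common = sorted(allowed_a & allowed_b)
        if not common:
            self.passes += 1
            return None
        self.taboo.add(common[0])  # an agreed word becomes taboo for later pairs
        return common[0]

    def good_labels(self):
        return {label for label, count in self.produced.items() if count > self.n}

    def is_done(self):
        return self.rounds > 0 and self.passes / self.rounds > 0.5


state = ImageState(n=1)
print(state.play_round(["dog", "grass"], ["dog"]))
print(state.good_labels(), state.is_done())
```

As the taboo list grows, later pairs are pushed towards less obvious labels, and once too few untried descriptions remain the image stops being worth showing.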
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won't always be players to pair with
  – In these cases a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
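A filled-in template does double duty: it is a hint for the Guesser and, once the secret word is known, a typed fact. The sketch below illustrates this with hypothetical relation names and helper functions; it is not the game's actual implementation.

```python
# Template strings from the slide; the relation keys are assumed names.
TEMPLATES = {
    "kind_of": "{word} is a kind of {hint}",
    "used_for": "{word} is used for {hint}",
    "typically_near": "{word} is typically near/in/on {hint}",
    "opposite_of": "{word} is the opposite of {hint}",
    "related_to": "{word} is related to {hint}",
}


def hint_card(relation, hint):
    """What the Guesser sees: the secret word stays hidden behind '_'."""
    return TEMPLATES[relation].format(word="_", hint=hint)


def fact(relation, secret_word, hint):
    """The commonsense fact recorded for the round."""
    return (relation, secret_word, hint)


print(hint_card("used_for", "writing"))
print(fact("used_for", "pen", "writing"))
```

Fixing the set of templates in advance means every hint already comes labeled with its relation, so no parsing of free text is needed afterwards.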
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi (2013). Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Start with the category tree
Deriving a taxonomy from Wikipedia (AAAI 2007)
bull Induce a subsumption hierarchy
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – Cycorp
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
WEB COLLABORATION PROJECTS
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense
WHAT’S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh, 2004)
THINGS (52,000 assertions)
  IsA (IsA "apple" "fruit")
  PartOf (PartOf "CPU" "computer")
  PropertyOf (PropertyOf "coffee" "wet")
  MadeOf (MadeOf "bread" "flour")
  DefinedAs (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  PrerequisiteEventOf (PrerequisiteEventOf "read letter" "open envelope")
  SubeventOf (SubeventOf "play sport" "score goal")
  FirstSubeventOf (FirstSubeventOf "start fire" "light match")
  LastSubeventOf (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  CapableOf (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  LocationOf (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  EffectOf (EffectOf "view video" "entertainment")
  DesirousEffectOf (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  DesireOf (DesireOf "person" "not be depressed")
  MotivationOf (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  IsUsedFor (UsedFor "fireplace" "burn wood")
  CapableOfReceivingAction (CapableOfReceivingAction "drink" "serve")
ASSOCIATION: K-LINES (1.25 million assertions)
  SuperThematicKLine (SuperThematicKLine "western civilization" "civilization")
  ThematicKLine (ThematicKLine "wedding dress" "veil")
  ConceptuallyRelatedTo (ConceptuallyRelatedTo "bad breath" "mint")
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al., 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
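The slides do not say which heuristics ConceptNet actually uses, so the following is only a sketch of what one "simple heuristic" could look like: a single regular expression for sentences of the form "A X is a (modifiers) Y". The pattern, the function name `extract` and the tuple output format are assumptions.

```python
import re

# Hypothetical pattern: "<a/an/the> <concept> is <a/an> [<modifiers>] <category>"
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(?:(.+?)\s+)?(\w+)$",
    re.IGNORECASE,
)


def extract(sentence):
    """Return a list of (relation, concept, value) tuples found in `sentence`."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if m is None:
        return []
    concept = m.group(1).lower()
    modifiers = m.group(2)
    category = m.group(3).lower()
    assertions = [("isa", concept, category)]
    if modifiers:
        # Multi-word modifiers are joined with underscores, as on the slide.
        assertions.append(("property_of", concept, modifiers.lower().replace(" ", "_")))
    return assertions
```

A single pattern like this only covers one sentence shape; a usable extractor would need one pattern per template and would still miss free-form input.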
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
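The matching rule can be sketched as follows. This is an illustrative simulation only: it assumes the two players type in alternating turns and that strings are compared after lowercasing and trimming, neither of which is stated on the slides. The function name `first_agreement` is hypothetical.

```python
from itertools import zip_longest


def first_agreement(guesses_a, guesses_b):
    """Return the first label typed by both players, or None if they never agree."""
    seen_a, seen_b = set(), set()
    # Assumption: players alternate, player A typing first in each turn.
    for a, b in zip_longest(guesses_a, guesses_b):
        for guess, own, other in ((a, seen_a, seen_b), (b, seen_b, seen_a)):
            if guess is None:
                continue  # this player has run out of guesses
            label = guess.strip().lower()
            own.add(label)
            if label in other:
                # The partner typed this string at some earlier point.
                return label
    return None
```

Note that the players do not have to type the same string at the same time: a match counts as soon as one player types something the other has already entered.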
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
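The per-image bookkeeping implied by the taboo, "good" and "done" rules might look like the sketch below. Several details are assumptions, not facts from the slides: each agreement is counted as two players producing the label, "most players pass" is read as more than half of the games on that image ending in a pass, and the slides do not say how the taboo list interacts with the threshold N (for instance, whether taboo lists are ever reset). The class and method names are hypothetical.

```python
from collections import Counter


class ImageRecord:
    """Bookkeeping for one image across many games (illustrative only)."""

    def __init__(self, n):
        self.n = n                  # the game parameter N
        self.producers = Counter()  # label -> number of players who produced it
        self.taboo = set()          # labels agreed upon in earlier games
        self.times_shown = 0
        self.times_passed = 0

    def record_game(self, agreed_label=None):
        """Record one game: the pair either agreed on a label or passed."""
        self.times_shown += 1
        if agreed_label is None:
            self.times_passed += 1
            return
        self.producers[agreed_label] += 2  # assumption: both partners count
        self.taboo.add(agreed_label)       # taboo for later players

    def good_labels(self):
        """Labels produced by more than N players."""
        return sorted(l for l, c in self.producers.items() if c > self.n)

    def is_done(self):
        """Assumption: 'most players pass' means a pass rate above one half."""
        return self.times_shown > 0 and self.times_passed / self.times_shown > 0.5
```

Raising N trades quantity for reliability: fewer labels qualify as good, but each has been confirmed by more independent players.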
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won’t always be players to pair with
  – In these cases, a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
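Because every hint is produced by filling in a fixed template, each hint maps directly to a typed fact. The sketch below shows one way this could be done; the template phrases come from the slide, while the relation names (`kind_of`, `used_for`, ...) and the function `parse_hint` are hypothetical.

```python
# Template phrase -> relation name. Relation names are illustrative assumptions.
TEMPLATES = [
    ("is a kind of", "kind_of"),
    ("is used for", "used_for"),
    ("is typically near", "near"),
    ("is typically in", "in"),
    ("is typically on", "on"),
    ("is the opposite of", "opposite_of"),
    ("is related to", "related_to"),
]


def parse_hint(hint):
    """Turn a filled-in template into a (relation, left, right) triple, or None."""
    for phrase, relation in TEMPLATES:
        left, sep, right = hint.partition(f" {phrase} ")
        # Both slots must be filled for the hint to count as a fact.
        if sep and left.strip() and right.strip():
            return (relation, left.strip(), right.strip())
    return None
```

For example, `parse_hint("milk is a kind of drink")` yields `("kind_of", "milk", "drink")`. Restricting the Describer to templates is what lets the game output structured relations rather than free text.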
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat behavior of previous describer
  – Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.
INFOBOXES
bull Collaborative content
bull Semi-structured data
Infobox Writer| bgcolour = silver| name = Edgar Allan Poe| image = Edgar_Allan_Poe_2jpg| caption = This [[daguerreotype]] of Poe was taken in 1848 | birth_date = birth date|1809|1|19|mf=y| birth_place = [[Boston Massachusetts]] [[United States|US]]| death_date = death date and age|1849|10|07|1809|01|19| death_place = [[Baltimore Maryland]] [[United States|US]]| occupation = Poet short story writer editor literary critic| movement = [[Romanticism]] [[Dark romanticism]]| genre = [[Horror fiction]] [[Crime fiction]] [[Detective fiction]]| magnum_opus = The Raven| spouse = [[Virginia Eliza Clemm Poe]]
DBpediaorg is a effort to bull extract structured information from Wikipediabull make this information available on the Web under an
open licensebull interlink the DBpedia dataset with other datasets on the
Web
DBPEDIA
10486071600000 concepts
1048607including
1048698 58000 persons
1048698 70000 places
1048698 35000 music albums
1048698 12000 films
1048607described by 91 million triples
1048607using 8141 different properties
1048607557000 links to pictures
10486071300000 links external web pages
1048607207000 Wikipedia categories
104860775000 YAGO categories
The DBpedia Dataset
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
wwwphrasedetectivescom
OPEN MIND COMMONSENSE
bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense
WHATrsquoS IN OPEN MIND COMMONSENSE CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52000 assertions)
IsA (IsA apple fruit) Part of (PartOf CPU computer) PropertyOf (PropertyOf coffee wet) MadeOf (MadeOf bread flour) DefinedAs (DefinedAs meat flesh of animal)
EVENTS (38000 assertions)
PrerequisiteeventOf (PrerequisiteEventOf read letter open envelope) SubeventOf (SubeventOf play sport score goal) FirstSubeventOF (FirstSubeventOf start fire light match) LastSubeventOf (LastSubeventOf attend classical concert applaud)
AGENTS (104000 assertions)
CapableOf (CapableOf dentist pull tooth)
SPATIAL (36000 assertions)
LocationOf (LocationOf army in war)
TEMPORAL time amp sequence
CAUSAL (17000 assertions)
EffectOf (EffectOf view video entertainment) DesirousEffectOf (DesirousEffectOf sweat take shower)
AFFECTIONAL (mood feeling emotions) (34000 assertions)
DesireOf (DesireOf person not be depressed) MotivationOf (MotivationOf play game compete)
FUNCTIONAL (115000 assertions)
IsUsedFor (UsedFor fireplace burn wood) CapableOfReceivingAction (CapableOfReceivingAction drink serve)
ASSOCIATION K-LINES (125 million assertions)
SuperThematicKLine (SuperThematicKLine western civilization civilization) ThematicKLine (ThematicKLine wedding dress veil) ConceptuallyRelatedTo (ConceptuallyRelatedTo bad breath mint)
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NETbull ConceptNet (Havasi et al 2009) is a semantic
network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
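A filled-in template can be turned into a structured fact mechanically, which is what makes template-based hints useful for knowledge collection. The sketch below assumes the first slot stands for the secret word and the Describer fills the second slot; the relation names (`KindOf`, `UsedFor`, and so on) and the function `hint_to_fact` are hypothetical, since the slides list only the sentence templates.

```python
# Sentence templates from the slide, paired with hypothetical relation names.
TEMPLATES = [
    ("is a kind of", "KindOf"),
    ("is used for", "UsedFor"),
    ("is typically near", "TypicallyNear"),
    ("is typically in", "TypicallyIn"),
    ("is typically on", "TypicallyOn"),
    ("is the opposite of", "OppositeOf"),
    ("is related to", "RelatedTo"),
]


def hint_to_fact(hint: str, secret_word: str):
    """Turn a hint such as '_ is used for drinking' into (relation, word, filler).

    Returns None when the hint does not follow any known template.
    """
    text = hint.strip().lower()
    for phrase, relation in TEMPLATES:
        prefix = "_ " + phrase + " "
        if text.startswith(prefix):
            filler = text[len(prefix):].strip()
            if filler:
                return (relation, secret_word, filler)
    return None
```

For example, `hint_to_fact("_ is used for drinking", "cup")` yields a `UsedFor` fact about "cup", while free text that matches no template is rejected.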
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish. Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67, 2008.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems, 2013.
DBPEDIA
DBpedia.org is an effort to
• extract structured information from Wikipedia
• make this information available on the Web under an open license
• interlink the DBpedia dataset with other datasets on the Web
The DBpedia Dataset
• 1,600,000 concepts
• including
  – 58,000 persons
  – 70,000 places
  – 35,000 music albums
  – 12,000 films
• described by 91 million triples
• using 8,141 different properties
• 557,000 links to pictures
• 1,300,000 links to external web pages
• 207,000 Wikipedia categories
• 75,000 YAGO categories
REPRESENTING EXTRACTED INFORMATION
The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. It uses the SPARQL query language to query this data. At the Developers Guide to Semantic Web Toolkits you can find a development toolkit in your preferred programming language to process DBpedia data.
Extracting Infobox Data (RDF Representation)
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
  dbpedia:native_name "Calgary"
  dbpedia:altitude "1048"
  dbpedia:population_city "988193"
  dbpedia:population_metro "1079310"
  mayor_name dbpedia:Dave_Bronconnier
  governing_body dbpedia:Calgary_City_Council
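To make the RDF view of the Calgary infobox concrete, the sketch below stores the same statements as (subject, predicate, object) triples in plain Python. It is an illustration only: the classes `Resource` and `Literal` and the two helper functions are hypothetical names, no RDF library is used, and the prefixed names are kept exactly as they appear on the slide rather than expanded to full URIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A node identified by a (prefixed) name, e.g. dbpedia:Calgary."""
    name: str


@dataclass(frozen=True)
class Literal:
    """A plain value such as a name or a number taken from the infobox."""
    value: str


CALGARY = Resource("dbpedia:Calgary")

# Each infobox field becomes one edge of a directed, labeled graph.
TRIPLES = [
    (CALGARY, "dbpedia:native_name", Literal("Calgary")),
    (CALGARY, "dbpedia:altitude", Literal("1048")),
    (CALGARY, "dbpedia:population_city", Literal("988193")),
    (CALGARY, "dbpedia:population_metro", Literal("1079310")),
    (CALGARY, "mayor_name", Resource("dbpedia:Dave_Bronconnier")),
    (CALGARY, "governing_body", Resource("dbpedia:Calgary_City_Council")),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects attached to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def linked_resources(subject, triples=TRIPLES):
    """Objects that are themselves resources, i.e. links to other nodes."""
    return [o for s, _, o in triples if s == subject and isinstance(o, Resource)]
```

The distinction between `Literal` and `Resource` objects mirrors the slide: population and altitude are values, whereas the mayor and the governing body are nodes that can carry triples of their own.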
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
The DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql
• hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
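As an illustration of one of these questions, "all films by Quentin Tarantino" might be phrased as the SPARQL query below, wrapped in a small Python helper that builds the request URL without sending it. Several things here are assumptions rather than part of the slides: the `dbo:`/`dbr:` vocabulary follows the current DBpedia ontology and may differ from the property names in use when the slides were written, the `format` parameter is a convention of the Virtuoso server, and `build_request_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:Film and dbo:director from the DBpedia ontology.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?film WHERE {
  ?film a dbo:Film ;
        dbo:director dbr:Quentin_Tarantino .
}
LIMIT 20
"""


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query as a GET request URL (the request is not sent)."""
    params = {
        "query": query,                                # standard SPARQL protocol parameter
        "format": "application/sparql-results+json",   # Virtuoso-specific result format
    }
    return endpoint + "?" + urlencode(params)
```

The triple pattern reads like the English question: find every `?film` that is a film and whose director is the resource for Quentin Tarantino.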
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  – Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  – Open Mind Commonsense (Singh) (collecting facts)
  – Semantic Wikis
WEB COLLABORATION PROJECTS
• Open Mind Common Sense – Singh
• Crater mapping (results) – Kanefsky
• Learner, Learner2, 1001 Paraphrases – Chklovski
• FACTory – CyCORP
• Hot or Not – 8 Days
• ESP, Phetch, Verbosity, Peekaboom – von Ahn
• Galaxy Zoo – Oxford University
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people's collaboration to collect commonsense
WHAT'S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
  – IsA: (IsA "apple" "fruit")
  – PartOf: (PartOf "CPU" "computer")
  – PropertyOf: (PropertyOf "coffee" "wet")
  – MadeOf: (MadeOf "bread" "flour")
  – DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  – PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  – SubeventOf: (SubeventOf "play sport" "score goal")
  – FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  – LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  – CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  – LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  – EffectOf: (EffectOf "view video" "entertainment")
  – DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  – DesireOf: (DesireOf "person" "not be depressed")
  – MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  – IsUsedFor: (UsedFor "fireplace" "burn wood")
  – CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
  – SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  – ThematicKLine: (ThematicKLine "wedding dress" "veil")
  – ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
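Assertions of this form are naturally stored as labeled edges between concepts. The following is a minimal sketch of such a semantic network using a handful of the examples from the table; it is not ConceptNet's actual data format or API, and the function names `build_network` and `related` are hypothetical.

```python
from collections import defaultdict

# A few (relation, concept, concept) assertions taken from the table above.
ASSERTIONS = [
    ("IsA", "apple", "fruit"),
    ("PartOf", "CPU", "computer"),
    ("MadeOf", "bread", "flour"),
    ("UsedFor", "fireplace", "burn wood"),
    ("CapableOf", "dentist", "pull tooth"),
    ("ConceptuallyRelatedTo", "bad breath", "mint"),
]


def build_network(assertions):
    """Index assertions by their first concept: concept -> [(relation, other concept)]."""
    graph = defaultdict(list)
    for relation, source, target in assertions:
        graph[source].append((relation, target))
    return graph


def related(graph, concept, relation=None):
    """Concepts linked from `concept`, optionally restricted to one relation type."""
    return [t for r, t in graph.get(concept, []) if relation is None or r == relation]
```

With this index, a question such as "what is a fireplace used for?" becomes a lookup: `related(build_network(ASSERTIONS), "fireplace", "UsedFor")`.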
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
  isa(lime, fruit)
  property_of(lime, very_sour)
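One way to picture a "simple heuristic" of this kind is a pattern that splits a sentence of the form "A X is a [modifiers] Y" into an isa fact and a property fact. The regular expression and the function `extract_facts` below are an illustrative assumption, not the rules actually used to build ConceptNet.

```python
import re

# Matches sentences like "A lime is a very sour fruit":
#   group 1 = concept, group 2 = optional modifiers, group 3 = head noun.
PATTERN = re.compile(
    r"^(?:a|an|the)\s+(\w+)\s+is\s+(?:a|an)\s+(?:(.+)\s+)?(\w+)$",
    re.IGNORECASE,
)


def extract_facts(sentence: str):
    """Return a list of (relation, concept, value) tuples, or [] if no match."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return []
    concept, modifiers, head = match.groups()
    facts = [("isa", concept.lower(), head.lower())]
    if modifiers:
        # Join multi-word modifiers with underscores, as in very_sour.
        facts.append(("property_of", concept.lower(), "_".join(modifiers.lower().split())))
    return facts
```

Applied to the example sentence, the pattern yields the two facts shown on the slide; sentences that do not fit the pattern produce nothing, which is the price of relying on shallow heuristics.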
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image and type that description
  – Hence the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
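The matching rule can be sketched as a function that replays what the two partners type and stops at the first string both have produced. This is an illustration under assumptions: the function name `first_agreement`, the player identifiers "A" and "B", and the case-insensitive comparison are not specified in the slides.

```python
def first_agreement(events):
    """Return the first label typed by both partners, or None if they never agree.

    `events` is an iterable of (player, typed_string) pairs in the order the
    strings were typed, with player being "A" or "B".
    """
    seen = {"A": set(), "B": set()}
    for player, text in events:
        label = text.strip().lower()
        partner = "B" if player == "A" else "A"
        # A match occurs as soon as a string equals anything the partner typed so far.
        if label in seen[partner]:
            return label
        seen[player].add(label)
    return None
```

Because neither player sees the other's guesses, the only way to score is to type what a stranger would also type, which is why agreed labels tend to be accurate descriptions of the image.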
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
• Partners try to agree on as many images as they can during 2½ minutes
The DBpediaorg project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web It uses the SPARQL query language to query this data At Developers Guide to Semantic Web Toolkits you find a development toolkit in your preferred programming language to process DBpedia data
REPRESENTING EXTRACTED INFORMATION
httpenwikipediaorgwikiCalgary
httpdbpediaorgresourceCalgary
dbpedianative_name Calgaryrdquo
dbpediaaltitude ldquo1048rdquo
dbpediapopulation_city ldquo988193rdquo
dbpediapopulation_metro ldquo1079310rdquo
mayor_name
dbpediaDave_Bronconnier
governing_body
dbpediaCalgary_City_Council
Extracting Infobox Data (RDF Representation)
SPARQL
bull SPARQL is a query language for RDF
bullRDF is a directed labeled graph data format for representing information in the Web bullThis specification defines the syntax and semantics of the SPARQL query language for RDF
bull SPARQL can be used to express queries across diverse data sources whether the data is stored natively as RDF or viewed as RDF via middleware
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
bull Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing effortsndash Other initiatives Citizen Science Cognition and
Language Laboratory hellipbull This has been taken advantage of in AI
ndash Open Mind Commonsense (Singh) (collecting facts)
ndash Semantic Wikis
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
wwwphrasedetectivescom
bull Open Mind Common Sense ndash Singh
bull Crater mapping (results) ndash Kanefsky
bull Learner Learner2 1001 Paraphrases ndash Chklovski
bull FACTory ndash CyCORP
bull Hot or Not ndash 8 Days
bull ESP Phetch Verbosity Peekaboom ndash von Ahn
bull Galaxy Zoo ndash Oxford University
WEB COLLABORATION PROJECTS
wwwphrasedetectivescom
OPEN MIND COMMONSENSE
bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense
WHATrsquoS IN OPEN MIND COMMONSENSE CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
  - IsA: (IsA "apple" "fruit")
  - PartOf: (PartOf "CPU" "computer")
  - PropertyOf: (PropertyOf "coffee" "wet")
  - MadeOf: (MadeOf "bread" "flour")
  - DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  - PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  - SubeventOf: (SubeventOf "play sport" "score goal")
  - FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  - LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  - CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  - LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  - EffectOf: (EffectOf "view video" "entertainment")
  - DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  - DesireOf: (DesireOf "person" "not be depressed")
  - MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  - UsedFor: (UsedFor "fireplace" "burn wood")
  - CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
  - SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  - ThematicKLine: (ThematicKLine "wedding dress" "veil")
  - ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
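Assertions of this kind can be held as (relation, concept, concept) triples and indexed by concept, which is what makes them behave like a semantic network. A minimal sketch, using a few of the examples above; the class and function names are made up for illustration and are not part of ConceptNet.

```python
from collections import defaultdict
from typing import NamedTuple

class Assertion(NamedTuple):
    relation: str
    left: str
    right: str

# A handful of the example assertions from the list above.
ASSERTIONS = [
    Assertion("IsA", "apple", "fruit"),
    Assertion("PartOf", "CPU", "computer"),
    Assertion("UsedFor", "fireplace", "burn wood"),
    Assertion("CapableOf", "dentist", "pull tooth"),
    Assertion("EffectOf", "view video", "entertainment"),
]

def build_index(assertions):
    """Index assertions by every concept they mention (the nodes of the network)."""
    index = defaultdict(list)
    for a in assertions:
        index[a.left].append(a)
        index[a.right].append(a)
    return index

index = build_index(ASSERTIONS)
# Outgoing edges of the node "apple".
neighbours = [(a.relation, a.right) for a in index["apple"] if a.left == "apple"]
```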
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
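To give a feel for what a "simple heuristic" might look like, the sketch below turns the lime sentence into the two facts shown. The regular expression is a hypothetical pattern written for this one sentence shape; the actual extraction rules used for ConceptNet are not given here.

```python
import re

# Hypothetical pattern for sentences of the form "A X is a [modifiers] Y".
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+an?\s+(?:(.+)\s+)?(\w+)$", re.IGNORECASE
)

def extract(sentence):
    """Return isa/property_of facts for a matching sentence, else an empty list."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return []
    subject, modifiers, head = match.groups()
    subject = subject.lower()
    facts = [f"isa({subject}, {head.lower()})"]
    if modifiers:
        # Join multi-word modifiers with underscores, e.g. "very sour" -> very_sour.
        facts.append(f"property_of({subject}, {'_'.join(modifiers.lower().split())})")
    return facts

facts = extract("A lime is a very sour fruit")
```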
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  - ESP
  - Verbosity
  - TagATune
• Other games
  - Peekaboom
  - Phetch
ESP
• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  - To train image search engines
  - To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP: the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is, and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  - Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
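The matching rule can be sketched as follows: each player's guesses are checked against everything the partner has typed so far, and the round ends at the first string in common. Trimming and lower-casing the guesses is an assumption of this sketch, as is the simple turn-by-turn interleaving.

```python
def first_agreement(guesses_a, guesses_b):
    """Return the first label typed by both players, or None if they never agree."""
    seen_a, seen_b = set(), set()
    # Interleave the two streams to mimic both players typing at the same time.
    for i in range(max(len(guesses_a), len(guesses_b))):
        for guesses, mine, theirs in (
            (guesses_a, seen_a, seen_b),
            (guesses_b, seen_b, seen_a),
        ):
            if i < len(guesses):
                label = guesses[i].strip().lower()
                if label in theirs:
                    return label
                mine.add(label)
    return None

agreed = first_agreement(["dog", "grass", "puppy"], ["animal", "Puppy", "ball"])
```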
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
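The bookkeeping behind taboo words, good labels and finished images might look like the sketch below. It is not the deployed system: the class name is invented, each submitted guess is treated as coming from a different player, and "most players pass" is read as "more than half".

```python
from collections import Counter

class ImageRecord:
    """Per-image state: who typed what, which words are taboo, how often it is passed."""

    def __init__(self, n_good=2):
        self.n_good = n_good       # the parameter N from the slide
        self.produced = Counter()  # label -> number of players who typed it
        self.taboo = set()         # labels agreed on in earlier games
        self.shown = 0
        self.passed = 0

    def submit(self, guess):
        """Register a typed guess; return False if it is taboo and so rejected."""
        label = guess.strip().lower()
        if label in self.taboo:
            return False
        self.produced[label] += 1
        return True

    def record_agreement(self, label):
        # An agreed label becomes taboo for later pairs shown this image.
        self.taboo.add(label.strip().lower())

    def record_outcome(self, passed):
        self.shown += 1
        self.passed += int(passed)

    def good_labels(self):
        return {label for label, n in self.produced.items() if n > self.n_good}

    def is_done(self):
        return self.shown > 0 and self.passed > self.shown / 2

image = ImageRecord(n_good=2)
for guess in ["dog", "Dog", "dog ", "grass"]:
    image.submit(guess)
image.record_agreement("dog")
```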
IMPLEMENTATION
• Pre-recorded game play
  - Especially at the beginning, and at quiet times, there won’t always be players to pair with
  - In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  - Players could cheat in a number of ways, including agreeing on labels, playing against themselves
  - A number of mechanisms are in place against those cases
• Selecting images
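Pre-recorded play can be pictured as replaying an earlier player's guesses on the same clock as the live player. In this sketch, storing each guess with the number of seconds since the image appeared is an assumption, and `RecordedPartner` is a made-up name.

```python
import bisect

class RecordedPartner:
    """Replays the guesses of an earlier game for the same picture."""

    def __init__(self, timed_guesses):
        # timed_guesses: (seconds_since_image_shown, label) pairs from the earlier game.
        self.timed_guesses = sorted(timed_guesses)
        self.times = [t for t, _ in self.timed_guesses]

    def typed_by(self, elapsed):
        """Labels the recorded player had typed `elapsed` seconds into the round."""
        cutoff = bisect.bisect_right(self.times, elapsed)
        return {label for _, label in self.timed_guesses[:cutoff]}

    def matches(self, live_guess, elapsed):
        return live_guess.strip().lower() in self.typed_by(elapsed)

partner = RecordedPartner([(2.0, "dog"), (5.5, "grass"), (9.0, "puppy")])
```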
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  - 13,630 players
  - 1.2 million labels for 293,760 images
  - 80% of players played more than once
• By 2008
  - 200,000 players
  - 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  - Playing with a partner
  - Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  - choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  - 15 participants, 20 images among the 1000 with more than 5 labels
  - 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  - 15 participants, 20 images
  - 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  - Players have to guess a word
  - One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
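A template-based hint can be sketched like this: the Describer fills in the second slot, the Guesser sees the secret word as a blank, and a fact is stored once the word has been guessed. The relation keys, and the assumption that the secret word always fills the first slot, are illustrative choices.

```python
# Hypothetical relation keys mapped to some of the templates listed above.
TEMPLATES = {
    "kind_of": "{} is a kind of {}",
    "used_for": "{} is used for {}",
    "opposite_of": "{} is the opposite of {}",
    "related_to": "{} is related to {}",
}

def make_hint(template, filler):
    """Hint shown to the Guesser: the secret word stays hidden behind a blank."""
    return TEMPLATES[template].format("_", filler)

def make_fact(template, secret_word, filler):
    """Fact kept once the Guesser has named the secret word."""
    return (template, secret_word, filler)

hint = make_hint("used_for", "burning wood")
fact = make_fact("used_for", "fireplace", "burning wood")
```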
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  - Describer: can just repeat the behavior of a previous describer
  - Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  - Ask six raters whether 200 facts collected using Verbosity are ‘true’
  - Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  - Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  - Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
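One way to combine the two tasks is to let annotations and validations vote on the antecedent of each markable. The scoring rule below (+1 per annotation or agreement, -1 per disagreement) is an assumption made for this sketch, not a description of how Phrase Detectives actually aggregates judgements.

```python
from collections import Counter

def best_antecedent(annotations, validations):
    """Pick the antecedent with the highest net support for one markable.

    annotations: antecedents chosen by players in annotation mode.
    validations: (antecedent, agrees) pairs from validation mode.
    """
    score = Counter()
    for antecedent in annotations:
        score[antecedent] += 1
    for antecedent, agrees in validations:
        score[antecedent] += 1 if agrees else -1
    return max(score, key=score.get) if score else None

# Markable: "The agency" in the coreference example from earlier in the lecture.
choice = best_antecedent(
    annotations=["The FCC", "AT&T", "The FCC"],
    validations=[("AT&T", False), ("The FCC", True)],
)
```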
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish. Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67, 2008
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems, 2013
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
dbpedia:native_name “Calgary”
dbpedia:altitude “1048”
dbpedia:population_city “988193”
dbpedia:population_metro “1079310”
mayor_name dbpedia:Dave_Bronconnier
governing_body dbpedia:Calgary_City_Council
Extracting Infobox Data (RDF Representation)
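The infobox values above amount to a set of (subject, property, value) triples. A minimal sketch of that representation, with a tiny pattern matcher standing in for a real RDF store; the `match` helper is invented for illustration.

```python
SUBJECT = "http://dbpedia.org/resource/Calgary"

# (subject, property, value) triples copied from the Calgary listing above.
TRIPLES = [
    (SUBJECT, "dbpedia:native_name", "Calgary"),
    (SUBJECT, "dbpedia:altitude", "1048"),
    (SUBJECT, "dbpedia:population_city", "988193"),
    (SUBJECT, "dbpedia:population_metro", "1079310"),
    (SUBJECT, "mayor_name", "dbpedia:Dave_Bronconnier"),
    (SUBJECT, "governing_body", "dbpedia:Calgary_City_Council"),
]

def match(triples, subject=None, prop=None, value=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    pattern = (subject, prop, value)
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]

population = match(TRIPLES, prop="dbpedia:population_city")[0][2]
```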
SPARQL
• SPARQL is a query language for RDF
• RDF is a directed, labeled graph data format for representing information in the Web
• This specification defines the syntax and semantics of the SPARQL query language for RDF
• SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
• http://dbpedia.org/sparql
1048607httpdbpediaorgsparql
1048607hosted on a OpenLink Virtuoso server
1048607can answer SPARQL queries like
1048698 Give me all Sitcoms that are set in NYC
1048698 All tennis players from Moscow
1048698 All films by Quentin Tarentino
1048698 All German musicians that were born in Berlin in the 19th century
The DBpedia SPARQL Endpoint
WEB COLLABORATION FOR KNOWLEDGE ACQUISITION
• Efforts such as Wikipedia indicate that many Web surfers may be willing to participate in collective resource-producing efforts
  - Other initiatives: Citizen Science, Cognition and Language Laboratory, …
• This has been taken advantage of in AI
  - Open Mind Commonsense (Singh) (collecting facts)
  - Semantic Wikis
www.phrasedetectives.com
WEB COLLABORATION PROJECTS
• Open Mind Common Sense - Singh
• Crater mapping (results) - Kanefsky
• Learner, Learner2, 1001 Paraphrases - Chklovski
• FACTory - Cycorp
• Hot or Not - 8 Days
• ESP, Phetch, Verbosity, Peekaboom - von Ahn
• Galaxy Zoo - Oxford University
OPEN MIND COMMONSENSE
• A project started in 2000 by Push Singh to take advantage of people’s collaboration to collect commonsense knowledge
WHAT’S IN OPEN MIND COMMONSENSE: CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
  - IsA: (IsA "apple" "fruit")
  - PartOf: (PartOf "CPU" "computer")
  - PropertyOf: (PropertyOf "coffee" "wet")
  - MadeOf: (MadeOf "bread" "flour")
  - DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
  - PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
  - SubeventOf: (SubeventOf "play sport" "score goal")
  - FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
  - LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
  - CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
  - LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
  - EffectOf: (EffectOf "view video" "entertainment")
  - DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
  - DesireOf: (DesireOf "person" "not be depressed")
  - MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
  - UsedFor: (UsedFor "fireplace" "burn wood")
  - CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION K-LINES (1.25 million assertions)
  - SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
  - ThematicKLine: (ThematicKLine "wedding dress" "veil")
  - ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
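One way to picture these assertions is as labelled edges between concepts. The sketch below is only an illustration under that assumption: the `Assertion` tuple, `index_by_concept` and `related` are hypothetical names, not part of ConceptNet's own software, and the data is a handful of the examples listed above.

```python
from collections import defaultdict, namedtuple

# Hypothetical representation: one labelled edge per assertion.
Assertion = namedtuple("Assertion", ["relation", "left", "right"])

ASSERTIONS = [
    Assertion("IsA", "apple", "fruit"),
    Assertion("PartOf", "CPU", "computer"),
    Assertion("UsedFor", "fireplace", "burn wood"),
    Assertion("CapableOf", "dentist", "pull tooth"),
    Assertion("EffectOf", "view video", "entertainment"),
]


def index_by_concept(assertions):
    """Map every concept to the assertions that mention it on either side."""
    index = defaultdict(list)
    for assertion in assertions:
        index[assertion.left].append(assertion)
        index[assertion.right].append(assertion)
    return index


INDEX = index_by_concept(ASSERTIONS)


def related(concept):
    """Return (relation, other_concept) pairs for edges touching `concept`."""
    return [
        (a.relation, a.right if a.left == concept else a.left)
        for a in INDEX.get(concept, [])
    ]
```

Calling `related("apple")` walks one step out from a concept, which is the basic operation a semantic network supports.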
OPEN MIND COMMONSENSE ADDING KNOWLEDGE
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
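The slides do not spell out the extraction heuristics, so the following is only a toy sketch of how a pattern-based rule could turn the sentence above into the two assertions. The single regular expression and the function `extract_assertions` are assumptions made for illustration; the real system uses its own (richer) set of rules.

```python
import re

# Toy pattern (assumed): "<article> <noun> is a/an <modifiers...> <noun>"
PATTERN = re.compile(
    r"^(?:a|an|the)\s+(\w+)\s+is\s+(?:a|an)\s+(.+)$", re.IGNORECASE
)


def extract_assertions(sentence):
    """Return a list of (relation, concept, value) triples, or [] if no match."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return []
    subject = match.group(1).lower()
    words = match.group(2).lower().split()
    head, modifiers = words[-1], words[:-1]
    # The last word is taken as the category, anything before it as a property.
    assertions = [("isa", subject, head)]
    if modifiers:
        assertions.append(("property_of", subject, "_".join(modifiers)))
    return assertions
```

On "A lime is a very sour fruit" this yields `isa(lime, fruit)` and `property_of(lime, very_sour)`; sentences that do not fit the pattern simply produce nothing.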
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  - ESP
  - Verbosity
  - TagATune
• Other games
  - Peekaboom
  - Phetch
ESP
• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  - To train image search engines
  - To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  - Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
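The matching rule can be sketched in a few lines. This is a minimal illustration, not the game's actual implementation: the player names "A" and "B", the function `play_round`, and the lower-casing of guesses before comparison are all assumptions.

```python
def play_round(events):
    """Return the first label both players typed, or None if they never agree.

    `events` is an iterable of (player, typed_string) pairs, with player being
    "A" or "B", in the order the strings were typed.
    """
    seen = {"A": set(), "B": set()}
    for player, text in events:
        label = text.strip().lower()  # normalisation is an assumption
        other = "B" if player == "A" else "A"
        if label in seen[other]:
            return label  # agreement: both players produced this string
        seen[player].add(label)
    return None
```

Note that the two guesses do not have to be typed at the same time: a string matches as soon as it appears among anything the partner has typed so far for that image.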
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
• Partners try to agree on as many images as they can in 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
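The taboo mechanism amounts to a per-image set that grows with every agreement. The sketch below assumes exactly the rule stated above (every agreed word becomes taboo for that image); the dictionary `taboo_words` and the two function names are hypothetical.

```python
# image id -> set of labels agreed on in earlier games (hypothetical store)
taboo_words = {}


def is_allowed(image_id, guess):
    """A guess is rejected if it is already taboo for this image."""
    return guess.strip().lower() not in taboo_words.get(image_id, set())


def record_agreement(image_id, label):
    """After two partners agree on a label, make it taboo for later pairs."""
    taboo_words.setdefault(image_id, set()).add(label.strip().lower())
```

Because the set is keyed by image, a word such as "dog" can be taboo for one picture and still perfectly acceptable for another.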
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
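Both criteria are simple threshold tests. In the sketch below, `good_labels` follows the "more than N players" rule directly, while `is_done` has to pick an interpretation of "most players pass": here it is assumed to mean a pass fraction above one half, which is a guess rather than a documented setting. Both function names are hypothetical.

```python
from collections import Counter


def good_labels(produced, n):
    """Return labels produced by more than `n` players.

    `produced` holds one entry per player who produced a given label.
    """
    counts = Counter(produced)
    return {label for label, count in counts.items() if count > n}


def is_done(passes, plays, pass_ratio=0.5):
    """Treat an image as done when the share of passes exceeds `pass_ratio`.

    The 0.5 default is an assumed reading of "most players pass".
    """
    return plays > 0 and passes / plays > pass_ratio
```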
IMPLEMENTATION
• Pre-recorded game play
  - Especially at the beginning and at quiet times, there won’t always be players to pair with
  - In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  - Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  - A number of mechanisms are in place against those cases
• Selecting images
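Pre-recorded play can be pictured as replaying a stored list of guesses against the live player. The sketch assumes that each recorded guess is stored with the number of seconds after which it was originally typed; that storage format, and the names `replay` and `matches_recorded`, are assumptions for illustration only.

```python
def replay(recorded_hand, elapsed_seconds):
    """Return the recorded partner's guesses that would have been typed so far.

    `recorded_hand` is a list of (seconds_offset, label) pairs from an earlier
    game on the same picture (assumed format).
    """
    return [label for offset, label in recorded_hand if offset <= elapsed_seconds]


def matches_recorded(live_guess, recorded_hand, elapsed_seconds):
    """Check a live guess against what the recorded partner has 'typed' so far."""
    return live_guess.strip().lower() in replay(recorded_hand, elapsed_seconds)
```

From the live player's point of view the recorded partner behaves like a real one, since its guesses only become available at the moments they were originally made.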
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  - 13,630 players
  - 1.2 million labels for 293,760 images
  - 80% of players played more than once
• By 2008
  - 200,000 players
  - 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  - Playing with a partner
  - Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  - choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  - 15 participants, 20 images among the 1000 with more than 5 labels
  - 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  - 15 participants, 20 images
  - 85% of words rated useful
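The second evaluation boils down to an overlap rate: the share of game labels for an image that experiment participants also produced. The sketch below shows that calculation on made-up labels; `overlap_rate` is a hypothetical helper and the example data is invented purely to exercise it.

```python
def overlap_rate(game_labels, participant_labels):
    """Fraction of distinct game labels that participants also produced."""
    game = {label.lower() for label in game_labels}
    participants = {label.lower() for label in participant_labels}
    if not game:
        return 0.0
    return len(game & participants) / len(game)


# Invented toy data, not taken from the study.
rate = overlap_rate(["dog", "grass", "ball", "sky"], ["Dog", "ball", "sky", "tree"])
```

Averaging this rate over the sampled images gives a figure comparable to the percentage reported above.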
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  - Players have to guess a word
  - One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
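Because every hint comes from a fixed template, each completed hint can be read back as a fact about the secret word. The sketch below illustrates that idea; the mapping from templates to relation names (including the made-up `OppositeOf`) and the functions `render_hint` and `hint_to_fact` are assumptions, not Verbosity's actual data model.

```python
# Assumed mapping from hint templates to relation names.
TEMPLATES = {
    "{} is a kind of {}": "IsA",
    "{} is used for {}": "UsedFor",
    "{} is typically near/in/on {}": "LocationOf",
    "{} is the opposite of {}": "OppositeOf",  # hypothetical relation name
    "{} is related to {}": "ConceptuallyRelatedTo",
}


def render_hint(template, filler):
    """Text shown to the Guesser: the secret word stays hidden behind a blank."""
    return template.format("_", filler)


def hint_to_fact(template, secret_word, filler):
    """Fact recorded by the system once the secret word is known."""
    return (TEMPLATES[template], secret_word, filler)
```

For the secret word "fireplace", the hint "_ is used for burn wood" would be stored as `("UsedFor", "fireplace", "burn wood")`.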
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  - Describer: can just repeat the behavior of a previous describer
  - Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  - Ask six raters whether 200 facts collected using Verbosity are ‘true’
  - Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  - Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  - Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
OPEN MIND COMMONSENSE
bull A project started in 2000 by Push Singh to take advantage of peoplersquos collaboration to collect commonsense
WHATrsquoS IN OPEN MIND COMMONSENSE CAR
Twenty Semantic Relation Types in ConceptNet (Liu and Singh 2004)
THINGS (52,000 assertions)
– IsA: (IsA "apple" "fruit")
– PartOf: (PartOf "CPU" "computer")
– PropertyOf: (PropertyOf "coffee" "wet")
– MadeOf: (MadeOf "bread" "flour")
– DefinedAs: (DefinedAs "meat" "flesh of animal")
EVENTS (38,000 assertions)
– PrerequisiteEventOf: (PrerequisiteEventOf "read letter" "open envelope")
– SubeventOf: (SubeventOf "play sport" "score goal")
– FirstSubeventOf: (FirstSubeventOf "start fire" "light match")
– LastSubeventOf: (LastSubeventOf "attend classical concert" "applaud")
AGENTS (104,000 assertions)
– CapableOf: (CapableOf "dentist" "pull tooth")
SPATIAL (36,000 assertions)
– LocationOf: (LocationOf "army" "in war")
TEMPORAL (time & sequence)
CAUSAL (17,000 assertions)
– EffectOf: (EffectOf "view video" "entertainment")
– DesirousEffectOf: (DesirousEffectOf "sweat" "take shower")
AFFECTIONAL (mood, feeling, emotions) (34,000 assertions)
– DesireOf: (DesireOf "person" "not be depressed")
– MotivationOf: (MotivationOf "play game" "compete")
FUNCTIONAL (115,000 assertions)
– UsedFor: (UsedFor "fireplace" "burn wood")
– CapableOfReceivingAction: (CapableOfReceivingAction "drink" "serve")
ASSOCIATION: K-LINES (1.25 million assertions)
– SuperThematicKLine: (SuperThematicKLine "western civilization" "civilization")
– ThematicKLine: (ThematicKLine "wedding dress" "veil")
– ConceptuallyRelatedTo: (ConceptuallyRelatedTo "bad breath" "mint")
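One way to picture the assertions listed above is as (relation, concept, concept) triples. The following is only a minimal sketch in Python: the `Assertion` tuple and the `index_by_concept` helper are hypothetical names chosen for illustration, not part of ConceptNet's own API, and the sample triples are copied from the relation list.

```python
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    """One ConceptNet-style assertion: (relation, first concept, second concept)."""
    relation: str
    left: str
    right: str


# A few of the example assertions from the relation list above.
ASSERTIONS = [
    Assertion("IsA", "apple", "fruit"),
    Assertion("PartOf", "CPU", "computer"),
    Assertion("CapableOf", "dentist", "pull tooth"),
    Assertion("UsedFor", "fireplace", "burn wood"),
    Assertion("ConceptuallyRelatedTo", "bad breath", "mint"),
]


def index_by_concept(assertions):
    """Map each concept to the assertions it takes part in (hypothetical helper)."""
    index = defaultdict(list)
    for assertion in assertions:
        index[assertion.left].append(assertion)
        index[assertion.right].append(assertion)
    return dict(index)


if __name__ == "__main__":
    index = index_by_concept(ASSERTIONS)
    for assertion in index["fruit"]:
        print(assertion.relation, assertion.left, assertion.right)
```

Indexing by concept is what makes the collection behave like a semantic network: every concept becomes a node, and each assertion is a labelled edge between two nodes.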
OPEN MIND COMMONSENSE: ADDING KNOWLEDGE
OMCS: ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE: CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
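The lime example can be reproduced with a single pattern-matching rule. The sketch below is an assumption about what a "simple heuristic" might look like, not the extraction rules actually used to build ConceptNet: the regular expression and the `extract_assertions` function are hypothetical, and they handle only sentences shaped like "A X is a (modifiers) Y".

```python
import re

# Hypothetical pattern for sentences of the form "A <noun> is a <modifiers> <noun>".
# Real OMCS sentences are far more varied; this covers only this one shape.
PATTERN = re.compile(
    r"^(?:an?|the)\s+(\w+)\s+is\s+(?:an?|the)\s+(.+)$", re.IGNORECASE
)


def extract_assertions(sentence):
    """Return predicate strings for one simple 'X is a ... Y' sentence."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return []
    subject = match.group(1).lower()
    words = match.group(2).lower().split()
    # Treat the last word as the category and everything before it as a property.
    head, modifiers = words[-1], words[:-1]
    assertions = [f"isa({subject}, {head})"]
    if modifiers:
        assertions.append(f"property_of({subject}, {'_'.join(modifiers)})")
    return assertions


if __name__ == "__main__":
    for line in extract_assertions("A lime is a very sour fruit"):
        print(line)
```

Sentences that do not fit the pattern simply yield no assertions, which is the usual trade-off with heuristics of this kind: precision on the shapes they know, silence on everything else.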
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks 'computers are unable to perform' (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
– ESP
– Verbosity
– TagATune
• Other games
– Peekaboom
– Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
– To train image search engines
– To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP: THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can't communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
– Hence the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
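The matching rule can be sketched as a small function. This is a minimal illustration, not the game's actual code: the `first_agreement` name is hypothetical, the two lists are interleaved to imitate players typing in turn (the real game would work from timestamps), and lowercasing before comparison is an assumption.

```python
def first_agreement(guesses_a, guesses_b):
    """Return the first label that both players have typed, or None.

    Each argument is the list of strings one player typed, in order.
    The lists are interleaved to imitate the two players typing in turn.
    """
    seen_a, seen_b = set(), set()
    for i in range(max(len(guesses_a), len(guesses_b))):
        if i < len(guesses_a):
            word = guesses_a[i].strip().lower()
            seen_a.add(word)
            if word in seen_b:
                return word
        if i < len(guesses_b):
            word = guesses_b[i].strip().lower()
            seen_b.add(word)
            if word in seen_a:
                return word
    return None


if __name__ == "__main__":
    print(first_agreement(["dog", "grass", "ball"], ["puppy", "Ball", "dog"]))
```

Note that a match does not require both players to type the word at the same moment: it is enough that one player types something the other has already entered.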
THE TASK
SCORING BY MATCHING
THE CHALLENGE: SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE: TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS, COMPLETING AN IMAGE
• A label is considered "good" when more than N players produce it (with N a parameter of the game)
• An image is "done" when its list of taboo words is so extensive that most players pass on it
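The taboo-word, good-label and done rules fit together as per-image bookkeeping. The sketch below is one possible reading, not the game's implementation: the `ImageLabels` class is hypothetical, an agreement is counted as two players producing the label, and "most players pass" is interpreted as more than half of the rounds ending in a pass.

```python
from collections import Counter


class ImageLabels:
    """Bookkeeping for one image; a hypothetical structure, not the game's code."""

    def __init__(self, n):
        self.n = n                 # the game parameter N from the slide
        self.produced = Counter()  # label -> number of players who typed it
        self.taboo = set()         # labels already agreed on for this image
        self.rounds = 0            # times the image has been shown to a pair
        self.passes = 0            # rounds in which the pair passed

    def is_allowed(self, guess):
        """A guess is rejected if it is taboo for this image."""
        return guess not in self.taboo

    def record_agreement(self, label):
        """Both partners typed `label`: count two players and make it taboo."""
        self.rounds += 1
        self.produced[label] += 2
        self.taboo.add(label)

    def record_pass(self):
        """The pair gave up on this image."""
        self.rounds += 1
        self.passes += 1

    def good_labels(self):
        """Labels produced by more than N players."""
        return {label for label, count in self.produced.items() if count > self.n}

    def is_done(self):
        """'Most players pass' is read here as more than half of all rounds."""
        return self.rounds > 0 and self.passes / self.rounds > 0.5


if __name__ == "__main__":
    image = ImageLabels(n=1)
    image.record_agreement("dog")
    print(image.is_allowed("dog"), image.good_labels(), image.is_done())
```

Because every agreed label becomes taboo, later pairs are pushed towards more specific descriptions, and the growing taboo list is also what eventually makes an image too hard to be worth showing.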
IMPLEMENTATION
• Pre-recorded game play
– Especially at the beginning and at quiet times, there won't always be players to pair with
– In these cases a player is paired against a recorded 'hand' of a previous game with the same picture
• Cheating
– Players could cheat in a number of ways, including agreeing on labels and playing against themselves
– A number of mechanisms are in place against those cases
• Selecting images
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
– 13,630 players
– 1.2 million labels for 293,760 images
– 80% of players played more than once
• By 2008
– 200,000 players
– 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
– Playing with a partner
– Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
– Choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
– 15 participants, 20 images among the 1000 with more than 5 labels
– 83% of game labels also produced by participants
• Manual assessment of labels ('would you use these labels to describe this image?')
– 15 participants, 20 images
– 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations/properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
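Filling a template gives both the hint the Guesser sees and a fact that can be stored once the secret word is put back in. The sketch below assumes a hypothetical mapping from template text to relation names (the names on the right are illustrative labels, not taken from Verbosity), and covers only the "near" variant of the near/in/on template.

```python
# Templates from the slide; the relation names on the right are hypothetical labels.
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near": "LocationOf",
    "is the opposite of": "OppositeOf",
    "is related to": "ConceptuallyRelatedTo",
}


def make_hint(template, filler):
    """Text shown to the Guesser: the secret word stays blank."""
    if template not in TEMPLATES:
        raise KeyError(f"unknown template: {template!r}")
    return f"_ {template} {filler}"


def make_fact(secret_word, template, filler):
    """Fact recorded once the hint is tied back to the secret word."""
    return (TEMPLATES[template], secret_word, filler)


if __name__ == "__main__":
    print(make_hint("is a kind of", "fruit"))
    print(make_fact("lime", "is a kind of", "fruit"))
```

Restricting the Describer to fixed templates is what lets each hint be read off directly as a relation, rather than having to be parsed from free text.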
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
– Describer: can just repeat the behavior of a previous describer
– Guesser: not so easy
RESULTS
• The only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
– Ask six raters whether 200 facts collected using Verbosity are 'true'
– Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
– Find The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
– Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
OMCS ADDING KNOWLEDGE 2
OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
FROM OPENMIND COMMONSENSE TO CONCEPT NET
• ConceptNet (Havasi et al. 2009) is a semantic network extracted from OpenMind Commonsense assertions using simple heuristics
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
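The step from the sentence to the two assertions above can be sketched with a single pattern rule. This is only an illustration: the regular expression, the function name `extract_relations`, and the choice to treat the last word as the head noun are assumptions made for this example, not ConceptNet's actual heuristics.

```python
import re

# Hypothetical pattern for sentences of the form "A X is a [modifiers] Y".
# Group 1 is the subject, group 2 the optional modifiers, group 3 the head noun.
PATTERN = re.compile(r"^an? (\w+) is an? ((?:\w+ )*)(\w+)$", re.IGNORECASE)

def extract_relations(sentence):
    """Return (relation, arg1, arg2) triples for one simple sentence."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return []
    subject, modifiers, head = match.groups()
    subject, head = subject.lower(), head.lower()
    triples = [("isa", subject, head)]
    if modifiers.strip():
        # Join multi-word modifiers with underscores: "very sour" becomes "very_sour".
        prop = "_".join(modifiers.lower().split())
        triples.append(("property_of", subject, prop))
    return triples

print(extract_relations("A lime is a very sour fruit"))
```

A sentence that does not fit the pattern simply yields no triples, which is the usual trade-off of simple heuristics: high precision on the covered forms, no coverage elsewhere.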
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
  – ESP
  – Verbosity
  – TagATune
• Other games
  – Peekaboom
  – Phetch
ESP
• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
  – To train image search engines
  – To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
  – Hence, the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
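The matching rule in the last bullet can be sketched as follows. This is a minimal illustration, not the game's actual code: the function names, the timestamped input format, and the case-insensitive comparison are all assumptions made for the example.

```python
def normalize(label):
    """Lower-case and trim a typed string so trivial differences do not block a match."""
    return label.strip().lower()

def first_agreement(guesses_a, guesses_b):
    """Return the first label typed by both players, or None if they never agree.

    Each argument is a list of (seconds_since_start, typed_string) pairs.
    The entries of both players are replayed together in time order.
    """
    events = [(t, "a", normalize(s)) for t, s in guesses_a]
    events += [(t, "b", normalize(s)) for t, s in guesses_b]
    seen = {"a": set(), "b": set()}
    for _, player, label in sorted(events):
        other = "b" if player == "a" else "a"
        if label in seen[other]:
            # The partner typed this string earlier: the pair agrees on it.
            return label
        seen[player].add(label)
    return None

player_a = [(1.0, "dog"), (3.0, "grass")]
player_b = [(2.0, "puppy"), (4.0, "Grass ")]
print(first_agreement(player_a, player_b))
```

Note that the two strings do not have to be typed at the same moment: a guess matches as soon as it equals anything the partner has typed so far for that image.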
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images, they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
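The bookkeeping behind taboo words, “good” labels and “done” images can be sketched per image. The class below is a simplified illustration under stated assumptions: the name `ImageLabels`, the default N, and reading “most players pass” as a pass rate above one half over all rounds are choices made for this example, not details given on the slides.

```python
from collections import Counter

class ImageLabels:
    """Per-image record of produced labels, taboo words and passes (illustrative only)."""

    def __init__(self, n=1, done_pass_rate=0.5):
        self.n = n                            # a label is good when more than n players produce it
        self.done_pass_rate = done_pass_rate  # assumed reading of "most players pass"
        self.produced = Counter()             # label -> number of players who typed it
        self.taboo = set()                    # labels agreed on by earlier pairs
        self.rounds = 0
        self.passes = 0

    def record_agreement(self, label):
        """Both partners typed `label`: two more players produced it, and it becomes taboo."""
        self.rounds += 1
        self.produced[label] += 2
        self.taboo.add(label)

    def record_pass(self):
        """The pair gave up on this image."""
        self.rounds += 1
        self.passes += 1

    def good_labels(self):
        return {label for label, count in self.produced.items() if count > self.n}

    def is_done(self):
        return self.rounds > 0 and self.passes / self.rounds > self.done_pass_rate

image = ImageLabels(n=1)
image.record_agreement("dog")
print(image.taboo, image.good_labels(), image.is_done())
```

A production system would more plausibly look at recent rounds only when deciding that an image is done; the all-time pass rate is used here to keep the sketch short.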
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times, there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
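Pre-recorded game play can be pictured as replaying a stored list of timestamped guesses in place of a live partner. The sketch below assumes a recorded ‘hand’ is a list of (seconds, label) pairs with lower-case labels; the function names and this storage format are hypothetical.

```python
def replay_partner(recorded_hand, elapsed_seconds):
    """Return the labels the recorded partner has 'typed' by elapsed_seconds.

    recorded_hand is a list of (seconds_since_start, label) pairs saved
    from an earlier game on the same picture.
    """
    return [label for t, label in recorded_hand if t <= elapsed_seconds]

def check_match(live_guess, recorded_hand, elapsed_seconds):
    """True if the live player's guess equals a label already replayed."""
    replayed = replay_partner(recorded_hand, elapsed_seconds)
    return live_guess.strip().lower() in replayed

hand = [(2.0, "car"), (5.0, "road")]
print(check_match("Road", hand, 3.0), check_match("Road", hand, 6.0))
```

Releasing the recorded guesses at their original times means the live player sees the same pacing as in a real game, so the replay is hard to tell apart from a human partner.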
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
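Because every hint is produced by filling in one of these templates, each hint maps directly to a typed fact about the secret word. The sketch below illustrates the idea; the relation names in `TEMPLATES` and both function names are assumptions made for this example, not the identifiers used by Verbosity.

```python
# Hypothetical relation name for each template wording.
TEMPLATES = {
    "is a kind of": "kind_of",
    "is used for": "used_for",
    "is typically near": "near",
    "is typically in": "in",
    "is typically on": "on",
    "is the opposite of": "opposite_of",
    "is related to": "related_to",
}

def hint_to_fact(secret_word, template, filler):
    """Turn a Describer hint into a (relation, secret_word, filler) triple."""
    relation = TEMPLATES[template]
    return (relation, secret_word.lower(), filler.strip().lower())

def render_hint(template, filler):
    """Text shown to the Guesser, with the secret word left blank."""
    return f"_ {template} {filler}"

print(render_hint("is a kind of", "liquid"))
print(hint_to_fact("Milk", "is a kind of", "liquid"))
```

The fixed templates are what make the collected data usable without parsing free text: the relation is known from the template the Describer picked, and only the filler is typed by hand.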
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
- INTRODUCTION TO ARTIFICIAL INTELLIGENCE
- `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
- THE SOCIAL WEB
- SOCIAL CREATION OF KNOWLEDGE
- WIKIPEDIA
- Slide 7
- Slide 8
- Encyclopedic knowledge in coreference resolution
- Why Wikipedia may help addressing the encyclopedic knowledge problem
- Another interesting scenario
- Slide 13
- Wikipedia as Ontology
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
- Slide 21
- The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
- SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
- Wikipedia category network
- Deriving a taxonomy from Wikipedia (AAAI 2007)
- Slide 26
- INFOBOXES
- Slide 29
- Slide 30
- Slide 31
- SPARQL
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- OPEN MIND COMMONSENSE
- WHATrsquoS IN OPEN MIND COMMONSENSE CAR
- Slide 39
- OPEN MIND COMMONSENSE ADDING KNOWLEDGE
- OMCS ADDING KNOWLEDGE 2
- OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
- Slide 43
- FROM OPENMIND COMMONSENSE TO CONCEPT NET
- Slide 45
- CONCEPT NET
- FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
- GAMES WITH A PURPOSE
- GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
- EXAMPLES OF GWAP
- ESP
- ESP the game
- ESP THE GAME
- THE TASK
- SCORING BY MATCHING
- THE CHALLENGE SCORES
- SCORES
- THE CHALLENGE TIMING
- TABOO WORDS
- Slide 61
- PASSING
- GOOD LABELS COMPLETING AN IMAGE
- IMPLEMENTATION
- SOME STATISTICS
- ANALYSIS
- QUALITY OF THE LABELS
- GOOGLE IMAGE LABELLER
- Slide 69
- RESULTS
- VERBOSITY
- THE GAME
- Slide 73
- TEMPLATES IN VERBOSITY
- GUESSING ATTRIBUTES
- PRODUCING A DESCRIPTION
- TEMPLATES
- EMULATION
- Slide 79
- PHRASE DETECTIVES
- Slide 81
- NAME THE CULPRIT
- READINGS
-
CONCEPT NET
FROM OPENMIND COMMONSENSE FACTS TO
CONCEPTNETA lime is a very sour fruit
isa(limefruit)
property_of(limevery_sour)
GAMES WITH A PURPOSE
bull Luis von Ahn pioneered a new approach to resource creation on the Web GAMES WITH A PURPOSE or GWAP in which people as a side effect of playing perform tasks lsquocomputers are unable to performrsquo (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
bull GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
bull The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
– Players have to guess a word
– One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
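Because every hint comes from a fixed template, each hint can be stored directly as a structured fact. A minimal sketch of that idea, assuming hypothetical relation keys and assuming the secret word is masked in the hint shown to the Guesser (the slides do not specify either):

```python
# Template strings follow the list above; the dictionary keys are assumed names.
TEMPLATES = {
    "kind_of": "{word} is a kind of {filler}",
    "used_for": "{word} is used for {filler}",
    "typically_near": "{word} is typically near {filler}",
    "typically_in": "{word} is typically in {filler}",
    "typically_on": "{word} is typically on {filler}",
    "opposite_of": "{word} is the opposite of {filler}",
    "related_to": "{word} is related to {filler}",
}


def make_hint(relation, secret_word, filler):
    """Return the hint text for the Guesser and the fact to store."""
    template = TEMPLATES[relation]
    # The Guesser must not see the secret word, so it is masked in the hint.
    hint = template.format(word="_", filler=filler)
    # The stored fact keeps the secret word and the relation explicitly.
    fact = (relation, secret_word, filler)
    return hint, fact


hint, fact = make_hint("used_for", "laptop", "writing email")
```

Here `hint` is the string `_ is used for writing email` and `fact` is the triple `("used_for", "laptop", "writing email")`.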
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
– Describer: can just repeat the behavior of a previous describer
– Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
– Ask six raters whether 200 facts collected using Verbosity are ‘true’
– Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
– Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
– Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube (2012). Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence.
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman (2009). Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems.
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi (2013). Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.
- INTRODUCTION TO ARTIFICIAL INTELLIGENCE
- ‘CYC convinced the AI community that creating a commonsense knowledge base by hand is impossible’ (Massimo, Lecture 1)
- THE SOCIAL WEB
- SOCIAL CREATION OF KNOWLEDGE
- WIKIPEDIA
- Slide 7
- Slide 8
- Encyclopedic knowledge in coreference resolution
- Why Wikipedia may help addressing the encyclopedic knowledge problem
- Another interesting scenario
- Slide 13
- Wikipedia as Ontology
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
- Slide 21
- The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
- SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
- Wikipedia category network
- Deriving a taxonomy from Wikipedia (AAAI 2007)
- Slide 26
- INFOBOXES
- Slide 29
- Slide 30
- Slide 31
- SPARQL
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- OPEN MIND COMMONSENSE
- WHAT’S IN OPEN MIND COMMONSENSE: CAR
- Slide 39
- OPEN MIND COMMONSENSE ADDING KNOWLEDGE
- OMCS ADDING KNOWLEDGE 2
- OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
- Slide 43
- FROM OPENMIND COMMONSENSE TO CONCEPT NET
- Slide 45
- CONCEPT NET
- FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
- GAMES WITH A PURPOSE
- GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
- EXAMPLES OF GWAP
- ESP
- ESP the game
- ESP THE GAME
- THE TASK
- SCORING BY MATCHING
- THE CHALLENGE SCORES
- SCORES
- THE CHALLENGE TIMING
- TABOO WORDS
- Slide 61
- PASSING
- GOOD LABELS COMPLETING AN IMAGE
- IMPLEMENTATION
- SOME STATISTICS
- ANALYSIS
- QUALITY OF THE LABELS
- GOOGLE IMAGE LABELLER
- Slide 69
- RESULTS
- VERBOSITY
- THE GAME
- Slide 73
- TEMPLATES IN VERBOSITY
- GUESSING ATTRIBUTES
- PRODUCING A DESCRIPTION
- TEMPLATES
- EMULATION
- Slide 79
- PHRASE DETECTIVES
- Slide 81
- NAME THE CULPRIT
- READINGS
FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
A lime is a very sour fruit
isa(lime, fruit)
property_of(lime, very_sour)
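To make the mapping from sentence to predicates concrete, here is a minimal sketch that handles only sentences of the form "A X is a [modifiers] Y" with a single hand-written regular expression. The pattern and the function name `extract_facts` are assumptions for illustration; this is not the actual ConceptNet extraction procedure.

```python
import re

# Matches e.g. "A lime is a very sour fruit": article, subject, "is a/an",
# optional modifier words, then the head noun. An illustrative pattern only.
PATTERN = re.compile(
    r"^(?:an?|the)\s+(?P<subject>\w+)\s+is\s+an?\s+"
    r"(?P<modifier>(?:\w+\s+)+)?(?P<kind>\w+)$",
    re.IGNORECASE,
)


def extract_facts(sentence):
    """Return predicate strings for a matching sentence, else an empty list."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if m is None:
        return []
    subject = m.group("subject").lower()
    kind = m.group("kind").lower()
    facts = [f"isa({subject}, {kind})"]
    modifier = m.group("modifier")
    if modifier:
        # Join the modifier words with underscores, e.g. "very sour" -> "very_sour".
        prop = "_".join(modifier.lower().split())
        facts.append(f"property_of({subject}, {prop})")
    return facts


facts = extract_facts("A lime is a very sour fruit")
# facts == ["isa(lime, fruit)", "property_of(lime, very_sour)"]
```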
GAMES WITH A PURPOSE
• Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
• GWAP do not rely on altruism or financial incentives to entice people to perform certain actions
• The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP
• Games at www.gwap.com
– ESP
– Verbosity
– TagATune
• Other games
– Peekaboom
– Phetch
ESP
• The first GWAP, developed by von Ahn and his group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
– To train image search engines
– To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
– Hence the ESP game
• If any of the strings typed by one player matches the string typed by the other player, they score points
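The matching rule can be sketched as a small Python function. This is a simplified illustration, not the game's code: it compares the two players' complete lists of typed strings after the fact, whereas the real game checks matches as players type, and the normalization step (lower-casing, collapsing whitespace) is an assumption.

```python
def normalize(text):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def first_match(guesses_a, guesses_b):
    """Return the first string typed by player A that player B also typed, or None."""
    typed_by_b = {normalize(g) for g in guesses_b}
    for guess in guesses_a:
        candidate = normalize(guess)
        if candidate in typed_by_b:
            return candidate
    return None


agreed = first_match(["Dog ", "park"], ["grass", "dog"])
# agreed == "dog": the pair scores points and moves on to the next image
```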
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
• Partners try to agree on as many images as they can during 2½ minutes
EXAMPLES OF GWAP
bull Games at wwwgwapcomndash ESPndash Verbosityndash TagATune
bull Other gamesndash Peekaboomndash Phetch
ESP
bull The first GWAP developed by von Ahn and their group (2003 2004)
bull The problem obtain accurate description of images to be usedndash To train image search enginesndash To develop machine learning approaches to vision
bull The goal label the majority of the images on the Web
ESP the game
ESP THE GAMEbull Two partners are picked at random from the
large number of players onlinebull They are not told who their partner is and canrsquot
communicate with thembull They are both shown the same imagebull The goal guess how their partner will describe
the image and type that descriptionndash Hence the ESP game
bull If any of the strings typed by one player matches the string typed by the other player they score points
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
bull hellip or the game approach to collecting commonsense knowledge
bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)
THE GAME
bull Based on an existing game TABOOndash Players have to guess a wordndash One of the players gives hints concerning the
word
bull In Verbosity you have two players the DESCRIBER and the GUESSER and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
bull As in Open Mind Commonsense templates are used to ensure that the relations properties of interest are collected
bull The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
bull _ is a kind of _bull _ is used for _bull _ is typically nearinon _bull _ is the opposite of _ _ is related to _
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game:
– Describer can just repeat the behavior of a previous describer
– Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality:
– Six raters were asked whether 200 facts collected using Verbosity are ‘true’
– Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks:
– Find The Culprit (Annotation): the user must identify the closest antecedent of a markable if it is anaphoric
– Detectives Conference (Validation): the user must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
- INTRODUCTION TO ARTIFICIAL INTELLIGENCE
- ‘CYC convinced the AI community that creating a commonsense knowledge base by hand is impossible’ (Massimo, Lecture 1)
- THE SOCIAL WEB
- SOCIAL CREATION OF KNOWLEDGE
- WIKIPEDIA
- Slide 7
- Slide 8
- Encyclopedic knowledge in coreference resolution
- Why Wikipedia may help addressing the encyclopedic knowledge problem
- Another interesting scenario
- Slide 13
- Wikipedia as Ontology
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- The concept Artificial Intelligence belongs to four categories: Artificial intelligence, Cybernetics, Formal sciences & Technology in society
- Slide 21
- The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
- SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
- Wikipedia category network
- Deriving a taxonomy from Wikipedia (AAAI 2007)
- Slide 26
- INFOBOXES
- Slide 29
- Slide 30
- Slide 31
- SPARQL
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- OPEN MIND COMMONSENSE
- WHAT’S IN OPEN MIND COMMONSENSE: CAR
- Slide 39
- OPEN MIND COMMONSENSE ADDING KNOWLEDGE
- OMCS ADDING KNOWLEDGE 2
- OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
- Slide 43
- FROM OPENMIND COMMONSENSE TO CONCEPT NET
- Slide 45
- CONCEPT NET
- FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
- GAMES WITH A PURPOSE
- GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
- EXAMPLES OF GWAP
- ESP
- ESP the game
- ESP THE GAME
- THE TASK
- SCORING BY MATCHING
- THE CHALLENGE SCORES
- SCORES
- THE CHALLENGE TIMING
- TABOO WORDS
- Slide 61
- PASSING
- GOOD LABELS COMPLETING AN IMAGE
- IMPLEMENTATION
- SOME STATISTICS
- ANALYSIS
- QUALITY OF THE LABELS
- GOOGLE IMAGE LABELLER
- Slide 69
- RESULTS
- VERBOSITY
- THE GAME
- Slide 73
- TEMPLATES IN VERBOSITY
- GUESSING ATTRIBUTES
- PRODUCING A DESCRIPTION
- TEMPLATES
- EMULATION
- Slide 79
- PHRASE DETECTIVES
- Slide 81
- NAME THE CULPRIT
- READINGS
-
ESP
• The first GWAP, developed by von Ahn and their group (2003, 2004)
• The problem: obtain accurate descriptions of images, to be used
– To train image search engines
– To develop machine learning approaches to vision
• The goal: label the majority of the images on the Web
ESP the game
ESP THE GAME
• Two partners are picked at random from the large number of players online
• They are not told who their partner is and can’t communicate with them
• They are both shown the same image
• The goal: guess how their partner will describe the image, and type that description
– Hence, the ESP game
• If any of the strings typed by one player matches a string typed by the other player, they score points
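The matching rule can be sketched in a few lines. This is an illustrative reading of the rule above, not the game's actual implementation. The case and whitespace normalization in `normalize` and the turn-by-turn interleaving of guesses are assumptions added for the example.

```python
def normalize(label):
    """Lower-case and collapse whitespace (an assumed normalization)."""
    return " ".join(label.lower().split())


def first_agreement(guesses_a, guesses_b):
    """Return the first label typed by both partners, or None.

    Each argument is the ordered list of strings typed by one player.
    The two lists are interleaved to mimic both players typing in turn.
    """
    seen_a, seen_b = set(), set()
    for i in range(max(len(guesses_a), len(guesses_b))):
        if i < len(guesses_a):
            g = normalize(guesses_a[i])
            if g in seen_b:      # partner B already typed this string
                return g
            seen_a.add(g)
        if i < len(guesses_b):
            g = normalize(guesses_b[i])
            if g in seen_a:      # partner A already typed this string
                return g
            seen_b.add(g)
    return None


match = first_agreement(["dog", "grass", "ball"], ["puppy", "Ball", "park"])
```

Note that the partners do not need to type the same string at the same moment: a guess matches if the other player has typed it at any earlier point in the round.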
THE TASK
SCORING BY MATCHING
THE CHALLENGE SCORES
• One of the motivating factors is to try to score as many points as possible
• Hourly, daily, weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
• Partners try to agree on as many images as they can during 2½ minutes
• The thermometer on the side indicates how many images they have agreed on
• If they agree on 15 images, they score bonus points
TABOO WORDS
• To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed
• Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
• A label is considered “good” when more than N players produce it (with N a parameter of the game)
• An image is “done” when its list of taboo words is so extensive that most players pass on it
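The taboo, “good” and “done” rules can be combined into a small per-image record, sketched below under stated assumptions. The class and method names are hypothetical. Treating a round with no agreement as a pass, and reading “most players pass” as a pass rate above one half, are both my simplifications rather than details taken from the game.

```python
from collections import Counter


class ImageRecord:
    """Per-image bookkeeping for produced labels, taboo words and passes."""

    def __init__(self, n_good=1):
        self.n_good = n_good       # the parameter N from the slide
        self.produced = Counter()  # label -> number of players who typed it
        self.taboo = set()         # labels agreed on in earlier rounds
        self.shown = 0             # rounds in which the image was shown
        self.passed = 0            # rounds that ended without agreement

    def record_round(self, typed_a, typed_b):
        """Record one round; return the set of labels both partners typed."""
        self.shown += 1
        # Taboo words are not allowed, so they are dropped before counting.
        allowed_a = {w for w in typed_a if w not in self.taboo}
        allowed_b = {w for w in typed_b if w not in self.taboo}
        for w in allowed_a:
            self.produced[w] += 1
        for w in allowed_b:
            self.produced[w] += 1
        agreed = allowed_a & allowed_b
        if agreed:
            self.taboo |= agreed   # agreed labels become taboo for later pairs
        else:
            self.passed += 1       # assumption: no agreement counts as a pass
        return agreed

    def good_labels(self):
        """Labels produced by more than N players."""
        return {w for w, c in self.produced.items() if c > self.n_good}

    def is_done(self):
        """Assumed reading of 'most players pass': pass rate above 50%."""
        return self.shown > 0 and self.passed / self.shown > 0.5


rec = ImageRecord(n_good=1)
rec.record_round(["dog", "grass"], ["dog", "ball"])
rec.record_round(["dog", "ball"], ["ball"])
```

After these two rounds, "dog" is taboo, so its second appearance is ignored and the pair has to agree on something more specific.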
IMPLEMENTATION
• Pre-recorded game play
– Especially at the beginning and at quiet times, there won’t always be players to pair with
– In these cases, a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
– Players could cheat in a number of ways, including agreeing on labels and playing against themselves
– A number of mechanisms are in place against those cases
• Selecting images
SCORING BY MATCHING
THE CHALLENGE SCORES
bull One of the motivating factors is to try to score as many points as possible
bull Hourly daily weekly and monthly scores are shown
SCORES
THE CHALLENGE TIMING
bull Partners try to agree on as many images as they can during 2 frac12 minutes
bull The termometer on the side indicates how many images they have agreed on
bull If they agree on 15 images they score bonus points
TABOO WORDS
bull To ensure the production of a large number of specific labels some words are declared TABOO and not allowed
bull Taboo words are obtained from the game itself any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATION
• Pre-recorded game play
  – Especially at the beginning and at quiet times there won’t always be players to pair with
  – In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture
• Cheating
  – Players could cheat in a number of ways, including agreeing on labels and playing against themselves
  – A number of mechanisms are in place against those cases
• Selecting images
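Pre-recorded play can be pictured as replaying a timestamped list of guesses against the live player. The sketch below assumes a recorded ‘hand’ is stored as (seconds, word) pairs; that storage format and the name replay_match are illustrative assumptions, not details given in the lecture.

```python
def replay_match(recorded, live):
    """Find the first agreement between a recorded hand and a live player.

    Both arguments are lists of (seconds_from_start, word) pairs.
    Returns (time, word) for the first agreement, or None.
    """
    # Tag each event with its source (0 = recorded, 1 = live) and merge by time.
    events = sorted(
        [(t, 0, w.lower()) for t, w in recorded]
        + [(t, 1, w.lower()) for t, w in live]
    )
    seen = (set(), set())
    for t, who, word in events:
        seen[who].add(word)
        if word in seen[1 - who]:
            return t, word  # both sides have now produced this word
    return None


recorded_hand = [(2.0, "car"), (5.0, "road")]
live_guesses = [(1.0, "road"), (3.0, "tree")]
result = replay_match(recorded_hand, live_guesses)
```

Because the recorded partner's guesses are released at their original times, the live player experiences the same waiting and matching as in a two-person game.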
SOME STATISTICS
• In the 4 months between August 9th 2003 and December 10th 2003
  – 13,630 players
  – 1.2 million labels for 293,760 images
  – 80% of players played more than once
• By 2008
  – 200,000 players
  – 50 million labels
ANALYSIS
• The numbers indicate that the game is fun to play
• Exciting factors
  – Playing with a partner
  – Playing against time
QUALITY OF THE LABELS
• For IMAGE SEARCH
  – choose 10 labels among those produced and look at which images are returned
• Compare labels produced by players with labels produced by participants in an experiment
  – 15 participants, 20 images among the 1000 with more than 5 labels
  – 83% of game labels also produced by participants
• Manual assessment of labels (‘would you use these labels to describe this image?’)
  – 15 participants, 20 images
  – 85% of words rated useful
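The comparison with experiment participants boils down to a coverage ratio: what share of the game's labels for an image were also produced by the participants. A minimal sketch, assuming case-insensitive exact matching (the original evaluation may have matched labels differently) and a hypothetical function name:

```python
def coverage(game_labels, participant_labels):
    """Fraction of game labels that participants also produced (0.0 to 1.0)."""
    game = {label.lower() for label in game_labels}
    produced = {label.lower() for label in participant_labels}
    if not game:
        return 0.0
    return len(game & produced) / len(game)


# Toy data, invented purely for illustration.
game = ["dog", "grass", "frisbee", "park", "jump", "collie"]
participants = ["Dog", "grass", "frisbee", "park", "jump", "sunny", "lawn"]
share = coverage(game, participants)
```

Averaging this ratio over the sampled images gives a single percentage of the kind reported on the slide.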
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
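Each filled-in template can be read as a (secret word, relation, filler) fact. The sketch below shows one way to do that mapping; the relation names such as IsA and UsedFor are hypothetical labels chosen for illustration, not names taken from Verbosity itself.

```python
# Template text -> relation label (labels are illustrative assumptions).
TEMPLATES = {
    "is a kind of": "IsA",
    "is used for": "UsedFor",
    "is typically near/in/on": "LocatedNear",
    "is the opposite of": "OppositeOf",
    "is related to": "RelatedTo",
}


def render_hint(template, filler):
    """What the Guesser sees: the secret word stays hidden behind '_'."""
    return f"_ {template} {filler}"


def hint_to_fact(secret_word, template, filler):
    """Turn a Describer's filled template into a (subject, relation, object) triple."""
    relation = TEMPLATES[template]  # raises KeyError for an unknown template
    return (secret_word.lower(), relation, filler.strip().lower())


hint = render_hint("is a kind of", "computer")
fact = hint_to_fact("laptop", "is a kind of", "Computer")
```

Restricting the Describer to a fixed template set is what makes the collected hints directly usable as structured facts rather than free text.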
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): user must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): user must agree/disagree with a coreference relation entered by another user
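One simple way to combine the two tasks is to count each annotation as a vote for an antecedent and each validation as a vote for or against it. The sketch below is a hypothetical aggregation scheme for illustration only; it is not a description of how Phrase Detectives actually scores or merges judgements.

```python
from collections import defaultdict


def best_antecedents(annotations, validations):
    """Pick the best-supported antecedent for each markable.

    annotations: iterable of (markable, antecedent) pairs from the annotation task.
    validations: iterable of (markable, antecedent, agrees) from the validation task.
    """
    support = defaultdict(int)
    for markable, antecedent in annotations:
        support[(markable, antecedent)] += 1
    for markable, antecedent, agrees in validations:
        support[(markable, antecedent)] += 1 if agrees else -1

    best = {}
    for (markable, antecedent), score in support.items():
        if markable not in best or score > best[markable][1]:
            best[markable] = (antecedent, score)
    return {markable: antecedent for markable, (antecedent, _) in best.items()}


annotations = [("The agency", "The FCC"), ("The agency", "AT&T")]
validations = [
    ("The agency", "The FCC", True),
    ("The agency", "AT&T", False),
]
chosen = best_antecedents(annotations, validations)
```

With these toy judgements "The FCC" ends with a net support of 2 and "AT&T" with 0, so "The FCC" is kept as the antecedent of "The agency".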
www.phrasedetectives.com
PHRASE DETECTIVES: THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems
- INTRODUCTION TO ARTIFICIAL INTELLIGENCE
- `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
- THE SOCIAL WEB
- SOCIAL CREATION OF KNOWLEDGE
- WIKIPEDIA
- Slide 7
- Slide 8
- Encyclopedic knowledge in coreference resolution
- Why Wikipedia may help addressing the encyclopedic knowledge problem
- Another interesting scenario
- Slide 13
- Wikipedia as Ontology
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
- Slide 21
- The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
- SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
- Wikipedia category network
- Deriving a taxonomy from Wikipedia (AAAI 2007)
- Slide 26
- INFOBOXES
- Slide 29
- Slide 30
- Slide 31
- SPARQL
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- OPEN MIND COMMONSENSE
- WHATrsquoS IN OPEN MIND COMMONSENSE CAR
- Slide 39
- OPEN MIND COMMONSENSE ADDING KNOWLEDGE
- OMCS ADDING KNOWLEDGE 2
- OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
- Slide 43
- FROM OPENMIND COMMONSENSE TO CONCEPT NET
- Slide 45
- CONCEPT NET
- FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
- GAMES WITH A PURPOSE
- GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
- EXAMPLES OF GWAP
- ESP
- ESP the game
- ESP THE GAME
- THE TASK
- SCORING BY MATCHING
- THE CHALLENGE SCORES
- SCORES
- THE CHALLENGE TIMING
- TABOO WORDS
- Slide 61
- PASSING
- GOOD LABELS COMPLETING AN IMAGE
- IMPLEMENTATION
- SOME STATISTICS
- ANALYSIS
- QUALITY OF THE LABELS
- GOOGLE IMAGE LABELLER
- Slide 69
- RESULTS
- VERBOSITY
- THE GAME
- Slide 73
- TEMPLATES IN VERBOSITY
- GUESSING ATTRIBUTES
- PRODUCING A DESCRIPTION
- TEMPLATES
- EMULATION
- Slide 79
- PHRASE DETECTIVES
- Slide 81
- NAME THE CULPRIT
- READINGS
-
TABOO WORDS
PASSING
GOOD LABELS COMPLETING AN IMAGE
bull A label is considered ldquogoodrdquo when more than N players produce it (with N a parameter of the game)
bull An image is ldquodonerdquo when its list of taboo words is so extensive that most players pass on it
IMPLEMENTATIONbull Pre-recorded game play
ndash Especially at the beginning and at quiet times there wonrsquot always be players to pair with
ndash In these cases a player is paired against a recorded lsquohandrsquo of a previous game with the same picture
bull Cheatingndash Players could cheat in a number of ways including
agreeing on labels playing against themselvesndash A number of mechanisms are in place against those
casesbull Selecting images
SOME STATISTICS
bull In the 4 months between August 9th 2003 and December 10th 2003ndash 13630 playersndash 12 million labels for 293760 imagesndash 80 of players played more than once
bull By 2008 ndash 200000 playersndash 50 million labels
ANALYSIS
bull The numbers indicate that the game is fun to play
bull Exciting factorsndash Playing with a partnerndash Playing against time
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
• … or the game approach to collecting commonsense knowledge
• Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
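One way to see why templates help: each filled template maps directly to a structured fact. The sketch below is illustrative only. The relation names (`IsA`, `UsedFor`, and so on) and both function names are assumptions, not Verbosity's internal vocabulary.

```python
# Template text from the slide; relation names are hypothetical labels.
TEMPLATES = {
    "IsA": "{} is a kind of {}",
    "UsedFor": "{} is used for {}",
    "LocatedNear": "{} is typically near/in/on {}",
    "Antonym": "{} is the opposite of {}",
    "RelatedTo": "{} is related to {}",
}


def make_hint(relation, filler):
    """Hint shown to the Guesser: the secret word is hidden behind a blank."""
    return TEMPLATES[relation].format("___", filler)


def to_fact(relation, secret_word, filler):
    """Fact stored once the hint is accepted: (concept, relation, value)."""
    return (secret_word, relation, filler)


hint = make_hint("UsedFor", "drinking")
fact = to_fact("UsedFor", "cup", "drinking")
```

Since the Describer chooses the template, the relation type comes for free and no parsing of free text is needed.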
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer can just repeat behavior of previous describer
  – Guesser: not so easy
RESULTS
• Only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
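The two tasks can be combined into a single judgement per markable. The scoring below is an assumed scheme for illustration, not the one Phrase Detectives actually uses: each annotation counts +1 for its antecedent, each agreement +1, each disagreement -1. The function name `best_interpretation` is hypothetical.

```python
from collections import defaultdict


def best_interpretation(annotations, validations):
    """Pick the antecedent with the highest combined score for one markable.

    annotations: antecedents chosen by players in Find The Culprit
    validations: (antecedent, agrees) pairs from Detectives Conference
    Returns None when no judgement has been collected yet.
    """
    scores = defaultdict(int)
    for antecedent in annotations:
        scores[antecedent] += 1
    for antecedent, agrees in validations:
        scores[antecedent] += 1 if agrees else -1
    if not scores:
        return None
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(scores), key=scores.get)
```

Validation lets a minority annotation be rejected without asking every player to annotate from scratch.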
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube, Transforming Wikipedia into a large scale multilingual concept network, Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman, Digital Intuition: Applying Common Sense Using Dimensionality Reduction, IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008), Designing games with a purpose, Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013, Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation, ACM Transactions on Interactive Intelligent Systems
- INTRODUCTION TO ARTIFICIAL INTELLIGENCE
- `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
- THE SOCIAL WEB
- SOCIAL CREATION OF KNOWLEDGE
- WIKIPEDIA
- Slide 7
- Slide 8
- Encyclopedic knowledge in coreference resolution
- Why Wikipedia may help addressing the encyclopedic knowledge problem
- Another interesting scenario
- Slide 13
- Wikipedia as Ontology
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
- Slide 21
- The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
- SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
- Wikipedia category network
- Deriving a taxonomy from Wikipedia (AAAI 2007)
- Slide 26
- INFOBOXES
- Slide 29
- Slide 30
- Slide 31
- SPARQL
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- OPEN MIND COMMONSENSE
- WHATrsquoS IN OPEN MIND COMMONSENSE CAR
- Slide 39
- OPEN MIND COMMONSENSE ADDING KNOWLEDGE
- OMCS ADDING KNOWLEDGE 2
- OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
- Slide 43
- FROM OPENMIND COMMONSENSE TO CONCEPT NET
- Slide 45
- CONCEPT NET
- FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
- GAMES WITH A PURPOSE
- GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
- EXAMPLES OF GWAP
- ESP
- ESP the game
- ESP THE GAME
- THE TASK
- SCORING BY MATCHING
- THE CHALLENGE SCORES
- SCORES
- THE CHALLENGE TIMING
- TABOO WORDS
- Slide 61
- PASSING
- GOOD LABELS COMPLETING AN IMAGE
- IMPLEMENTATION
- SOME STATISTICS
- ANALYSIS
- QUALITY OF THE LABELS
- GOOGLE IMAGE LABELLER
- Slide 69
- RESULTS
- VERBOSITY
- THE GAME
- Slide 73
- TEMPLATES IN VERBOSITY
- GUESSING ATTRIBUTES
- PRODUCING A DESCRIPTION
- TEMPLATES
- EMULATION
- Slide 79
- PHRASE DETECTIVES
- Slide 81
- NAME THE CULPRIT
- READINGS
-
QUALITY OF THE LABELSbull For IMAGE SEARCH
ndash choose 10 labels among those produced and look at which images are returned
bull Compare labels produced by players with labels produced by participants in an experimentndash 15 participants 20 images among the 1000 with more
than 5 labelsndash 83 of game labels also produced by participants
bull Manual assessment of labels (lsquowould you use these labels to describe this imagersquo)ndash 15 participants 20 imagesndash 85 of words rated useful
GOOGLE IMAGE LABELLER
THE TASK
RESULTS
VERBOSITY
bull hellip or the game approach to collecting commonsense knowledge
bull Motivation slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700000 facts)
THE GAME
• Based on an existing game, TABOO
  – Players have to guess a word
  – One of the players gives hints concerning the word
• In Verbosity you have two players, the DESCRIBER and the GUESSER, and a SECRET WORD
THE GAME
TEMPLATES IN VERBOSITY
• As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected
• The Describer produces hints by filling in a template
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _ / _ is related to _
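One way to see why templates help: a filled-in template can be turned into a (concept, relation, concept) fact with simple pattern matching. The sketch below is only an illustration of that idea. The relation names (IsA, UsedFor, ...) and the function hint_to_fact are assumptions, not identifiers taken from Verbosity.

```python
import re

# Template pattern -> relation name. The relation names are illustrative
# labels chosen for this sketch, not names used by Verbosity itself.
TEMPLATES = [
    (r"^(?P<a>.+) is a kind of (?P<b>.+)$", "IsA"),
    (r"^(?P<a>.+) is used for (?P<b>.+)$", "UsedFor"),
    (r"^(?P<a>.+) is typically (?:near|in|on) (?P<b>.+)$", "TypicallyNearInOn"),
    (r"^(?P<a>.+) is the opposite of (?P<b>.+)$", "OppositeOf"),
    (r"^(?P<a>.+) is related to (?P<b>.+)$", "RelatedTo"),
]


def hint_to_fact(hint):
    """Return (concept, relation, concept) for a filled-in template, else None."""
    text = hint.strip().lower()
    for pattern, relation in TEMPLATES:
        match = re.match(pattern, text)
        if match:
            return (match.group("a"), relation, match.group("b"))
    return None


print(hint_to_fact("Milk is a kind of drink"))
print(hint_to_fact("A spoon is typically in a kitchen"))
print(hint_to_fact("Milk, you know, the white stuff"))  # no template matches
```

Free-text hints, as in the last call, yield no fact, which is the point of constraining the Describer to templates.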
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat behavior of previous describer
  – Guesser: not so easy
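The easy half of emulation, replaying a previous Describer, can be sketched as follows. The class names and fields are hypothetical and only illustrate the idea of replaying a recorded game; they do not describe the actual Verbosity implementation.

```python
from dataclasses import dataclass, field


@dataclass
class RecordedGame:
    """A stored session: the secret word and the hints a human Describer gave."""
    secret_word: str
    hints: list


@dataclass
class EmulatedDescriber:
    """Replays the hints of a recorded game, one per call, in the original order."""
    recording: RecordedGame
    _position: int = field(default=0, init=False)

    def next_hint(self):
        if self._position >= len(self.recording.hints):
            return None  # the recording has run out of hints
        hint = self.recording.hints[self._position]
        self._position += 1
        return hint


recording = RecordedGame("milk", ["it is a kind of drink", "it is typically in a fridge"])
describer = EmulatedDescriber(recording)
print(describer.next_hint())
print(describer.next_hint())
print(describer.next_hint())  # None once all recorded hints are used
```

The replay works because the recorded hints stay valid for the same secret word. A recorded Guesser has no such guarantee, since its stored guesses were reactions to a different Describer's hints.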
RESULTS
• Only published results I'm aware of predate the actual release of the game, so I don't know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are 'true'
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
www.phrasedetectives.com
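How the two tasks might feed a single decision can be sketched as below. The scoring rule (one point per annotation, plus one per agreement, minus one per disagreement) is an assumption made for illustration and is not the aggregation method actually used by Phrase Detectives; the function name and example data are likewise hypothetical.

```python
from collections import Counter


def best_antecedent(annotations, validations):
    """Pick the antecedent with the highest combined score for one markable.

    annotations: antecedents proposed by players in Find The Culprit.
    validations: (antecedent, agrees) pairs from Detectives Conference.
    """
    score = Counter(annotations)  # one point per annotation
    for antecedent, agrees in validations:
        score[antecedent] += 1 if agrees else -1
    if not score:
        return None
    return max(score, key=lambda antecedent: score[antecedent])


# Hypothetical judgments for the markable "The agency".
annotations = ["The FCC", "AT&T", "The FCC"]
validations = [("AT&T", False), ("The FCC", True)]
print(best_antecedent(annotations, validations))  # prints The FCC
```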
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012.
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009.
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67.
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems.
- INTRODUCTION TO ARTIFICIAL INTELLIGENCE
- `CYC convinced the AI community that creating a commonsense knowledge base by hand is impossiblersquo (Massimo Lecture 1)
- THE SOCIAL WEB
- SOCIAL CREATION OF KNOWLEDGE
- WIKIPEDIA
- Slide 7
- Slide 8
- Encyclopedic knowledge in coreference resolution
- Why Wikipedia may help addressing the encyclopedic knowledge problem
- Another interesting scenario
- Slide 13
- Wikipedia as Ontology
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- The concept Artificial Intelligence belongs to four categories Artificial intelligence Cybernetics Formal sciences amp Technology in society
- Slide 21
- The different meanings that Artificial intelligence may refer to are listed in its disambiguation page
- SEMANTIC NETWORK KNOWLEDGE IN WIKIPEDIA
- Wikipedia category network
- Deriving a taxonomy from Wikipedia (AAAI 2007)
- Slide 26
- INFOBOXES
- Slide 29
- Slide 30
- Slide 31
- SPARQL
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- OPEN MIND COMMONSENSE
- WHATrsquoS IN OPEN MIND COMMONSENSE CAR
- Slide 39
- OPEN MIND COMMONSENSE ADDING KNOWLEDGE
- OMCS ADDING KNOWLEDGE 2
- OPEN MIND COMMONSENSE CHECKING KNOWLEDGE
- Slide 43
- FROM OPENMIND COMMONSENSE TO CONCEPT NET
- Slide 45
- CONCEPT NET
- FROM OPENMIND COMMONSENSE FACTS TO CONCEPTNET
- GAMES WITH A PURPOSE
- GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK
- EXAMPLES OF GWAP
- ESP
- ESP the game
- ESP THE GAME
- THE TASK
- SCORING BY MATCHING
- THE CHALLENGE SCORES
- SCORES
- THE CHALLENGE TIMING
- TABOO WORDS
- Slide 61
- PASSING
- GOOD LABELS COMPLETING AN IMAGE
- IMPLEMENTATION
- SOME STATISTICS
- ANALYSIS
- QUALITY OF THE LABELS
- GOOGLE IMAGE LABELLER
- Slide 69
- RESULTS
- VERBOSITY
- THE GAME
- Slide 73
- TEMPLATES IN VERBOSITY
- GUESSING ATTRIBUTES
- PRODUCING A DESCRIPTION
- TEMPLATES
- EMULATION
- Slide 79
- PHRASE DETECTIVES
- Slide 81
- NAME THE CULPRIT
- READINGS
GUESSING ATTRIBUTES
PRODUCING A DESCRIPTION
TEMPLATES
• _ is a kind of _
• _ is used for _
• _ is typically near/in/on _
• _ is the opposite of _
• _ is related to _
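A minimal sketch of how such sentence templates can turn a describer's clue into a structured fact. The relation labels (IsA, UsedFor, ...) and the function names are assumptions made for illustration; the slides only give the template wording.

```python
import re

# Template phrases from the slide; the relation labels are hypothetical names.
RELATIONS = {
    "IsA": "is a kind of",
    "UsedFor": "is used for",
    "TypicallyNear": "is typically near/in/on",
    "OppositeOf": "is the opposite of",
    "RelatedTo": "is related to",
}


def fill(relation, word, clue):
    """Fill both blanks of a template: the secret word and the describer's clue."""
    return f"{word} {RELATIONS[relation]} {clue}"


def parse(sentence):
    """Recover a (relation, word, clue) triple from a filled template, or None."""
    for relation, phrase in RELATIONS.items():
        match = re.fullmatch(rf"(.+?) {re.escape(phrase)} (.+)", sentence)
        if match:
            return relation, match.group(1), match.group(2)
    return None
```

Because every clue is forced through a fixed template, each completed sentence maps directly to a relation between two terms, which is what makes the collected text usable as commonsense facts.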
EMULATION
• As in the ESP game, pre-recorded games are used when a player cannot be paired with another player
• The asymmetry of the game causes a problem not encountered in the ESP game
  – Describer: can just repeat the behavior of a previous describer
  – Guesser: not so easy
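The easy half of emulation, replaying a recorded describer for a live guesser, can be sketched as follows. The class and function names are hypothetical, and the recording format (a secret word plus an ordered list of clues) is an assumption, not the game's actual storage format.

```python
from dataclasses import dataclass


@dataclass
class RecordedGame:
    secret_word: str
    clues: list  # clues in the order the original describer gave them


class ReplayDescriber:
    """Emulated describer: replays the clues of a previously recorded game."""

    def __init__(self, recording):
        self.secret_word = recording.secret_word
        self._clues = iter(recording.clues)

    def next_clue(self):
        # Returns None once the recorded clues are exhausted.
        return next(self._clues, None)


def play_with_replay(recording, guess_fn):
    """Feed recorded clues to a live guesser; return how many clues were needed."""
    describer = ReplayDescriber(recording)
    seen = []
    while (clue := describer.next_clue()) is not None:
        seen.append(clue)
        if guess_fn(seen) == describer.secret_word:
            return len(seen)
    return None  # the guesser never found the word
```

The reverse direction has no such shortcut: an emulated guesser would have to react sensibly to clues it has never seen, which is why the slide calls it "not so easy".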
RESULTS
• The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY
• Quality
  – Ask six raters whether 200 facts collected using Verbosity are ‘true’
  – Around 85% success
PHRASE DETECTIVES
www.phrasedetectives.org
• 2 tasks
  – Find The Culprit (Annotation): User must identify the closest antecedent of a markable if it is anaphoric
  – Detectives Conference (Validation): User must agree/disagree with a coreference relation entered by another user
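One way the two tasks could be combined into a single judgement per markable is sketched below. The scoring rule (one point per annotation, plus one per agreement, minus one per disagreement) is an assumption chosen for illustration, not necessarily the rule used by Phrase Detectives, and the function name is hypothetical.

```python
from collections import defaultdict


def best_antecedent(annotations, validations):
    """Pick the antecedent with the highest combined support.

    annotations: antecedents chosen by players in annotation mode
    validations: (antecedent, agrees) pairs entered in validation mode
    """
    scores = defaultdict(int)
    for antecedent in annotations:
        scores[antecedent] += 1  # each annotation counts as one vote
    for antecedent, agrees in validations:
        scores[antecedent] += 1 if agrees else -1
    if not scores:
        return None
    return max(scores, key=scores.get)
```

For example, with annotations ["the FCC", "AT&T", "the FCC"] and validations [("AT&T", False), ("the FCC", True)], "the FCC" scores 3 against 0 for "AT&T" and is selected.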
www.phrasedetectives.com
PHRASE DETECTIVES THE TASKS
NAME THE CULPRIT
READINGS
• V. Nastase & M. Strube. Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence, 2012
• C. Havasi, J. Pustejovsky, R. Speer and H. Lieberman. Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems, 2009
• L. von Ahn and L. Dabbish (2008). Designing games with a purpose. Communications of the ACM, v. 51, n. 8, 58-67
• Poesio, Chamberlain, Kruschwitz, Robaldo & Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems