Center for E-Business Technology, Seoul National University
Seoul, Korea
Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor
Metaweb Technologies, Inc.
San Francisco
International Conference on Management of Data (2008)
2008. 11. 12.
Summarized & presented by Babar Tareen, IDS Lab., Seoul National University
Copyright 2008 by CEBT
Motivation – Wikipedia
Free multilingual encyclopedia
Supports 264 languages
854 Volumes of English articles
Motivation – English Wikipedia Growth
Introduction
A public repository of the world’s knowledge
Inspired by The Semantic Web and Wikipedia
Supports highly diverse and heterogeneous data
Tries to merge the scalability of structured databases with the diversity of collaborative wikis into a practical, scalable database of structured general human knowledge
The information contained in Freebase is open to anyone
However, the Freebase backend database is not open
Data Sources
User Contribution
Metaweb Bots
Incorporates facts from many large, publicly available information sources
Data Model
Freebase is a graph database
A set of nodes and a set of links that establish relationships between the nodes
Key Concepts
Domains
– Bases: collections of topics created by users
– Commons: similar to bases but more general
– Examples: Film, Religion, Computers
Types
– Analogous to classes
– Examples: Film Actor, Film Festival, Film Distribution, Film Rating, Film Format
Properties
– Specific information elements within a type
– Examples: Film Performances, Film Dubbing Performances, IMDb Entry
Topics
– Analogous to objects
– Instances of a type
– Topics can be linked to other domains or other topics
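The node-and-link model above can be illustrated with a small in-memory sketch. This is not Freebase code: the `Graph` class, the topic IDs `t1` and `t2`, and the choice of type paths are assumptions made for illustration only. It shows topics as nodes, properties as labelled links, and one topic carrying more than one type.

```python
from collections import defaultdict


class Graph:
    """Toy graph: topics are nodes, properties are labelled links."""

    def __init__(self):
        self.names = {}                 # topic id -> display name
        self.links = defaultdict(list)  # (source id, property) -> targets

    def add_topic(self, topic_id, name):
        self.names[topic_id] = name

    def link(self, source, prop, target):
        self.links[(source, prop)].append(target)

    def get(self, source, prop):
        # Use .get so that a lookup never creates an empty entry.
        return list(self.links.get((source, prop), []))


g = Graph()
g.add_topic("t1", "Stan Lee")      # hypothetical topic ids
g.add_topic("t2", "Spider-Man")

# A topic may carry several types (type paths are illustrative).
g.link("t1", "type", "/people/person")
g.link("t1", "type", "/film/actor")
g.link("t2", "type", "/fictional_universe/fictional_character")

# A property links one topic to another topic.
g.link("t2", "character_created_by", "t1")

creators = [g.names[t] for t in g.get("t2", "character_created_by")]
print(creators)
```

Storing types as ordinary links keeps the sketch uniform: adding a type to a topic is just one more link, which is one way to picture why a topic can belong to several types at once.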
Data Model (2)
Key Components
A scalable Tuple Store
An HTTP/JSON-Based API
MQL (Metaweb Query Language) for read/write operations
A Lightweight, Collaborative Typing System
Loose collection of structuring mechanisms and conventions
A Large, Diverse Data Set
100 million assertions
4000 types
A Philosophy of “Complete Normalization”
Only one GUID for each real-world object
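The idea of a tuple store combined with "complete normalization" can be sketched as follows. This is a toy under stated assumptions, not a description of the Metaweb backend: the `TupleStore` class, its method names, and the alias-based lookup are hypothetical. It only shows that every name or alias of a real-world object resolves to the same GUID, so all facts about it accumulate in one place.

```python
import uuid


class TupleStore:
    """Toy append-only store of (guid, property, value) tuples."""

    def __init__(self):
        self._guid_by_name = {}  # every known name or alias -> GUID
        self.tuples = []         # append-only list of assertions

    def topic(self, name, aliases=()):
        """Return the GUID for `name`; mint one only if no name is known."""
        names = [name, *aliases]
        known = {self._guid_by_name[n] for n in names if n in self._guid_by_name}
        if len(known) > 1:
            raise ValueError("names already refer to different topics")
        guid = known.pop() if known else uuid.uuid4().hex
        for n in names:
            self._guid_by_name[n] = guid
        return guid

    def assert_fact(self, guid, prop, value):
        self.tuples.append((guid, prop, value))

    def facts(self, guid):
        return [(p, v) for g, p, v in self.tuples if g == guid]


store = TupleStore()
a = store.topic("Stan Lee", aliases=["Stanley Lieber"])
b = store.topic("Stanley Lieber")  # resolves to the existing GUID

store.assert_fact(a, "created", "Spider-Man")
store.assert_fact(b, "profession", "Writer")

print(a == b, store.facts(a))
```

Both assertions end up under one identifier even though they were made through different names, which is the property the slide describes.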
Data Entry
Schema Creation
Data Evaluation
Metaweb Query Language
Who created the comic character Spider-Man?
Query:
[
  {
    "character_created_by" : null,
    "name" : "Spider-Man",
    "type" : "/fictional_universe/fictional_character"
  }
]

Response:
{
  "code" : "/api/status/ok",
  "q1" : {
    "code" : "/api/status/error",
    "messages" : [
      {
        "code" : "/api/status/error/mql/result",
        "info" : {
          "count" : 2,
          "result" : [ "Steve Ditko", "Stan Lee" ]
        },
        "message" : "Unique query may have at most one result. Got 2",
        "path" : "character_created_by",
        "query" : [
          {
            "character_created_by" : null,
            "error_inside" : "character_created_by",
            "name" : "Spider-Man",
            "type" : "/fictional_universe/fictional_character"
          }
        ]
      }
    ]
  },
  "status" : "200 OK",
  "transaction_id" : "cache;cache01.p01.sjc1:8101;2008-11-11T05:54:45Z;0021"
}
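The response above reports an error because `null` in MQL asks for a single value, while Spider-Man has two creators; asking with an empty list (`[]`) requests all values instead. The sketch below mimics that difference with a toy in-memory evaluator. It does not call the Freebase service: the `run` function and the `DATA` list are assumptions made for illustration, with the data copied from the response shown above.

```python
import json

# Toy data copied from the response above; not fetched from Freebase.
DATA = [{
    "name": "Spider-Man",
    "type": "/fictional_universe/fictional_character",
    "character_created_by": ["Steve Ditko", "Stan Lee"],
}]


def run(query):
    """Toy evaluator that only mimics MQL's null vs [] behaviour."""
    pattern = query[0]
    results = []
    for record in DATA:
        # String values in the pattern act as constraints.
        if any(record.get(k) != v
               for k, v in pattern.items() if isinstance(v, str)):
            continue
        row = {}
        for key, want in pattern.items():
            values = record.get(key)
            values = values if isinstance(values, list) else [values]
            if want is None:
                # null: exactly one value is expected.
                if len(values) > 1:
                    raise ValueError(
                        "Unique query may have at most one result. "
                        f"Got {len(values)}")
                row[key] = values[0]
            elif want == []:
                # []: return every value as a list.
                row[key] = values
            else:
                row[key] = want
        results.append(row)
    return results


base = {"name": "Spider-Man",
        "type": "/fictional_universe/fictional_character"}
unique_query = [dict(base, character_created_by=None)]
list_query = [dict(base, character_created_by=[])]

try:
    run(unique_query)
except ValueError as err:
    print("error:", err)

print(json.dumps(run(list_query), indent=2))
```

Serialised with `json.dumps`, `unique_query` has the same shape as the query on the slide, and `list_query` differs only in using `[]` where the slide uses `null`.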
MQL Queries
Characters created by Stan Lee
Foreign donations to 2008 US Political Candidates
Nikon Cameras in order of Resolution
Tropical Storms in the 1990s
Mountains of the Himalayas
African American authors and their books
Web Browsers that run on the Mac
US cities named Canton
Applications
Parallax: Freebase Browser http://mqlx.com/~david/parallax/index.html
Powerset: Semantic Search Engine http://www.powerset.com/
ArchiPortal http://dev.mqlx.com/~zak/arch/
Dipity Timelines http://www.dipity.com/
Discussion
Simple architecture
Topics can be associated with multiple types
Analogous to having a database of knowledge
But now there are two knowledge bases to maintain:
Wikipedia
Freebase
References
Freebase http://www.freebase.com
The Semantic Edge (Web 2.0 Summit 2007) http://www.web2summit.com/cs/web2007/view/e_sess/15043
MQL Query Editor http://www.freebase.com/tools/queryeditor/
Freebase Blog http://blog.freebase.com/
Freebase Sample Queries http://www.freebase.com/view/freebase/freebase_query