exhibit - Massachusetts Institute of Technologypeople.csail.mit.edu/.../talks/ · exhibit...
exhibit
lightweight structured data publishing
david huynh + david karger + rob miller
MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY
[Diagram: two-layer web architecture (PRESENTATION, DATA). Components shown: web browser (HTML, Javascript, CSS), web server, file system (static files, images), database (MySQL / Postgres / Oracle).]
[Diagram: three-layer web architecture (PRESENTATION, LOGIC, DATA). Components shown: web browser (HTML, Javascript, CSS, XML, XSLT, XmlHttp), application server / web server (ASP, ASP.NET, CGI, JSP/Java, PHP), SQL, file system (static files, images), database (MySQL / Postgres / Oracle).]
keep your hands up...
... if your web sites support sorting and grouping by year, author, topic, ...?
outline
✓ problem: publishing data is too hard
✓ demo: using Exhibit to publish data in 10 min
• implementation: how Exhibit works
• real world uses + discussion
• related work
• future work
• conclusion
[Diagram repeated: the three-layer web architecture (PRESENTATION, LOGIC, DATA) with web browser, application server / web server, file systems, and database.]
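[Diagram: the same three layers (PRESENTATION, LOGIC, DATA) with Exhibit. Components shown: web browser (HTML, Javascript, CSS, XmlHttp), Exhibit API, web server, file systems (static files, images). No database or application server is shown.]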
[Diagram: inside the Exhibit API, running in the web browser. Parts of the Exhibit API: database, expression language, localization, images, css, views, lens template, facets, exporters, importers. Inputs: data; HTML + Images + CSS + JS. In the web browser: HTML, JS, DOM. Output: data exports.]
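The pieces named in the diagram (an in-browser database, facets, views that sort and group) can be illustrated with a small sketch. This is not Exhibit's code: the function names `valuesOf`, `filterByFacets`, `facetCounts`, and `groupBy`, and the sample records, are all hypothetical and only show the general idea of faceted filtering over items held in memory.

```javascript
// Illustrative sketch only: a tiny in-memory item store with facet-style
// filtering, counting, and grouping. All names here are hypothetical and
// are not part of the Exhibit API.

// Normalize a property value to an array so single-valued and multi-valued
// properties can be handled uniformly.
function valuesOf(item, property) {
  const v = item[property];
  if (v === undefined || v === null) return [];
  return Array.isArray(v) ? v : [v];
}

// Keep items that match every facet selection. `selections` maps a property
// name to the list of accepted values; an empty list means "no restriction".
function filterByFacets(items, selections) {
  return items.filter((item) =>
    Object.entries(selections).every(
      ([property, accepted]) =>
        accepted.length === 0 ||
        valuesOf(item, property).some((v) => accepted.includes(v))
    )
  );
}

// Count how many items carry each value of a property (what a facet shows).
function facetCounts(items, property) {
  const counts = new Map();
  for (const item of items) {
    for (const v of valuesOf(item, property)) {
      counts.set(v, (counts.get(v) || 0) + 1);
    }
  }
  return counts;
}

// Group items by a property, with groups returned in ascending key order.
function groupBy(items, property) {
  const groups = new Map();
  for (const item of items) {
    for (const v of valuesOf(item, property)) {
      if (!groups.has(v)) groups.set(v, []);
      groups.get(v).push(item);
    }
  }
  const sorted = [...groups.entries()].sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0
  );
  return new Map(sorted);
}

// Made-up sample data for the sketch.
const papers = [
  { label: "Paper A", year: 2006, topic: ["web", "data"] },
  { label: "Paper B", year: 2007, topic: ["data"] },
  { label: "Paper C", year: 2006, topic: ["ui"] },
];

const dataPapers = filterByFacets(papers, { topic: ["data"] });
const byYear = groupBy(dataPapers, "year");
```

Everything here runs in the browser on plain arrays, with no server-side component.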
data formats
• JSON as default format
• http://simile.mit.edu/babel/
  • Bibtex
  • Excel spreadsheets
  • Tab separated values
  • RDF/XML, N3
• Dynamic importers
JSON files
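As a rough illustration of a JSON data file, the sketch below assumes the layout commonly used for Exhibit JSON: a top-level `items` array whose entries carry `label` and `type`. The sample records and the helper `itemsMissingLabel` are made up for this example.

```javascript
// Sketch of a small data file, assuming a top-level "items" array whose
// entries carry "label" and "type". The sample records are invented.
const data = {
  items: [
    { type: "Movie", label: "Example Movie One", genre: ["Drama", "Epic"], rating: 4 },
    { type: "Movie", label: "Example Movie Two", genre: "Comedy", rating: 3 },
  ],
};

// Hypothetical helper: list items that lack a usable label, a simple check
// worth running before publishing the file.
function itemsMissingLabel(dataset) {
  return dataset.items.filter(
    (item) => typeof item.label !== "string" || item.label === ""
  );
}

// The text that would be saved as the static JSON file.
const fileContents = JSON.stringify(data, null, 2);
```

Because the file is static text, it can sit on an ordinary web server next to the HTML page.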
JSONP data feed
gdata.io.handleScriptLoaded({
  ...
  "entry": [
    {
      "id": { "$t": "http://spreadsheets.google.com/feeds/list/.../od6/public/basic/cokwr" },
      "updated": { "$t": "2007-04-16T18:41:56.378Z" },
      "category": [
        {
          "scheme": "http://schemas.google.com/spreadsheets/2006",
          "term": "http://schemas.google.com/spreadsheets/2006#list"
        }
      ],
      "title": { "type": "text", "$t": "Lord of the Rings: The Return of the King" },
      "content": {
        "type": "text",
        "$t": "{type}: Movie, {genre}: Drama; Epic, {plot:single}: The former Fellowship of the Ring prepare for the final battle for Middle Earth, while Frodo \u0026 Sam approach Mount Doom to destroy the One Ring., {rating:number}: 4"
      },
      ...
    },
    ...
  ]
})
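A dynamic importer has to turn the `content` text of each feed entry into an item. The sketch below is one possible way to do that, not Exhibit's actual importer. The function name `parseRowContent` is hypothetical, and the reading of the column modifiers (`single` keeps the text whole, `number` converts to a number, anything else splits on `;`) is an assumption inferred from the sample entry.

```javascript
// Assumption-labeled sketch: turn the "content" text of one spreadsheet row
// into a plain item object. The modifier semantics ("single", "number") are
// inferred from the sample feed, not taken from any specification.
function parseRowContent(text) {
  // Matches column headers such as "{genre}: " or "{plot:single}: ".
  const header = /\{([^{}:]+)(?::([^{}]+))?\}:\s*/g;
  const headers = [];
  let match;
  while ((match = header.exec(text)) !== null) {
    headers.push({
      name: match[1].trim(),
      modifier: match[2] ? match[2].trim() : null,
      start: match.index,
      valueStart: match.index + match[0].length,
    });
  }

  const item = {};
  headers.forEach((h, i) => {
    // A value runs from the end of its header to the start of the next one.
    const valueEnd = i + 1 < headers.length ? headers[i + 1].start : text.length;
    // Drop the ", " separator that precedes the next header.
    const raw = text.slice(h.valueStart, valueEnd).replace(/,\s*$/, "").trim();

    if (h.modifier === "number") {
      item[h.name] = Number(raw);
    } else if (h.modifier === "single") {
      item[h.name] = raw; // keep commas and semicolons inside the text
    } else {
      const parts = raw.split(";").map((s) => s.trim()).filter((s) => s !== "");
      item[h.name] = parts.length === 1 ? parts[0] : parts;
    }
  });
  return item;
}

// Shortened, made-up row in the same shape as the feed entry.
const sample =
  "{type}: Movie, {genre}: Drama; Epic, {plot:single}: A plot, with a comma., {rating:number}: 4";
const movie = parseRowContent(sample);
```

Splitting on the header pattern instead of on commas keeps free-text values such as the plot intact, since they can contain commas themselves.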
scalability
• Javascript is slow, not designed for implementing DBs
  • Recommended for < 500 items
  • Some people have been brave: 2733 items or more
• Not a limitation per se
  • Exhibit is intended for small data sets
outline
✓ problem: publishing data is too hard
✓ demo: using Exhibit to publish data in 10 min
✓ implementation: how Exhibit works
• real world uses + discussion
• related work
• future work
• conclusion
presentations
company members
software tools
restaurants
3 recipes
radio albums
installed fonts
hotels near a dance event
dogs for adoption
lego sets
dances, costumes, performances
breweries and distilleries
kansai dialect field study data
wedding attendees
If Semantic Web researchers were to build a web site with data, what topic would the data be about?
The Long Tail
[Chart: axes "information topics" and "quantity or popularity". Topics labeled along the curve: merchandises, movies, photos, news, events, software, lego sets, israel folk dance videos, breweries and distilleries in Ontario 1914 - 1915; also publications. Annotations added in later builds: check marks, "dormant data publishers", "free labor in addition to grad students".]
outline
✓ problem: publishing data is too hard
✓ demo: using Exhibit to publish data in 10 min
✓ implementation: how Exhibit works
✓ real world uses + discussion
• related work
• future work
• conclusion
Related Work
[Chart: axes "flexibility of presentation" and "flexibility of data modeling"; circle size = amount of effort. Items plotted: HTML, Ruby on Rails, Flickr, Google Base, DabbleDB, FreeBase, wiki, blog, Semantic MediaWiki extension, customized Semantic MediaWiki extension, custom 3-tier web app, Exhibit.]
Related Work
[Chart: axes "data ownership" (personal, group, world) and unstructured to structured. Items plotted: Exhibit, Semantic MediaWiki extension, Freebase, personal blog, personal web space, wiki, Wikipedia, DBPedia, YAGO, DabbleDB, Google Base.]
related work
• database in Javascript
• TimBL’s Tabulator
  • generic browsing interface
  • for data consumers to do mash-up
• Exhibit
  • customizable publishing framework
  • for data publishers
future work
• feature requests
  • more views: calendar, histogram, ...
  • more flexible layouts
  • visual synchronization, e.g., color coding
  • value formats, e.g., $(6,000)
  • localization
• if there will be a lot of exhibits, let people...
  • search over them
  • merge them together
future work
• authoring interface
  • HTML got us so far...
  • WYSIWYG editors got us further
  • Exhibit will get us so far...
  • A front-end to Exhibit will get us further
conclusion
• many dormant data publishers in the long tail
  • ... with few resources to publish data
• Exhibit
  • answer real world needs of publishing data
  • as easy and expressive as HTML
  • tap the free labor in the long tail
  • produce data that doesn’t have to be scraped
  • build a Data Web representative of the Web