Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER...

52
exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY

Transcript of Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER...

Page 1: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

exhibitlightweight structured data publishing

david huynh + david karger + rob miller

MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY

Page 2: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

2

good ol’ days ... early 1990s

Page 3: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

3

sort

filter

search

Page 4: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

4

Page 5: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

5

early 1990s → 2007

Page 6: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

6

PRESENTATIONHTML

Web Browser

File System

Static Files

Web Server

ImagesJavascript CSS

Page 7: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

7

PRESENTATION

DATA

HTML

Web Browser

Database

File System

Static Files

Web Server

Images

MySQL / Postgres / Oracle

Javascript CSS

Page 8: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

8

PRESENTATION

LOGIC

DATA

HTMLJavascript CSS

XML XSLT

SQL

Web Browser

XmlHttp

Database

File System

Static Files

Application Server Web Server

ASP

ASP.NETCGI

JSP/Java

PHP

Images

MySQL / Postgres / Oracle

Page 9: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

9

publishing data is hard.

Page 10: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

10

can Semantic Web technologies help?

Page 11: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

11

duh!

obviously!

SW technologies are supposed to help!

Page 12: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

12

what people want

what SW lets them do

sortfilter

search

Page 13: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

13

Page 14: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

14

outline✓problem: publishing data is too hard✓demo: using Exhibit to publish data in 10 min

•implementation: how Exhibit works•real world uses + discussion•related work•future work•conclusion

Page 15: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

15

PRESENTATION

LOGIC

DATA

HTMLJavascript CSS

XML XSLT

SQL

Web Browser

XmlHttp

Database

File Systems

Static Files

Application Server Web Server

ASP

ASP.NETCGI

JSP/Java

PHP

Images

MySQL / Postgres / Oracle

Page 16: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

16

PRESENTATION

LOGIC

DATA

HTMLJavascript CSS

XML XSLT

SQL

Web Browser

XmlHttp

Database

File Systems

Static Files

Application Server Web Server

ASP

ASP.NETCGI

JSP/Java

PHP

Images

MySQL / Postgres / Oracle

Page 17: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

17

PRESENTATION

LOGIC

DATA

HTML

Javascript CSS

Web Browser

XmlHttp

File Systems

Static Files

Web Server

Images

Exhibit API

Page 18: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

18

data

Exhibit API

database

expression languagelocalization

imagescss

viewslens

templatefacets exporters

importers

HTML+

Images+

CSS+JS

web browser

HTML

JS

DOM dataexports

<script src= “...... /exhibit-api.js”></script>

Page 19: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

19

data

Exhibit API

web browser

presentation

sorting filtering maps

timelines

my users

Page 20: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

20

•JSON as default format

•http:// simile . mit . edu / babel /•Bibtex•Excel spreadsheets•Tab separated values•RDF/XML, N3

•Dynamic importers

data formats

JSON files

Page 21: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

21

Page 22: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

22

JSONP data feedgdata.io.handleScriptLoaded({ ... "entry": [ { "id":{ "$t":"http://spreadsheets.google.com/feeds/list/.../od6/public/basic/cokwr" }, "updated":{"$t":"2007-04-16T18:41:56.378Z"}, "category":[ { "scheme":"http://schemas.google.com/spreadsheets/2006", "term":"http://schemas.google.com/spreadsheets/2006#list" } ], "title": { "type":"text", "$t":"Lord of the Rings: The Return of the King" }, "content": { "type": "text", "$t": "{type}: Movie, {genre}: Drama; Epic, {plot:single}: The former Fellowship of the Ring prepare for the final battle for Middle Earth, while Frodo \u0026 Sam approach Mount Doom to destroy the One Ring., {rating:number}: 4" }, ... }, ... ]})

Page 23: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

23

•Javascript is slow, not designed for implementing DBs

•Recommended for < 500 items•Some people have been brave: 2733

items or more

•Not a limitation per se•Exhibit is intended for small data sets

scalability

Page 24: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

24

outline✓problem: publishing data is too hard✓demo: using Exhibit to publish data in 10 min

✓implementation: how Exhibit works•real world uses + discussion•related work•future work•conclusion

Page 25: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

25

Page 26: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

26

Page 27: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

27

Page 28: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

28

Page 29: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

29

Page 30: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

30

Page 31: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

31

Page 32: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

32

Page 33: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

33

oops!

Page 34: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

34

Page 35: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

35

Page 36: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

36

Page 37: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

37

Page 38: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

38

Page 39: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

39

someone is planning a wedding using Exhibit

Page 40: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

40

presentationscompany members

software toolsrestaurants3 recipes

radio albumsinstalled fonts

hotels near a dance eventdogs for adoption

lego setsdances, costumes, performances

breweries and distillerieskansai dialect field study data

world conflictswedding attendees

Page 41: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

41

presentationscompany members

software toolsrestaurants3 recipes

radio albumsinstalled fonts

hotels near a dance eventdogs for adoption

lego setsdances, costumes, performances

breweries and distillerieskansai dialect field study data

world conflictswedding attendees

If Semantic Web researchers were tobuild a web site with data,

what topic would the data be about?

Page 42: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

42

scientific papers

Page 43: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

43

pub

licati

on

s

The Long Tail

information topics

quantity or

popularitymerchandises

moviesphotos

newsevents

software

lego setsisrael folk dance videos

breweries and distilleries

in Ontario 1914 - 1915

free laborin addition to grad students

✓ ✓dormant data publishers

Page 44: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

44

reuse withoutscraping

have fun!

Page 45: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

45

outline✓problem: publishing data is too hard✓demo: using Exhibit to publish data in 10 min

✓implementation: how Exhibit works✓real world uses + discussion•related work•future work•conclusion

Page 46: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

46

HTML

Ruby on Rails

Flickr

Google BaseDabbleDBFreeBase

customizedSemantic MediaWiki extension

wiki, blog

Semantic MediaWiki extension

circle size = amount of effort

flexibility of presentation

flexibility of data

modeling

Related Work

custom 3-tier web app

Exhibit

Page 47: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

47

Exhibit

personal

Semantic MediaWiki extension

group world

Freebase

data ownership

personal blogpersonal web space

wiki Wikipedia

unstructured

structured

Related Work

DBPediaYAGO

DabbleDB Google Base

Page 48: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

48

•database in Javascript•TimBL’s Tabulator

•generic browsing interface•for data consumers to do mash-up

•Exhibit•customizable publishing framework

•for data publishers

related work

Page 49: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

49

•feature requests•more views: calendar, histogram, ...•more flexible layouts•visual synchronization, e.g., color

coding•value formats, e.g., $(6,000)•localization

•if there will be a lot of exhibits, let people...•search over them•merge them together

future work

Page 50: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

50

•authoring interface•HTML got us so far...•WYSIWYG editors got us further

•Exhibit will get us so far...•A front-end to Exhibit will get us

further

future work

Page 51: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

51

conclusion•many dormant data publishers in the long tail

•... with few resources to publish data

•Exhibit•answer real world needs of publishing data

•as easy and expressive as HTML•tap the free labor in the long tail

•produce data that doesn’t have to be scraped

•build a Data Web representative of the Web

Page 52: Exhibit lightweight structured data publishing david huynh + david karger + rob miller MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY.

52

google for “exhibit”