
exhibit: lightweight structured data publishing

david huynh + david karger + rob miller

MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY


sort

facets

query preview

search


Diagram: a static web site (PRESENTATION tier only). Labels: HTML, Images, Javascript, CSS; Web Browser; Web Server; Static Files; File System.


Diagram: a database-backed web site (PRESENTATION and DATA tiers). Labels: HTML, Images, Javascript, CSS; Web Browser; Web Server; Static Files; File System; Database (MySQL / Postgres / Oracle).


Diagram: a three-tier web application (PRESENTATION, LOGIC, and DATA tiers). Labels: HTML, Images, Javascript, CSS; XML, XSLT; SQL; Web Browser; XmlHttp; Application Server, Web Server (ASP, ASP.NET, CGI, JSP/Java, PHP); Static Files; File System; Database (MySQL / Postgres / Oracle).


publishing data is hard.


sort

filter by country

group by city
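The three operations named on this slide can be sketched in a few lines of plain JavaScript. This is only an illustration under assumptions: the sample items, their field names (`label`, `country`, `city`), and the helper functions are hypothetical and are not part of Exhibit.

```javascript
// Hypothetical sample items; the field names are illustrative only.
const items = [
  { label: "Alice", country: "USA", city: "Boston" },
  { label: "Bob", country: "USA", city: "Cambridge" },
  { label: "Chloe", country: "France", city: "Paris" },
  { label: "Dan", country: "USA", city: "Boston" },
];

// sort: return a sorted copy, comparing one field as text.
function sortBy(list, field) {
  return [...list].sort((a, b) =>
    String(a[field]).localeCompare(String(b[field]))
  );
}

// filter: keep only items whose field equals the given value.
function filterBy(list, field, value) {
  return list.filter((item) => item[field] === value);
}

// group: bucket items by the value of one field, preserving input order.
function groupBy(list, field) {
  const groups = new Map();
  for (const item of list) {
    const key = item[field];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  return groups;
}

// filter by country, sort by label, then group by city
const usa = filterBy(items, "country", "USA");
const byCity = groupBy(sortBy(usa, "label"), "city");
```

Each helper returns a new collection rather than changing its input, so the three steps can be combined in any order.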


raise your hand...

... if you have your own web sites listing your publications?


keep your hands up...

... if your web sites support sorting and grouping by year, author, topic, ...?


keep your hands up...

... if your sites support filtering by year, author, topic, ...?


keep your hands up...

... if your sites can export data in structured formats?


publishing data is hard.

can Semantic Web technologies help?


remember back in the early 1990s...

outline
✓ problem: publishing data is too hard
✓ demo: using Exhibit to publish data in 10 min
• implementation: how Exhibit works
• real world uses + discussion
• related work
• future work
• conclusion

Diagram: the three-tier web application again (PRESENTATION, LOGIC, and DATA tiers). Labels: HTML, Images, Javascript, CSS; XML, XSLT; SQL; Web Browser; XmlHttp; Application Server, Web Server (ASP, ASP.NET, CGI, JSP/Java, PHP); Static Files; File Systems; Database (MySQL / Postgres / Oracle).


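Diagram: the architecture with Exhibit (PRESENTATION, LOGIC, and DATA tiers). Labels: HTML, Images, Javascript, CSS; Web Browser; XmlHttp; Exhibit API; Web Server; Static Files; File Systems. No application server or database appears.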


Diagram, built up in steps (an exhibit inside the web browser):

• the web browser loads HTML + Images + CSS + JS
• the page includes <script src="...... /exhibit-api.js"></script>
• Exhibit API components: database, expression language, localization, images, css, views, lens template, facets, exporters, importers
• data
• HTML, JS, DOM
• data exports
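The diagram names a database and exporters among the Exhibit API components, with "data exports" as one output. The sketch below illustrates that idea only: `TinyDatabase` and `exportTsv` are hypothetical names invented for this example, and nothing here is Exhibit's actual API.

```javascript
// Hypothetical in-memory store, keyed by each item's "label" field.
class TinyDatabase {
  constructor() {
    this.items = new Map();
  }
  // Copy each item in; a later item with the same label replaces the earlier one.
  load(items) {
    for (const item of items) this.items.set(item.label, { ...item });
  }
  all() {
    return [...this.items.values()];
  }
}

// Hypothetical exporter: one header row, then one tab-separated row per item.
function exportTsv(db, fields) {
  // Tabs and newlines inside a value would break the format, so replace them.
  const clean = (value) => String(value ?? "").replace(/[\t\n]/g, " ");
  const rows = db.all().map((item) =>
    fields
      .map((field) => {
        const value = item[field];
        // Multi-valued fields are joined with "; ".
        return clean(Array.isArray(value) ? value.join("; ") : value);
      })
      .join("\t")
  );
  return [fields.join("\t"), ...rows].join("\n");
}

const db = new TinyDatabase();
db.load([
  { label: "A", type: "Movie", genre: ["Drama", "Epic"] },
  { label: "B", type: "Movie" },
]);
const tsv = exportTsv(db, ["label", "type", "genre"]);
```

Because the data already sits in a structured store in the page, an export like this needs no scraping of the rendered HTML.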

Diagram, built up in steps. Labels: web browser; Exhibit API; data; presentation; sorting, filtering, maps, timelines; my users.

data formats

• JSON as default format
• http://simile.mit.edu/babel/
• Bibtex
• Excel spreadsheets
• Tab separated values
• RDF/XML, N3
• Dynamic importers

JSON files
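As a rough illustration of turning one of the listed formats into a JSON file, the sketch below converts tab separated values into JSON. The output shape (a top-level `items` array whose entries carry `label` and `type` fields) is an assumption about what such a data file might look like; the function name and the sample row are hypothetical, and this is not Babel's implementation.

```javascript
// Hypothetical converter: first TSV line is the header, each later line is one item.
function tsvToItemsJson(tsv) {
  const [headerLine, ...lines] = tsv.trim().split("\n");
  const fields = headerLine.split("\t");
  const items = lines
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const cells = line.split("\t");
      const item = {};
      fields.forEach((field, i) => {
        // Skip empty cells so the item only carries fields it actually has.
        if (cells[i] !== undefined && cells[i] !== "") item[field] = cells[i];
      });
      return item;
    });
  return { items };
}

// Hypothetical sample data, not taken from the talk.
const sampleTsv = "label\ttype\tyear\nExhibit talk\tPresentation\t2007\n";
const dataFile = JSON.stringify(tsvToItemsJson(sampleTsv), null, 2);
```

All values stay strings in this sketch; a fuller converter would need some way to declare numbers and dates.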


JSONP data feed

gdata.io.handleScriptLoaded({
  ...
  "entry": [
    {
      "id": { "$t": "http://spreadsheets.google.com/feeds/list/.../od6/public/basic/cokwr" },
      "updated": { "$t": "2007-04-16T18:41:56.378Z" },
      "category": [
        {
          "scheme": "http://schemas.google.com/spreadsheets/2006",
          "term": "http://schemas.google.com/spreadsheets/2006#list"
        }
      ],
      "title": { "type": "text", "$t": "Lord of the Rings: The Return of the King" },
      "content": {
        "type": "text",
        "$t": "{type}: Movie, {genre}: Drama; Epic, {plot:single}: The former Fellowship of the Ring prepare for the final battle for Middle Earth, while Frodo \u0026 Sam approach Mount Doom to destroy the One Ring., {rating:number}: 4"
      },
      ...
    },
    ...
  ]
})
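A dynamic importer has to turn the `content` string of each feed entry into an item. The sketch below shows one way that could work. It rests on assumptions: that `{name}` headers delimit the properties, that `;` separates multiple values, that a `:single` hint means "do not split", and that a `:number` hint means "convert to a number". The function name is hypothetical and this is not Exhibit's importer code.

```javascript
// Hypothetical parser for one entry's content string.
function parseEntryContent(text) {
  // Matches headers such as "{genre}: " or "{rating:number}: ".
  const header = /\{([^}:]+)(?::([^}]+))?\}:\s*/g;
  const marks = [];
  let match;
  while ((match = header.exec(text)) !== null) {
    marks.push({
      name: match[1],
      hint: match[2] || null,
      start: match.index,
      valueStart: header.lastIndex,
    });
  }

  const item = {};
  marks.forEach((mark, i) => {
    // A value runs up to the next header (or the end of the string).
    const end = i + 1 < marks.length ? marks[i + 1].start : text.length;
    const raw = text
      .slice(mark.valueStart, end)
      .replace(/,\s*$/, "") // drop the ", " separator before the next header
      .trim();
    let values =
      mark.hint === "single" ? [raw] : raw.split(";").map((v) => v.trim());
    if (mark.hint === "number") values = values.map(Number);
    item[mark.name] = values.length === 1 ? values[0] : values;
  });
  return item;
}

const item = parseEntryContent(
  "{type}: Movie, {genre}: Drama; Epic, {plot:single}: The former Fellowship of the Ring prepare for the final battle for Middle Earth, while Frodo \u0026 Sam approach Mount Doom to destroy the One Ring., {rating:number}: 4"
);
```

Splitting on header positions rather than on commas matters here, because the plot text itself contains commas.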


scalability

• Javascript is slow, not designed for implementing DBs
• Recommended for < 500 items
• Some people have been brave: 2733 items or more
• Not a limitation per se
• Exhibit is intended for small data sets

outline
✓ problem: publishing data is too hard
✓ demo: using Exhibit to publish data in 10 min
✓ implementation: how Exhibit works
• real world uses + discussion
• related work
• future work
• conclusion


oops!


someone is planning a wedding using Exhibit


presentations
company members
software tools
restaurants
3 recipes
radio albums
installed fonts
hotels near a dance event
dogs for adoption
lego sets
dances, costumes, performances
breweries and distilleries
kansai dialect field study data
wedding attendees


If Semantic Web researchers were to build a web site with data, what topic would the data be about?


scientific papers


The Long Tail

Chart, built up in steps. Axes: information topics; quantity or popularity.
Topics shown: merchandises, movies, photos, news, events, software, lego sets, israel folk dance videos, breweries and distilleries in Ontario 1914 - 1915, publications.
Annotations: ✓ ✓; dormant data publishers; free labor in addition to grad students.

reuse without scraping

have fun!

outline
✓ problem: publishing data is too hard
✓ demo: using Exhibit to publish data in 10 min
✓ implementation: how Exhibit works
✓ real world uses + discussion
• related work
• future work
• conclusion


Related Work

Chart, built up in steps. Axes: flexibility of presentation; flexibility of data modeling. Circle size = amount of effort.
Plotted: HTML; Flickr; Google Base, DabbleDB, FreeBase; wiki, blog; Semantic MediaWiki extension; customized Semantic MediaWiki extension; Ruby on Rails; custom 3-tier web app; Exhibit.

Related Work

Chart. Axes: data ownership (personal, group, world); unstructured vs. structured.
Plotted: Exhibit; Semantic MediaWiki extension; Freebase; personal blog, personal web space; wiki; Wikipedia; DBPedia, YAGO; DabbleDB; Google Base.

related work

• database in Javascript
• TimBL's Tabulator

• generic browsing interface
• for data consumers to do mash-up

• Exhibit
• customizable publishing framework
• for data publishers

future work

• feature requests
• more views: calendar, histogram, ...
• more flexible layouts
• visual synchronization, e.g., color coding
• value formats, e.g., $(6,000)
• localization

• if there will be a lot of exhibits, let people...
• search over them
• merge them together
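The "value formats" request can be made concrete with a small formatter. This sketch assumes that the parentheses in $(6,000) denote a negative amount, as in accounting notation; the function name and its `symbol` parameter are hypothetical and not part of Exhibit.

```javascript
// Hypothetical formatter: whole-unit amounts, thousands separators,
// and parentheses instead of a minus sign for negative values.
function formatAccounting(amount, symbol = "$") {
  const whole = Math.round(Math.abs(amount));
  // Insert a comma before every group of three digits counted from the right.
  const grouped = String(whole).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
  return amount < 0 ? `${symbol}(${grouped})` : `${symbol}${grouped}`;
}
```

Under that assumption, `formatAccounting(-6000)` yields `$(6,000)` and `formatAccounting(6000)` yields `$6,000`.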

future work

• authoring interface
• HTML got us so far...
• WYSIWYG editors got us further

• Exhibit will get us so far...
• A front-end to Exhibit will get us further

conclusion

• many dormant data publishers in the long tail
• ... with few resources to publish data

• Exhibit
• answer real world needs of publishing data
• as easy and expressive as HTML
• tap the free labor in the long tail
• produce data that doesn't have to be scraped
• build a Data Web representative of the Web

google for “exhibit”
