Post on 18-Dec-2015
TAP: Context
• Islands of XML from disparate web services
• Example: Tori Amos
• Up to the consumer to put these chunks together
• Situation analogous to pre-web hypertext systems and RDBMS today
TAP Goal
• Create a coherent semantic web from disparate chunks
• Effectively make the web a giant distributed DB
• Why: bringing the Internet to programs
TAP: What We Do
• Inspired by DNS and the early web: simple contracts, everything decentralized
• Protocols to publish & navigate
  – A small, simple set of publishing & access guidelines that knit together to create a schematically unified whole
• Bootstrapping: Create comprehensive chunks of the semantic web in a few areas
• Applications: Semantic Search, Internet Wet Lab
TAP Protocol: GetData
• Simple API to navigate this web
• DNS: GetHostByName(<host>) => IP address
• TAP: GetData(<resource>, <property>) => value
  – GetData(<Tori Amos>, birthplace) => <Newton, NC>
  – GetData(<Newton, NC>, temperature) => 57 F
  – GetData(<Newton, NC>, locatedIn) => <North Carolina>
• Publisher exposes data as a graph via GetData
• Consumer uses GetData to navigate the graph
• Key technical issues: caching, directories, names
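As a rough illustration of the GetData contract on this slide, the Python sketch below stores a publisher's graph as a table keyed by (resource, property) and navigates it one arc at a time. The `GRAPH` table and the `get_data` and `follow` names are assumptions made for illustration, not TAP's actual implementation; the values are the ones used in the slide's examples.

```python
from typing import Optional

# Hypothetical in-memory graph: (resource, property) -> value.
GRAPH = {
    ("Tori Amos", "birthplace"): "Newton, NC",
    ("Newton, NC", "temperature"): "57 F",
    ("Newton, NC", "locatedIn"): "North Carolina",
}


def get_data(resource: str, prop: str) -> Optional[str]:
    """Return the value at the end of the arc `prop` leaving `resource`, or None."""
    return GRAPH.get((resource, prop))


def follow(resource: str, *props: str) -> Optional[str]:
    """Navigate the graph by chaining get_data calls along a path of properties."""
    current: Optional[str] = resource
    for prop in props:
        if current is None:
            return None  # the path broke at an earlier hop
        current = get_data(current, prop)
    return current


if __name__ == "__main__":
    # Where was Tori Amos born, and what is the temperature there?
    print(follow("Tori Amos", "birthplace", "temperature"))
```

A consumer only needs this one call to walk from a musician to a city to a state, which is the sense in which GetData plays the role that GetHostByName plays for DNS.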
The Name Problem
We don’t get nice sub-graphs like these, with easy-to-use assembly instructions
[Figure: four sub-graphs, one each from Geo Almanac, Weather channel, CDNow, and People Magazine. Node and arc labels: Tori Amos, Musician, Date Of Birth “8/22/63”, Crucify, Under The Pink, Music Album, Author, EMI, Atlantic, publisher, birthplace, Newton, NC, City, temperature 62 F, Located in, North Carolina, USA, instanceof. “Newton, NC” appears in three sub-graphs and “Tori Amos” in two, each time under the same label.]
We get a mess like this
[Figure: the same four sub-graphs from Geo Almanac, Weather channel, CDNow, and People Magazine, but the shared nodes no longer carry common labels. Where “Newton, NC” and “Tori Amos” appeared, each source now uses its own identifier: NTNC, Newton,_NorthCar, USNC0491, 0,9855,109071,00, 328723677.]
The Name Problem
• Names are crucial in information exchange
  – Two parties cannot exchange information about an object without agreeing on how they are going to refer to it
• The problem: too many names to keep track of!
  – No URN for <Newton, NC> or <Tori Amos>
  – Different sites have different names for the same thing!
  – URN efforts to date are largely failures
  – Traditional approach: name-mapping tables
[Figure: the four sub-graphs from Geo Almanac, Weather channel, CDNow, and People Magazine, with source-specific identifiers (NTNC, Newton,_NorthCar, USNC0491, 0,9855,109071,00, 328723677). The calling program maintains the name-mapping tables itself: 328723677 <-> 0,9855,1… and USNC0491 <-> NTNC <-> ...]
TAP Naming
• Reference by descriptions
  – E.g., “A Musician whose firstName is ‘Tori’ and whose lastName is ‘Amos’ and whose …”
  – Names are degenerate descriptions
    • Amzn:B000002UB2, CDNOW: 328723677
  – Description-based name negotiation
• Core insight
  – Don’t require globally unique names for everything if we can describe things using a starting vocabulary
  – Need a description language, a starting vocabulary, and a negotiation mechanism
  – Bootstrapping some shared meaning into more shared meaning
The vision: descriptions choreograph the integration
[Figure: the four sub-graphs from Geo Almanac, Weather channel, CDNow, and People Magazine, with source-specific identifiers (NTNC, Newton,_NorthCar, USNC0491, 0,9855,109071,00, 328723677). Instead of mapping names, the calling program sends descriptions to the sources; the arrows are labeled D1; D1; D1, D2; D2.]
D1 = description of Newton, NC
D2 = description of Tori Amos
Description based References
• The core protocol: GetData
  – GetData(Resource Description, arc-label)
  – GetData(<Tori Amos>, birthplace)
  – GetData(RDF Description of Tori Amos, birthplace)
• A form of loose coupling:
  – Handling ambiguity, failure to denote, …
• The core contract:
  – Expose your data as a graph
  – Map incoming descriptions to nodes in your graph
• In return, your data is now integrated into the global semantic web
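To make the core contract more concrete, here is a minimal Python sketch of a publisher that maps an incoming description to a node in its own graph before answering GetData. Everything here is an assumption for illustration: descriptions are modeled as plain property/value dictionaries rather than RDF, the node ids `m1` and `m2` and the second musician are hypothetical, and the two exception classes are simply one way to surface ambiguity and failure to denote.

```python
from typing import Dict, Optional

# Hypothetical publisher graph: local node id -> {property: value}.
NODES: Dict[str, Dict[str, str]] = {
    "m1": {"type": "Musician", "firstName": "Tori", "lastName": "Amos",
           "birthplace": "Newton, NC"},
    "m2": {"type": "Musician", "firstName": "Example", "lastName": "Artist"},
}


class AmbiguousDescription(Exception):
    """More than one local node matches the description."""


class FailureToDenote(Exception):
    """No local node matches the description."""


def resolve(description: Dict[str, str]) -> str:
    """Map an incoming description to exactly one local node id."""
    matches = [
        node_id
        for node_id, props in NODES.items()
        if all(props.get(key) == value for key, value in description.items())
    ]
    if not matches:
        raise FailureToDenote(description)
    if len(matches) > 1:
        raise AmbiguousDescription(matches)
    return matches[0]


def get_data(description: Dict[str, str], arc_label: str) -> Optional[str]:
    """GetData(description, arc-label): resolve the node, then follow the arc."""
    return NODES[resolve(description)].get(arc_label)


if __name__ == "__main__":
    tori = {"type": "Musician", "firstName": "Tori", "lastName": "Amos"}
    print(get_data(tori, "birthplace"))
```

The caller never learns or uses the publisher's local id. A description that is too loose (for example, only `{"type": "Musician"}`) is rejected as ambiguous, which gives the caller a chance to add more properties and retry.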
Infrastructure: Kernel Vocabulary
• Provides vocabulary for descriptions
• Purpose is to provide the infrastructure for constructing descriptions with which programs can refer to things
• “A Musician whose firstName is ‘Tori’ and whose lastName is ‘Amos’ and whose …”
• It doesn’t reside anywhere: it’s a specification
Applications
• Good infrastructures have waves of applications
  – WWW: home pages, portals, ecommerce, …
  – DNS: email, telnet, ftp, gopher, … WWW
• Semantic Search
  – Adding semantics to search
  – The crawl, grab, index model of search doesn’t work for dynamic web sites or web applications
  – Semantics-based search augmentation enables search to cover time-sensitive data
• Internet Wet Lab
How the Semantic Infrastructure gets used in Semantic Search
[Figure: components are Search Front End, KB, UDDI++, Caching & Buffering, and the sites EBay, CDNow, AllMusic, TicketMaster. Labels on the flows: “Yo Yo Ma”; “Musician whose genre is ClassicalMusic, First name is …”; “Who has concert dates? discography? auctions? bio? For musician whose …”; “Concert Dates for Musician whose …”; “Bio for …”; “Discography for …”; “Auctions for …”.]
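One possible reading of this figure is sketched below in Python: the front end asks the KB for a description of the search phrase, asks a directory which sites hold which properties for that kind of thing, and then plans one request per site and property. The `KB` and `DIRECTORY` contents, the assignment of properties to particular sites, and the `plan_requests` name are all assumptions for illustration; the slide does not say which site serves which property, and the caching layer is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical KB: normalized search phrase -> description of the thing meant.
KB: Dict[str, Dict[str, str]] = {
    "yo yo ma": {"type": "Musician", "genre": "ClassicalMusic"},
}

# Hypothetical directory: (type, property) -> sites that can answer it.
# The site assignments below are invented for the example.
DIRECTORY: Dict[Tuple[str, str], List[str]] = {
    ("Musician", "concert dates"): ["TicketMaster"],
    ("Musician", "discography"): ["CDNow", "AllMusic"],
    ("Musician", "auctions"): ["EBay"],
    ("Musician", "bio"): ["AllMusic"],
}

Request = Tuple[str, str, Dict[str, str]]  # (site, property, description)


def plan_requests(query: str) -> List[Request]:
    """Turn a search phrase into the GetData-style requests to send out."""
    description = KB.get(query.strip().lower())
    if description is None:
        return []  # phrase not covered by the KB: plain search only
    requests: List[Request] = []
    for (node_type, prop), sites in DIRECTORY.items():
        if node_type != description["type"]:
            continue
        for site in sites:
            requests.append((site, prop, description))
    return requests


if __name__ == "__main__":
    for site, prop, description in plan_requests("Yo Yo Ma"):
        print(f"{site}: {prop} for {description}")
```

Each planned request carries the description rather than a site-specific id, so the receiving site is responsible for mapping it to its own node.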
TAP KBs for Semantic Search
• Large Knowledge Base of specific musicians, cities, athletes, …
  – Currently covers about 20% of search terms
  – Built in a largely automated fashion
    • Scrapers for free data sources
    • Simple noun phrase analysis of news articles
      – AP, Reuters, …
• Scrapers for important sites to bootstrap
• KB also helps bootstrap the semantic web
KB Coverage Today
• Music
  – Musicians, instruments, styles
• Movies
  – Movies, actors, TV shows
• Authors
  – Top authors, classic books, …
• Sports
  – Athletes, sports, sports teams, equipment
• Autos
  – Auto models, motorcycles, …
• Companies
  – Fortune 500
• Home Appliances
  – Types, brands
• Toys
  – Types, brands
• Baby products
  – Types, brands
• Places
  – Countries, cities, tourist attractions, …
• Consumer electronics
  – Audio/Video, Communication
  – Game: consoles, titles, …
• Health
  – Diseases, Drugs, …
Semantic Site Search
• Semantic Search is useful not just for internet-wide search, but also for site search
• Same principles as internet-wide search
• KBs created for searching related individual sites can be shared between sites
• These KBs feed into the global semantic web
• Example: Semantic Search for www.w3.org
TAP Application: Internet Wet Lab
• In many sciences, more data will be produced in the next 2 years than exists today
• Increasingly, research consists of writing programs that mine this data
• Data is isolated as islands in different labs
• Data from one lab is not easily available to programs in another lab
• We want to use TAP to create a single virtual net-wide “database” containing all this experimental data
• Example: Clinical Trial Data
TAP Organization
• TAP is a multi-organization research effort
  – IBM, Stanford KSL, Stanford Logic Group, CMU West, …
• KBs, source-code, etc. freely available (via BSD license)
• A number of new projects starting up … places, entertainment, …
• We invite you to join
• URL: http://tap.stanford.edu/
TAP: Summary
• Small set of guidelines that create a coherent semantic web out of disparate web services
• Potential solution to the naming problem
  – Relevant to all web services
• Semantic Search & Internet Wet Lab as driving applications
• TAP is a research project
  – A lot of fundamental work remains to be done
  – Everything is freely available. We want you to join!
[Figure: the integrated graph. Data from Weather channel, People Magazine, CDNow, Geo Almanac, and Bg KB forms one connected graph in which Tori Amos and Newton, NC each appear once. Node and arc labels: Tori Amos, Musician, Date Of Birth “8/22/63”, Crucify, Under The Pink, Music Album, Author, EMI, Atlantic, publisher, birthplace, Newton, NC, City, temperature 62 F, Located in, North Carolina, US State, USA, Country, instanceof.]
TAP: Summary
• Focus is shifting from just storing and retrieving data to exchanging data. XML provides syntax. We need semantics.
• We need an infrastructure layer for semantics
• Applications drive infrastructures. The driving application for this layer is semantics-based Search & News Augmentation.
What is an Internet Infrastructure Layer?
• There is a data structure, pieces of which are in different places on the net
• DNS: Hash table of host names to IP addresses accessed via GetHostByName
• WWW: Directed graph of documents accessed via HTTP GET/POST
• Infrastructure layer provides a set of standards & APIs to unify the different pieces so that a client can pretend it is all local
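The idea that a client "can pretend it is all local" can be sketched with a toy table whose pieces live in different places. The Python below is only an illustration under stated assumptions: the `Piece` and `UnifiedTable` classes, the locations, and the host names and addresses (taken from reserved documentation ranges) are all hypothetical, and the linear scan over pieces is a stand-in for real resolution, not how DNS actually delegates lookups.

```python
from typing import Dict, Iterable, Optional


class Piece:
    """One fragment of the data structure, held at some location on the net."""

    def __init__(self, location: str, table: Dict[str, str]) -> None:
        self.location = location
        self._table = table

    def lookup(self, key: str) -> Optional[str]:
        # In a real layer this would be a network call to `self.location`.
        return self._table.get(key)


class UnifiedTable:
    """Client-side view that hides where each piece of the table lives."""

    def __init__(self, pieces: Iterable[Piece]) -> None:
        self._pieces = list(pieces)
        self._cache: Dict[str, str] = {}

    def get_host_by_name(self, host: str) -> Optional[str]:
        """Look up `host` as if the whole table were local."""
        if host in self._cache:
            return self._cache[host]
        for piece in self._pieces:
            value = piece.lookup(host)
            if value is not None:
                self._cache[host] = value  # remember the answer locally
                return value
        return None


if __name__ == "__main__":
    table = UnifiedTable([
        Piece("server-a", {"a.example.com": "192.0.2.1"}),
        Piece("server-b", {"b.example.org": "198.51.100.7"}),
    ])
    print(table.get_host_by_name("b.example.org"))
```

The client sees a single call and a single table; which piece answered, and whether the answer came from the cache, is hidden behind the layer.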
RTA for News Articles
[Figure: components are Search/Syndication Front End, Text analysis, Knowledge Base, Directory, and the sites EBay, AOL Shopping, AllPosters, MLB.com. Labels on the flows: “News article”; “SportsTeam_TexasRangers, AthleteRodriguez_Alex …”; “Whose team schedule? posters? auctions? bio?”; “Team Schedule for team whose title …”; “Poster for …”; “Videos for …”; “Auctions for …”.]