Publishing data on the Semantic Web
Peter Mika
Researcher, Data Architect
Yahoo! Research
Intro to the Semantic Web
- 3 -
Vague, but exciting… Berners-Lee and the dawn of the Web
- 4 -
Semantic Web
• Publish information in a way that is easier to process for machines
• Web of Data instead of Web of Documents
• Two main architectural challenges
– A common format for sharing data
– Sharing the meaning of data
• Through social means (shared schemas)
• By using powerful schema languages
• Semantic Web standards from W3C
– Languages (RDF, OWL, RIF)
– Serializations (RDF/XML, RDFa)
– Protocols (SPARQL, HTTP)
• Semantic Web research into knowledge representation and reasoning, data integration, data quality and many other topics
• Community efforts to publish data and develop schemas
- 5 -
RDF (Resource Description Framework)
• The basic data model of the Semantic Web
– A universal model to capture all sorts of data: networks, relational, object-oriented…
• Basic unit of information is a triple
– A tuple of (subject, predicate, object)
– Example: (Joe, loves, Mary)
– Each triple gives the value of a property for a given resource or relates two objects to one another
• Object is either a resource or a literal
• An RDF model is a set of triples
– Ordering of statements in an RDF document is irrelevant (unlike XML)
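A minimal sketch of the "set of triples" idea in Python. The tuple representation and the names doc_a, doc_b and to_model are illustrative assumptions, not part of any RDF library.

```python
# A triple is modeled as a plain (subject, predicate, object) tuple.
doc_a = [("Joe", "loves", "Mary"), ("Joe", "name", "Joe A.")]

# The same statements in a different order, with one statement repeated.
doc_b = [("Joe", "name", "Joe A."), ("Joe", "loves", "Mary"), ("Joe", "loves", "Mary")]


def to_model(statements):
    """An RDF model is a set: order and repetition of statements carry no meaning."""
    return frozenset(statements)


# Both documents serialize the same model.
same_model = to_model(doc_a) == to_model(doc_b)
```

Because the model is a set, two documents that list the same statements in a different order describe exactly the same data.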
- 6 -
Resources vs. literals
• Resources are identified by a URI; a resource without a URI is called a blank node
– URIs are a generalization of URLs
– Notation: <http://www.example.org/Person> or ex:Person
• Literals have an optional language and datatype (string, integer etc.)
– Literals can not be subjects of statements
– Datatypes are identified by URIs, e.g. XML Schema datatypes
– Two literals are the same if their components are the same
– Notation: “Joe B.” or Joe@en^^http://…#string
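A rough sketch of "two literals are the same if their components are the same". The Literal class below is a hypothetical stand-in written for this illustration; it is not the API of an existing RDF toolkit.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """A literal has a lexical value plus an optional language tag and datatype URI."""
    value: str
    language: Optional[str] = None
    datatype: Optional[str] = None


a = Literal("Joe", language="en")
b = Literal("Joe", language="en")
c = Literal("Joe", datatype=XSD_STRING)

# Equality is decided component by component; a literal has no identity beyond that.
same = a == b
different = a != c
```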
- 7 -
Advanced topic: Resources vs Literals
• Resources are objects, Literals are strings
• Resources are instances of classes, Literals have datatypes
• Whether something is a resource or literal sometimes depends on the detail of modeling
<meta property="myvocab:knows">Paris Hilton</meta>
<item rel="foaf:knows"><meta property="foaf:name">Paris Hilton</meta>
</item>
• You cannot make statements about literals (literals are always the object in a triple)
• Resources can carry a globally unique identifier, literals have no identity
• Web resources such as documents and images are resources
– <item rel="rdfs:seeAlso" resource="http://www.some.related.page.com/"/>
– <item rel="foaf:img" resource="http://photosite.example.org/photo.jpg"/>
• When in doubt: it’s a resource
- 8 -
Graphical and textual notation
• A number of ways to serialize an RDF model into an RDF document
– RDF/XML, Turtle, N3, N-Triples
– Example: http://www.cs.vu.nl/~pmika/foaf.rdf
[Figure: graph notation. The node my:Joe has a name edge to the literal “Joe A.” and a type edge to foaf:Person]
- 9 -
Informational versus non-informational resources
• Informational resource: an HTML document, image, any other file on the Web
– Retrievable in its entirety from the Web
– Retrieving it can return a 200 OK
• Conceptual (non-informational) resource: a person, an event, a place, etc.
– A description of it may be retrievable from the Web
– When identified by a URL, retrieving it should return a 303 Redirect
• Never confuse a webpage with what it describes!
– You are not your Facebook profile: one is a document, the other is a person. A document has properties such as byte-size, media-type etc, a person has name, age, etc.
– Make sure you don’t use the URL of an existing webpage as the URI of a resource
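The 200 versus 303 distinction can be sketched as a small routing function. The paths /id/joe and /doc/joe and the function name respond are assumptions made for this example; a real deployment would configure the web server accordingly.

```python
# Hypothetical site layout: documents are served directly, concepts are redirected.
DOCUMENTS = {"/doc/joe": "<html>...a page describing Joe...</html>"}
CONCEPTS = {"/id/joe": "/doc/joe"}  # URI of the person -> document that describes him


def respond(path):
    """Return (status, headers, body) for a GET request on `path`."""
    if path in DOCUMENTS:
        # Informational resource: retrievable in its entirety.
        return 200, {"Content-Type": "text/html"}, DOCUMENTS[path]
    if path in CONCEPTS:
        # Non-informational resource: point the client to a description instead.
        return 303, {"Location": CONCEPTS[path]}, ""
    return 404, {}, ""
```

Keeping the two dictionaries separate mirrors the advice on the slide: the URI of the person and the URL of the page about the person are never the same string.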
- 10 -
Vocabularies (ontologies)
• Ontologies are collections of classes and properties used to describe objects in a particular domain
– OWL (the Web Ontology Language) is the standard ontology language
– OWL has an RDF serialization: ontologies are part of the Semantic Web
• Classes can be described by sub- and superclasses, required properties
– Class membership in RDF is expressed using the rdf:type property
– An instance can have multiple classes (types)
– A class can have multiple superclasses
• Properties can be described by their domain, range, cardinalities, etc.
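As an illustration of multiple types and multiple superclasses, the sketch below collects every class an instance belongs to, assuming the usual RDFS reading that an instance of a class is also an instance of its superclasses. The ex: terms and the function all_types are made up for the example.

```python
# rdfs:subClassOf statements, class -> set of direct superclasses.
SUBCLASS_OF = {
    "ex:Student": {"foaf:Person"},
    "ex:Employee": {"foaf:Person"},
    "foaf:Person": {"foaf:Agent"},
}

# rdf:type statements, instance -> set of asserted classes.
TYPES = {"ex:joe": {"ex:Student", "ex:Employee"}}


def all_types(instance):
    """Asserted types plus everything reachable by following superclass links."""
    pending = list(TYPES.get(instance, ()))
    seen = set()
    while pending:
        cls = pending.pop()
        if cls not in seen:
            seen.add(cls)
            pending.extend(SUBCLASS_OF.get(cls, ()))
    return seen
```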
- 11 -
RDF is designed for distributed systems
• URIs provide web-wide global identification across documents
– A resource may be described by multiple documents
– We know it’s the same resource because the same URI is used or through reasoning (advanced topic…)
– URIs are intended to be reused
– Unique, but not single identifiers: two URIs may denote the same thing
• URIs are dereferenceable (can be retrieved)
– A well-behaved URI returns a description of the resource
– Provides authority: the definition of foaf:Person lives at that URI
• Ontologies can be looked up as well
– Typically at the root of the URIs, also known as the namespace
– Example: http://xmlns.com/foaf/0.1/Person redirects to the specification
- 12 -
URIs implicitly link data together
Joe’s homepage: (#joe, #name, “Joe A.”) (#joe, #email, mailto:[email protected])
Mary’s homepage: (#mary, #name, “Mary B.”) (#mary, #gender, “female”)
A dating site: (#joe, #loves, #mary)
Schema doc: (#name, #type, #Property) (#name, #domain, #Person)
- 13 -
Put together, triples form a single ‘global’ graph
[Figure: the merged graph. #joe has a #name edge to “Joe A.” and a #loves edge to #mary; #mary has a #name edge to “Mary B.” and a #gender edge to “female”]
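A small sketch of how triples from separate documents combine into one graph. The identifiers such as "#joe" are shorthand for full URIs, and the variable and function names are assumptions for this illustration.

```python
# Triples published in four different places (shorthand identifiers stand for full URIs).
joe_homepage = {("#joe", "#name", "Joe A.")}
mary_homepage = {("#mary", "#name", "Mary B."), ("#mary", "#gender", "female")}
dating_site = {("#joe", "#loves", "#mary")}
schema_doc = {("#name", "#type", "#Property"), ("#name", "#domain", "#Person")}

# Merging is a plain set union: shared URIs make the pieces connect.
global_graph = joe_homepage | mary_homepage | dating_site | schema_doc


def describe(resource, graph):
    """All (predicate, object) pairs known about a resource, whatever their origin."""
    return sorted((p, o) for s, p, o in graph if s == resource)
```

Calling describe("#joe", global_graph) returns statements that came from two different sources, which is the sense in which URIs "implicitly link data together".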
Publishing for the Semantic Web
- 15 -
Motivation
• Why publish data on the (Semantic) Web?
– In a business context
• Increase the potential for linking, reuse and aggregation
– Drive traffic back from other sites on the Web
– Pre-competitive data integration (e.g. drug discovery)
• Make your data more easily findable
– Drive traffic from search engines
– In a non-profit context
• Increase industry or government transparency, accountability
• Support research and education by making data accessible
- 16 -
Publishing and consuming data on the Semantic Web
• Publishing data involves
– Deciding in which format to publish your data
– Deciding which schema (ontology, vocabulary) to use
• OR you can create a new schema and publish it as well
• Multiple ways of publishing RDF data:
1. Linked Data
2. Metadata in HTML
3. SPARQL endpoints
4. Feeds
5. GRDDL
6. Automated tools
Note: you may implement more than one
- 17 -
Option 1: Linked Data
• A web of RDF documents in parallel to the current Web
– Most often implemented as wrappers around databases or APIs
• The four rules of Linked Data:
– Use URIs to identify things.
– Use HTTP URIs so that these things can be referred to and looked up ("dereference") by people and user agents.
– Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF-XML.
– Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
[Figure: example RDF graph. #PeterM born #Bud; #PeterM label “Peter Mika”; #Bud label “Budapest”; #Bud capital-of #Hun; #Bud population “2,000,000”]
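The fourth rule (include links to related URIs) is what lets a client discover more data by following its nose. The sketch below simulates this with an in-memory dictionary instead of real HTTP requests; the example.org URIs, the WEB dictionary and the crawl function are all assumptions for illustration.

```python
BASE = "http://example.org/id/"  # hypothetical namespace
PETER, BUD, HUN = BASE + "PeterM", BASE + "Bud", BASE + "Hun"

# What each URI returns when dereferenced (a stand-in for fetching RDF over HTTP).
WEB = {
    PETER: [(PETER, "born", BUD), (PETER, "label", "Peter Mika")],
    BUD: [(BUD, "label", "Budapest"), (BUD, "capital-of", HUN), (BUD, "population", "2,000,000")],
    HUN: [],
}


def crawl(start):
    """Dereference `start`, then every URI found in object position, and merge the triples."""
    graph, queue, seen = set(), [start], set()
    while queue:
        uri = queue.pop(0)
        if uri in seen or uri not in WEB:
            continue
        seen.add(uri)
        for s, p, o in WEB[uri]:
            graph.add((s, p, o))
            if o.startswith("http://"):  # the object is itself a URI: follow the link
                queue.append(o)
    return graph
```

Starting from the URI for Peter alone, the client ends up with the facts about Budapest as well, because the description of Peter links to it.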
- 18 -
Option 1: Linked Data
• Advantages:
– No change to the publishing of the HTML documents
– Data can be published by third party (e.g. Dbpedia)
• Disadvantages:
– Web servers need to be configured to properly handle URIs that identify concepts instead of documents
– Not favored by search engines
• Lack of use cases
• Crawling needs to be changed
• Authority is difficult to determine
• Tools
– Triple stores (Virtuoso, Oracle etc.) and front-ends (Pubby)
– RDB-to-RDF mappers (e.g. D2RQ, Triplify)
– Validators (Vapour)
– Linked Data browsers (many)
- 19 -
Linked Data as a movement
• Rapidly growing community effort to (re)publish open datasets as Linked Data
– In particular, scientific and government datasets
– see linkeddata.org
- 20 -
Option 2: Metadata in HTML
• Using microformats, RDFa, Microdata (more later)
• Advantages:
– Data and document are always in sync
– Browser plug-in friendly
– Search engine friendly
– Copy-paste friendly
• Tools:
– XML editors (e.g. Oxygen)
– Triplr
– RDFa Distiller
– RDFa bookmarklet
– Ubiquity RDFa plugin
– Optimus microformat parser
• Examples: many, including SlideShare, YouTube, LinkedIn, Digg, Myspace, Facebook…
[Figure: HTML pages containing the text “Peter Mika was born in Budapest.” together with the example graph: #PeterM born #Bud; #PeterM label “Peter Mika”; #Bud label “Budapest”; #Bud capital-of #Hun; #Bud population “2,000,000”]
- 21 -
Option 3: SPARQL endpoints
• An API for accessing RDF databases on the Web
– A query language and an HTTP protocol
• Advantages:
– Flexible access: make any query you want
– Also possible to expose a traditional RDBMS via a wrapper
• Disadvantages:
– For the publisher: cost of supporting arbitrary queries
– For the search engine: discovery of SPARQL servers is unsolved
• Tools:
– Triple stores (Oracle, Virtuoso, Sesame, Jena, OWLIM etc.)
– RDB-to-RDF mappers such as D2RQ and Triplify
[Figure: example RDF graph behind the endpoint. #PeterM born #Bud; #PeterM label “Peter Mika”; #Bud label “Budapest”; #Bud capital-of #Hun; #Bud population “2,000,000”]
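To make "a query language and an HTTP protocol" concrete, the sketch below builds the URL for a query sent over HTTP GET, where the SPARQL protocol carries the query text in a parameter named query. The endpoint address and the example.org URIs inside the query are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint

# Ask where the resource #PeterM was born (URIs are made up for the example).
QUERY = """
SELECT ?place WHERE {
  <http://example.org/id/PeterM> <http://example.org/vocab/born> ?place .
}
""".strip()


def build_request_url(endpoint, query):
    """URL for a SPARQL query via HTTP GET: the query goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
```

The publisher's cost mentioned on the slide follows directly from this design: any client can put an arbitrary query into that parameter.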
- 22 -
Option 4: Feeds
• Disadvantages:
– No standard feed format for RDF: data needs to be formatted and often manually submitted for each search engine
• Advantages
– Submit your data without making it public
• Competing and incompatible formats
– DataRSS (Yahoo!)
– Google Data Protocol
– Open Data Protocol (Microsoft)
[Figure: the example RDF graph submitted as feeds in several formats. #PeterM born #Bud; #PeterM label “Peter Mika”; #Bud label “Budapest”; #Bud capital-of #Hun; #Bud population “2,000,000”]
- 23 -
Option 5: Publishing a transformation of the data
• Publish the rule to transform the HTML to structured data
• GRDDL is a standard for linking an HTML page to a transformation that produces RDF data
• Advantages
– No change to the page
• Disadvantages
– Transformation needs to be executed to get to the data
– Not much support by search engines
• Tools
– Intel MashMaker
– Dapper
– Glue API from AdaptiveBlue
[Figure: an HTML table (columns xx and yy, values 1 and 2) passed through an XSLT transformation]
- 24 -
Option 6: Automatic markup
• Web services that annotate HTML automatically
• Advantages
– No manual effort
• Disadvantages
– Limited to finding relevant entities in text
• Tools
– OpenCalais
– Zemanta API
[Figure: the text “Peter Mika was born in Budapest.” is annotated as “<person>Peter Mika</person> was born in <location>Budapest</location>.”]
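A toy version of automatic markup, to show the kind of output such services produce. This is a plain dictionary lookup written for illustration; it does not reflect how OpenCalais or Zemanta work internally, and GAZETTEER and annotate are hypothetical names.

```python
import re

# Known entity names and their types (a real service would use far richer models).
GAZETTEER = {"Peter Mika": "person", "Budapest": "location"}


def annotate(text):
    """Wrap every known entity name in a tag named after its type."""
    # Try longer names first so a short name cannot split a longer one.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    def tag(match):
        kind = GAZETTEER[match.group(0)]
        return f"<{kind}>{match.group(0)}</{kind}>"

    return pattern.sub(tag, text)
```

The limitation named on the slide is visible here: the approach finds entities in text, but it does not recover the relation ("was born in") between them.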
- 25 -
Example: Zemanta
• A personal writing assistant for bloggers
– Plugin for popular blogging platforms and web mail clients
• Analyzes text as you type and suggests hyperlinks, tags, categories, images and related articles
• API available with the same functionality
- 26 -
Choosing a vocabulary
• No vocabularies in many domains
– Books, movies, stuff people care about…
• Too many competing proposals in other domains
– Often versions of the same proposal
– Example: vocabularies for microformats
• Not maintained
– I cannot maintain your vocabulary for you
• Limited tool support
– Too many expert tools until now
• Many vocabularies are not designed for annotation
• Missing meeting point and social process
– An ontology is a shared, formal representation of a domain
- 27 -
Choosing a vocabulary
• Search the Web or ask for advice on mailing lists
• Wikis
– semanticweb.org
– vocamp.org
• Beware of people who claim to have the vocabulary of everything
– Preferably you want something small and targeted
• Never a 100% fit: you will need to introduce vocabulary terms (classes and properties)
– Do not introduce new classes/properties in existing namespaces
– Example: the namespace http://xmlns.com/foaf/0.1/ is used by the FOAF project. Try not to introduce a new term without contacting the owner, i.e. the membership of the FOAF mailing list.
- 28 -
Advanced topic: creating a vocabulary
1. Get advice on methodology
– vocamp.org and semanticweb.org
2. Choose a namespace and a prefix
– Give sensible names, e.g. name it after your site, but don’t call it searchmonkey
– Namespace ends either with a slash or a hash
3. Create an RDF or OWL document describing your classes and properties
– Use an ontology editor such as Protégé 4.0
– Follow naming conventions
4. Publish your vocabulary
– Make sure the URIs of your properties and classes are resolvable
– E.g. myvocab:digicam should resolve to a document containing the definition of myvocab:digicam
5. Convince others to adopt your vocabulary
– If you are in fishing, convince other fishing businesses
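Step 2 can be checked mechanically. The sketch below, under the assumption of a made-up namespace http://example.org/myvocab#, verifies that a namespace ends in a slash or a hash and builds term URIs from it; the function names are illustrative.

```python
MYVOCAB = "http://example.org/myvocab#"  # hypothetical namespace for the myvocab prefix


def is_valid_namespace(namespace):
    """A namespace should be an HTTP(S) URI that ends with a slash or a hash."""
    return namespace.startswith(("http://", "https://")) and namespace.endswith(("/", "#"))


def term_uri(namespace, local_name):
    """Full URI of a class or property: namespace followed by the local name."""
    if not is_valid_namespace(namespace):
        raise ValueError(f"namespace must end with '/' or '#': {namespace!r}")
    return namespace + local_name


digicam = term_uri(MYVOCAB, "digicam")
```

The trailing slash or hash matters because term URIs are formed by simple concatenation; without it, the namespace and the local name would run together.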
- 29 -
How do we build communities? www.vocamp.org
Metadata in HTML
- 31 -
Brief history of the Annotated Web
• 1995: HTML meta tags
• 1996: Simple HTML Ontology Extensions (SHOE)
• 1998: RDF/XML
– RDF/XML in HTML
– RDF linked from HTML
• 2003: Web 2.0
– Tagging
– Microformats
– Metadata in Wikipedia
– Machine tags in Flickr
• 2005: eRDF
• 2008: RDFa 1.0
• 2011: RDFa 1.1
• 2012: Microdata?
- 32 -
HTML meta tags
<HTML>
<HEAD profile="http://dublincore.org/documents/dcq-html/">
<META name="DC.author" content="Peter Mika">
<LINK rel="DC.rights copyright" href="http://www.example.org/rights.html" />
<LINK rel="meta" type="application/rdf+xml" title="FOAF" href="http://www.cs.vu.nl/~pmika/foaf.rdf">
</HEAD>
…
</HTML>
- 33 -
SHOE example (Heflin & Hendler, 1996)
<ONTOLOGY "our-ontology" VERSION="1.0">
<ONTOLOGY-EXTENDS "organization-ontology" VERSION="2.1" PREFIX="org" URL="http://www.ont.org/orgont.html">
<ONTDEF CATEGORY="Person" ISA="org.Thing">
<ONTDEF RELATION="lastName" ARGS="Person STRING">
<ONTDEF RELATION="firstName" ARGS="Person STRING">
<ONTDEF RELATION="marriedTo" ARGS="Person Person">
<ONTDEF RELATION="employee" ARGS="org.Organization Person">
</ONTOLOGY>
<HEAD>
<META HTTP-EQUIV="Instance-Key" CONTENT="http://www.cs.umd.edu/~george">
<USE-ONTOLOGY "our-ontology" VERSION="1.0" PREFIX="our" URL="http://ont.org/our-ont.html">
</HEAD>
<BODY>
<CATEGORY "our.Person">
<RELATION "our.marriedTo" TO="http://www.cs.umd.edu/~helena">
<RELATION "our.employee" FROM="http://www.cs.umd.edu">
My name is
<ATTRIBUTE "our.firstName"> George </ATTRIBUTE>
<ATTRIBUTE "our.lastName"> Cook </ATTRIBUTE> and I live at...
- 34 -
SHOE system
- 35 -
SHOE Text-based query interface
- 36 -
SHOE Graphical Query Interface
- 37 -
Example: Creative Commons
Embedding CC license in HTML (now deprecated):
<HTML><HEAD>… </HEAD><BODY>…
<!--
<rdf:RDF xmlns="http://creativecommons.org/ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://www.yergler.net/averages/">
    <dc:title>The Law of Averages</dc:title>
    <dc:description>...because eventually i'll be right...</dc:description>
    <license rdf:resource="http://creativecommons.org/licenses/by-nc/1.0/" />
  </Work>
  <License rdf:about="http://creativecommons.org/licenses/by-nc/1.0/">
    <requires rdf:resource="http://web.resource.org/cc/Notice" />
    <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
    <permits rdf:resource="http://web.resource.org/cc/Distribution" />
    <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
  </License>
</rdf:RDF>
-->
- 38 -
Example: Creative Commons
• Current: rel attribute (HTML4)
This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a>.
• Use of the “rel” attribute for semantic annotation is the birth of the microformat…
- 39 -
Microformats (μf)
• Agreements on the way to encode certain kinds of metadata in HTML
– Reuse of semantic-bearing HTML elements
– Based on existing standards
– Minimality
• Microformats exist for a limited set of objects
– hCard (persons and organizations)
– hCalendar (events)
– hResume
– hProduct
– hRecipe
• Varying degrees of support and stability
– hCard and rel-tag are widely supported
• Community centered around microformats.org
– Specifications and discussions are hosted there
- 40 -
Microformats: limitations
• No shared syntax
– Each microformat has a separate syntax tailored to the vocabulary
• No formal schemas
– Limited reuse, extensibility of schemas
– Unclear which combinations are allowed
• No datatypes
• No namespaces, unique identifiers (URIs)
– no interlinking
– mapping between instances is required
• Always appears in the HTML <body>
- 41 -
Example: the hCard microformat
<cite class="vcard">
  <a class="fn url" rel="friend colleague met" href="http://meyerweb.com/">Eric Meyer</a>
</cite>
wrote a post (<cite><a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">Tax Relief</a></cite>) about an unintentionally humorous letter he received from the
<span class="vcard">
  <a class="fn org url" href="http://irs.gov/">Internal Revenue Service</a>
</span>.

<div class="vcard">
  <a class="email fn" href="mailto:[email protected]">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
</div>
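A consumer reads hCard data by looking at class names. The sketch below pulls the formatted name (class fn) out of every element marked vcard, using only Python's standard html.parser. It is a deliberately simplified reading of the format, not a conforming hCard parser, and the class name HCardNames is an assumption.

```python
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements that never have an end tag


class HCardNames(HTMLParser):
    """Collect the text of elements with class "fn" that sit inside a "vcard" element."""

    def __init__(self):
        super().__init__()
        self.open = []        # class lists of the currently open elements
        self.fn_depth = None  # nesting depth at which the current fn element opened
        self.text = []
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        in_vcard = "vcard" in classes or any("vcard" in c for c in self.open)
        self.open.append(classes)
        if self.fn_depth is None and in_vcard and "fn" in classes:
            self.fn_depth = len(self.open)
            self.text = []

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        if self.fn_depth == len(self.open):
            # Closing the fn element: normalize whitespace and store the name.
            self.names.append(" ".join("".join(self.text).split()))
            self.fn_depth = None
        self.open.pop()

    def handle_data(self, data):
        if self.fn_depth is not None:
            self.text.append(data)


def extract_names(html):
    parser = HCardNames()
    parser.feed(html)
    parser.close()
    return parser.names
```

Note that the extraction logic is specific to hCard: another microformat needs its own rules, which is the "no shared syntax" limitation listed earlier.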
- 42 -
RDFa
• W3C standard for embedding RDF data in HTML documents
– A set of new HTML attributes to be used in head or body
– A specification of how to extract the data from these attributes
• RDFa is just a syntax, you have to choose a vocabulary separately
• RDFa 1.0 has been a W3C Recommendation since October 2008
– RDFa Primer
• RDFa 1.1 is a small update on RDFa to make it easier to use
– Currently Working Draft (March 31, 2011)
– Updated version of the RDFa Primer (April 19, 2011)
• RDFa API for accessing RDFa data in a webpage in the browser from JavaScript
– Currently Working Draft (April 19, 2011)
- 43 -
RDFa 1.1
• Changes
– New vocab attribute to define the default namespace for the document or subtree
– Profile documents to define multiple namespace prefixes
– The prefix attribute as a recommended replacement of xmlns
– You can use URIs even where only CURIEs were allowed before
• RDFa 1.1 is backward compatible with RDFa 1.0
– RDFa 1.1 is recommended if you want to use HTML5
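A rough sketch of how a consumer might read the prefix attribute and expand CURIEs into full URIs. It covers only the simple "name: URI" pairs shown in these slides and is not a full implementation of the RDFa 1.1 processing rules; the function names are assumptions.

```python
def parse_prefix_attribute(value):
    """Turn 'og: http://ogp.me/ns# foaf: http://xmlns.com/foaf/0.1/' into a dict."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("expected alternating 'prefix:' and URI tokens")
    mapping = {}
    for name, uri in zip(tokens[0::2], tokens[1::2]):
        if not name.endswith(":"):
            raise ValueError(f"prefix name must end with a colon: {name!r}")
        mapping[name[:-1]] = uri
    return mapping


def expand(term, prefixes):
    """Expand a CURIE such as og:title; anything else is passed through unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term  # e.g. a full URI, which RDFa 1.1 accepts in the same position


prefixes = parse_prefix_attribute("og: http://ogp.me/ns# foaf: http://xmlns.com/foaf/0.1/")
```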
- 44 -
When to use RDFa
• Choose microformats when you find a microformat that fits your needs and is supported by your consumers
– Microformats are the first option because they are simple
– Yahoo supports all major microformats, see the documentation
– It’s a common misconception that RDFa requires XHTML or that it’s incompatible with HTML5
• It’s compatible with HTML4, HTML5, XHTML
• If you find none that perfectly fits your needs then you need RDFa
– Microformats have a fixed schema: you cannot add your own attributes
• Example: a social networking site with user profiles
– vCard is a good candidate, but for example it doesn’t have a way to express the user’s social connections
– You either live without this, or go with RDFa
- 45 -
RDFa intro: metadata in the header
• More info in the
<html prefix="og: http://ogp.me/ns#">
<head>
  <title>The Trouble with Bob</title>
  <meta property="og:title" content="The Trouble with Bob" />
  <meta property="og:type" content="text" />
  <meta property="og:image" content="http://example.com/alice/bob-ugly.jpg" />
  ...
</head>
- 46 -
RDFa intro: links with a flavor
• More info in the
All content on this site is licensed under
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License </a>.
- 47 -
RDFa links: talking about subjects other than the page
• More info in the
The trouble with Bob is that he takes much better photos than me:
<div about="http://example.com/bob/photos/sunset.jpg">
  <img src="http://example.com/bob/photos/sunset.jpg" />
  <span property="og:title">Beautiful Sunset</span>
  by <span property="dc:creator">Bob</span>.
</div>
- 48 -
RDFa links: talking about subjects other than the page
• More info in the
<div typeof="foaf:Person">
  <p property="foaf:name"> Alice Birpemswick </p>
  <p> Email: <a rel="foaf:mbox" href="mailto:[email protected]"> [email protected] </a> </p>
  <p> Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a> </p>
</div>
- 49 -
The process of annotating with RDFa
• Find a vocabulary that fits your needs and is supported by your consumers
– A vocabulary describes a set of types and attributes within a given domain
– If you don’t find a good candidate, extend an existing one or create a new one
• Annotate your page.
– Before you start, you might want to validate your page for (X)HTML conformance using the W3C’s (X)HTML Validator to reduce the chance of errors. Choose Document Type XHTML + RDFa.
– No specific tool support. If you have an HTML or XML editor that supports DTDs, you will have syntax checking and highlighting.
– Use the RDFa Distiller to validate which data can be extracted from your page.
– If you fancy, use the RDF Validator to graphically visualize the RDF graph that is outputted.
• Put the annotated page online
– The data will be extracted by Google/Bing/Yahoo the next time your page is crawled and indexed
– The data will be available to browser extensions, bookmarklets etc.
• See http://rdfa.info/rdfa-implementations for new tools and APIs
- 50 -
RDFa can be hard to get right…
• Validation problems can stop us from extracting data
– Use the W3C validator
– Use the right DOCTYPE declaration if using XHTML
– Set the encoding of your page properly (using HTTP headers or XML declaration)
• Prefixes need to be defined using the xmlns attribute
• Unless you are making statements about the document, set the subject using the about attribute
• Do not include HTML elements in literal values
– Incorrect: <div property="foaf:name"><b>Peter Mika</b></div>
• Use absolute URIs as the value of the resource attribute
– Or make sure you specify HTML base
- 51 -
RDFa can be hard to get right… II.
• Be careful when using rel and typeof in combination because of the precedence rules
• BAD example:
<div about="#id">
  <span property="foaf:name">Peter Mika</span>
  <span rel="foaf:img" typeof="foaf:Image">
    <span property="dc:format">jpg</span>
    …
  </span>
</div>
• To correct, you need to put the typeof inside the <span> node with rel=“foaf:img”
- 52 -
RDFa can be hard to get right… III.
• Typeof does two things at once: it creates a new subject resource and assigns the type to it
• BAD example:
<div about="#id">
  <span property="foaf:name">Peter Mika</span>
  <span rel="foaf:img" resource="http://www.example.org/photo.jpg">
    <span typeof="foaf:Image">
      <span property="dc:format">jpg</span>
    </span>
  </span>
</div>
• To correct, you have to repeat the resource attribute on the span node with the typeof
- 53 -
RDFa can be hard to get right… IV.
• Marking up <h1>:
– <h1 property="dc:title">My homepage</h1>
– NOT: <h1><div property="dc:title">My homepage</div></h1>
• Marking up an image:
– <span rel="foaf:img"> <img alt="Alex" src="http://example.org/alex.jpg"/> </span>
– NOT: <img rel="foaf:img" src="photo.jpg"/>
• Header
– <meta property="…" content="…">
– NOT: <meta name="…" content="…">
- 54 -
RDFa can be hard to get right… V.
• You can not break up a description like this:
<span rel="foaf:knows"> <span property="foaf:name">Peter Mika</span></span>….
<span rel="foaf:knows"> <a rel="foaf:email" href="mailto:[email protected]" /></span>
• This is not the same as:
<span rel="foaf:knows"> <span property="foaf:name">Peter Mika</span>
<a rel="foaf:email" href="mailto:[email protected]" />
</span>
• In the first case there are two related resources, with one attribute each, in the second case there is a single related resource with two attributes.
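The difference shows up clearly in the extracted triples. The sketch below hand-models the outcome rather than running an RDFa parser: each span with rel="foaf:knows" and no URI introduces a fresh blank node. The function names, the "<>" shorthand for the page itself, and the mail address are assumptions.

```python
from itertools import count


def knows_triples(subject, spans):
    """Triples for a list of rel="foaf:knows" spans; `spans` holds each span's nested pairs."""
    fresh = count(1)
    triples = []
    for nested in spans:
        node = f"_:b{next(fresh)}"  # one new blank node per span
        triples.append((subject, "foaf:knows", node))
        triples.extend((node, prop, value) for prop, value in nested)
    return triples


NAME = ("foaf:name", "Peter Mika")
EMAIL = ("foaf:email", "mailto:someone@example.org")  # hypothetical address

split_markup = knows_triples("<>", [[NAME], [EMAIL]])  # two separate spans
joined_markup = knows_triples("<>", [[NAME, EMAIL]])   # one span holding both


def related_resources(triples):
    return {o for _, p, o in triples if p == "foaf:knows"}
```

With the split markup a consumer sees two acquaintances, one with only a name and one with only an email address, and has no way to tell that they are the same person.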
- 55 -
Tips
• Hiding information from being displayed
– Links without content will not be rendered
– Use <span property="foaf:name" content="Peter Mika"/>
• Use datatypes to provide the expected type of a literal.
– This helps validation because any tool can check whether the literal is indeed of that type.
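A minimal sketch of the kind of check a tool can run once a literal carries a datatype. Only two XML Schema datatypes are covered and the rules are simplified (for instance, time zones on dates are not handled); the function names are assumptions.

```python
import re
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"


def _is_integer(text):
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None


def _is_date(text):
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text) is None:
        return False
    try:
        date.fromisoformat(text)  # rejects impossible dates such as February 30
    except ValueError:
        return False
    return True


CHECKS = {XSD + "integer": _is_integer, XSD + "date": _is_date}


def is_valid(lexical, datatype):
    """True if the literal's text is a legal value of the declared datatype."""
    if datatype not in CHECKS:
        raise ValueError(f"no check implemented for {datatype}")
    return CHECKS[datatype](lexical)
```

Without the declared datatype, a string such as "May 10th 2009" would simply be accepted as text, and the error would surface much later, if at all.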
- 56 -
Example: Facebook’s Like and the Open Graph Protocol
• The ‘Like’ button provides publishers with a way to promote their content on Facebook and build communities
– Shows up in profiles and news feed
– Site owners can later reach users who have liked an object
– Facebook Graph API allows 3rd party developers to access the data
• Open Graph Protocol is an RDFa-based format for describing the object that the user ‘Likes’
- 57 -
Example: Facebook’s Open Graph Protocol
• RDF vocabulary to be used in conjunction with RDFa
– Simplify the work of developers by restricting the freedom in RDFa
• Activities, Businesses, Groups, Organizations, People, Places, Products and Entertainment
• Only HTML <head> accepted
• http://opengraphprotocol.org/
<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
  <title>The Rock (1996)</title>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="movie" />
  <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
  <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
  …
</head> ...
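Because only meta elements in the head are used, a consumer can be very small. The sketch below reads property/content pairs with Python's standard html.parser; it illustrates the restricted style and is not Facebook's actual implementation. The class and function names are assumptions.

```python
from html.parser import HTMLParser


class MetaProperties(HTMLParser):
    """Collect (property, content) pairs from <meta property="..." content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop, content = attributes.get("property"), attributes.get("content")
        if prop is not None and content is not None:
            self.properties.append((prop, content))


def extract_properties(html):
    parser = MetaProperties()
    parser.feed(html)
    parser.close()
    return parser.properties


PAGE = """<html xmlns:og="http://opengraphprotocol.org/schema/"><head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta name="description" content="skipped: uses name, not property" />
</head></html>"""
```

The last meta element in PAGE is skipped on purpose: it uses name instead of property, so it carries no RDFa statement.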
- 58 -
Example: Yahoo! Enhanced Results (was: SearchMonkey)
• Guide for publishers to mark-up their pages for common types of objects
– Product, Local, News, Video, Events, Documents, Discussion, Games
• Using popular microformats and RDF vocabularies
– Copy-paste code
– Validator
• Yahoo as a consumer
– See later
- 59 -
Example: Google’s Rich Snippets
• Google accepts popular microformats and its own RDFa vocabulary
– Similar approach to RDFa as Facebook
• Validator to check if the markup is correct
• Google displays enhanced results based on this metadata
– Rich Snippets
- 60 -
Microdata example
<div itemscope itemid="http://www.yahoo.com/resource/person">
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My band is called <span itemprop="band">Four Parts Water</span>.
  I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>.
  <img itemprop="image" src="me.png" alt="me">
  </p>
</div>
- 61 -
Microdata
• Currently under standardization at the W3C
– Originally part of the HTML5 spec, but now a separate document
• Similar to microformats, but with the extensibility of RDFa
– Introduce new terms using reverse domain names or full URIs
• HTML5 also has a number of “semantic” elements such as <time>, <video>, <article>…
- 62 -
RDFa on the rise
Percentage of URLs with embedded metadata in various formats
510% increase between March, 2009 and October, 2010
- 63 -
The state of metadata in HTML
• 5-10% of webpages contain some explicit metadata
– Depending on how you count…
• Too many competing approaches
– Too many formats: microformats vs RDFa vs Microdata
– When using RDFa, publishers may need to use multiple different vocabularies to satisfy everyone