Transcript of Smxeastbarbarastarr2012
Why Metadata Matters: From a Search Engine Perspective.
Schema 101
By: Barbara Starr
Twitter: @BarbaraStarr
Email: [email protected]
• Pursued a doctorate in Artificial Intelligence in South Africa in the '80s.
• Recruited to build intelligent/predictive trading systems on Wall Street.
• Migrated to government-based contracts, several of which turned into real-world products like
  – SIRI (PAL from DARPA)
  – WATSON (Acquaint - IBM Watson Labs was a team member)
• From the vantage of a semantic technologist, I keenly watched the evolution of the Semantic Web.
• “Shocked into the real world” when working as a consultant @ Overstock.
• Today - Educator, Consultant, Developer.
Meta Information: ME
By: Barbara Starr
Twitter: @BarbaraStarr
Email: [email protected]
LinkedIn: http://www.linkedin.com/in/barbarastarr
My favorite author: Isaac Asimov
Favorite book: I, Robot
Favorite character: MULTIVAC
Additional Meta Information
For the purpose of this talk:
same-as
MY ROBOT or Artificially Intelligent Entity or Search Engine
SEARCH ENGINE POINT OF VIEW
How can I exploit metadata or
“semantic search”?
SEARCH ENGINE POINT OF VIEW
SearchMonkey 2008
Rich Snippets 2009
Tiles
I can directly extract information to enhance SERP displays
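A minimal sketch of what "directly extracting" might look like, assuming a page published with flat schema.org microdata. The sample page, the business name, and the `MicrodataExtractor` class are hypothetical illustrations, not any search engine's actual pipeline; nested items are deliberately not handled.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with schema.org microdata.
SAMPLE_PAGE = """
<div itemscope itemtype="http://schema.org/Brewery">
  <span itemprop="name">Copper Canyon Brewing</span>
  <span itemprop="description">Small-batch ales brewed in Arizona.</span>
  <meta itemprop="telephone" content="555-0100">
</div>
"""


class MicrodataExtractor(HTMLParser):
    """Collect itemprop name/value pairs from a flat (non-nested) item."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current_prop = None  # itemprop whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:
            # <meta itemprop=... content=...> carries its value in an attribute.
            self.properties[prop] = attrs["content"]
        else:
            self._current_prop = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._current_prop is not None:
            self.properties[self._current_prop] += data

    def handle_endtag(self, tag):
        # Flat items only: any closing tag ends the current property.
        self._current_prop = None


def extract_properties(html):
    """Return the stripped itemprop values found in the given HTML."""
    parser = MicrodataExtractor()
    parser.feed(html)
    return {name: value.strip() for name, value in parser.properties.items()}


snippet_fields = extract_properties(SAMPLE_PAGE)
print(snippet_fields)
```

The extracted fields are the raw material an enhanced SERP display could be assembled from; a production extractor would also need to handle nested items, `href`/`src` values, and malformed markup.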
SEARCH ENGINE POINT OF VIEW
I can search directly on consumed metadata!
SEARCH ENGINE POINT OF VIEW
I can provide direct answers to queries by
searching on consumed, verified and validated information
SEARCH ENGINE POINT OF VIEW
I can even aggregate answers or deduce them (like a timeline of events)
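As a rough illustration of aggregation, the sketch below merges event facts consumed from two hypothetical sources, drops duplicates, and orders them into a timeline. The `build_timeline` helper and the record layout are assumptions made for this example; the 2008 and 2009 entries come from this talk, and the schema.org entry reflects its 2011 launch.

```python
# Facts as they might be consumed from two different (hypothetical) pages.
PAGE_A = [
    {"year": 2009, "name": "Rich Snippets"},
    {"year": 2008, "name": "SearchMonkey"},
]
PAGE_B = [
    {"year": 2008, "name": "searchmonkey"},  # same fact, different casing
    {"year": 2011, "name": "schema.org"},
]


def build_timeline(*sources):
    """Merge events from several sources, de-duplicate, and sort by year."""
    seen = set()
    timeline = []
    for source in sources:
        for event in source:
            key = (event["year"], event["name"].casefold())
            if key not in seen:
                seen.add(key)
                timeline.append(event)
    return sorted(timeline, key=lambda event: event["year"])


timeline = build_timeline(PAGE_A, PAGE_B)
for event in timeline:
    print(event["year"], event["name"])
```

Real aggregation would also have to reconcile conflicting values and weigh how trustworthy each source is; this sketch only shows the merge-and-order step.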
SEARCH ENGINE POINT OF VIEW
I can even use it in conjunction with machine learning techniques, e.g. to train other components.
I can detect relevancy signals, i.e. what content to show to what audience.
I can use it to assist in interpreting a user query.
Penn Treebank tagset
SEARCH ENGINE POINT OF VIEW
Really interesting in terms of exposing long-tail content too. It makes things findable for me when pages are published with structured markup!
“I meant the beer brewer in Arizona”
SEARCH ENGINE POINT OF VIEW
I’m a Search Engine Robot
I could really use this stuff. And it is like the Tower of Babel out there!
Syntax: Microdata, Microformats, RDFa
Ontology: Vocabulary or lexicon
Multiple conflicting vocabularies that I will have to align internally, and multiple syntax formats as well.
Prior to Schema.org
GoodRelations for e-commerce
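To make "align internally" concrete, here is a small sketch of mapping terms from several vocabularies onto one internal schema. The internal field names (`name`, `phone`), the `ALIGNMENT` table, and the `align` helper are hypothetical; only a handful of well-known terms (hCard `fn`/`tel`, schema.org `name`/`telephone`, GoodRelations `gr:name`) are included.

```python
# (vocabulary, external term) -> internal field name (internal names are made up).
ALIGNMENT = {
    ("schema.org", "name"): "name",
    ("hcard", "fn"): "name",
    ("goodrelations", "gr:name"): "name",
    ("schema.org", "telephone"): "phone",
    ("hcard", "tel"): "phone",
}


def align(vocabulary, record):
    """Split a parsed record into aligned fields and terms with no mapping."""
    aligned = {}
    unmapped = {}
    for term, value in record.items():
        internal = ALIGNMENT.get((vocabulary, term))
        if internal is None:
            unmapped[term] = value
        else:
            aligned[internal] = value
    return aligned, unmapped


hcard_record = {"fn": "Copper Canyon Brewing", "tel": "555-0100", "nickname": "CCB"}
aligned, unmapped = align("hcard", hcard_record)
print(aligned)
print(unmapped)
```

Every additional vocabulary means another set of rows in the table, and every additional syntax means another parser in front of it, which is the cost a single shared vocabulary is meant to remove.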
SEARCH ENGINE POINT OF VIEW
Time to get Serious!
What has been the history?
Percentage of URLs with embedded metadata in various formats
Five-fold increase between March, 2009 and October, 2010
Another five-fold increase between October 2010 and January, 2012
RDFa exploded in 2012 (Source: Peter Mika, Yahoo)
Current state of metadata on the Web
• 31% of webpages, 5% of domains contain some metadata
  – Analysis of the Bing Crawl (US crawl, January 2012)
  – RDFa is most common format
    • By URL: 25% RDFa, 7% microdata, 9% microformat
    • By eTLD (PLD): 4% RDFa, 0.3% microdata, 5.4% microformat
  – Adoption is stronger among large publishers
    • Especially for RDFa and microdata
• See also
  – P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012
  – H. Mühleisen, C. Bizer. Web Data Commons - Extracting Structured Data from Two Large Web Corpora, LDOW 2012
What’s been the History
Linked Open Data exploded from 2007 thru 2010
[Figures: LOD Cloud diagrams for Oct 2007, Nov 2007, Sept 2008, March 2009 and Sept 2010]
Timeline of RDFa and Semantic Web Adoption
As of Semtech 2011
Inevitable passage of Semantic Web adoption – culminating in schema.org
SEARCH ENGINE POINT OF VIEW
Align and consume many vocabularies that may not be of interest to search engines?
Rather mandate vocabulary and syntax: microdata
A Search Engine alliance has the power
to MANDATE vocabulary and syntax!
Sample portion
SEARCH ENGINE POINT OF VIEW
On the other hand – Not wise to
ignore standards bodies like W3C
No mandate on Syntax
SEARCH ENGINE POINT OF VIEW
Did I tell you I don’t like spam?
SEARCH ENGINE POINT OF VIEW
Make sure you are not cloaking by feeding one set of information to me and another to human users!
Ensure the information in your data feeds matches the structured markup or “metadata” on your web pages.
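One way a publisher might check this before a crawler does is to compare each feed record against the properties extracted from the matching page. The sketch below assumes both sides have already been parsed into plain dictionaries; the field names, sample values, and the `find_mismatches` helper are hypothetical.

```python
def normalize(value):
    """Collapse whitespace and ignore case so trivial differences do not count."""
    return " ".join(str(value).split()).casefold()


def find_mismatches(feed_record, page_markup):
    """Return {field: (feed_value, page_value)} for fields that disagree.

    A field that is present in the feed but missing from the page markup
    is reported with a page value of None.
    """
    mismatches = {}
    for field, feed_value in feed_record.items():
        page_value = page_markup.get(field)
        if page_value is None or normalize(feed_value) != normalize(page_value):
            mismatches[field] = (feed_value, page_value)
    return mismatches


# Hypothetical product as sent in a data feed and as marked up on the page.
feed_record = {"name": "Desert Pale Ale", "price": "9.99", "availability": "InStock"}
page_markup = {"name": "desert  pale ale", "price": "12.99"}

mismatches = find_mismatches(feed_record, page_markup)
print(mismatches)
```

Here the name matches after normalization, while the price conflict and the missing availability are flagged; an empty result means the feed and the page tell the same story.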
SEARCH ENGINE POINT OF VIEW
Serving RELEVANT ANSWERS is IMPERATIVE and central to my very being!
SEARCH ENGINE POINT OF VIEW
ELSE I AM
SEARCH ENGINE POINT OF VIEW
X
SEARCH ENGINE POINT OF VIEW
Adding context in search verticals really helps me serve up relevant information (seriously increases my recall), as does geospatial information.
Consumed information - Structured Data Dashboard
Google’s “Search Verticals”
Notice any correlations? I would advise you to!
Oh! And be sure to check out Moore’s law.
SEARCH ENGINE POINT OF VIEW
I also have a pretty good understanding of
big data and web intelligence so I can
leverage them!
SIRI
“Amazing fact: same amount of computing to answer one Google Search query as all the computing done -- in flight and on the ground -- for the entire Apollo program!”
SEARCH ENGINE POINT OF VIEW
I can leverage metadata for better image search.
SIRI
I can combine it with computer vision techniques.
I can enhance the user’s shopping experience.
SEARCH ENGINE POINT OF VIEW
Know rather than Recognize?
INTRODUCING THE KNOWLEDGE GRAPH
Symbolic reasoning vs. stochastic reasoning (the latter is more like NLP or PageRank)
SEARCH ENGINE POINT OF VIEW
Talk of increase in screen real estate and CTR?
And if you thought the Knowledge Graph was cool, check out the knowledge carousel!
SEARCH ENGINE POINT OF VIEW
Thank you for your time!
And just by the by, this technology is still in its nascent stages. Can you imagine what I will be able to do soon?
Barbara Starr
Email: [email protected]
Twitter: @BarbaraStarr
Resources to help you! Make sure to use them wisely!
Resources at this point in time
Caveat: Some training may be required for some of the tools
Programming Languages:
JavaScript: Microdatajs, Live microdata
PHP: Microdataphp
Ruby: RDF Microdata
RDF Lib plugin
Perl
Ruby: RDF Microdata Gem, Mida
Java: Sindice any23 library
Publishing
Form-based tools: Schema Creator, Microdata generator
Standalone tools: Web.instadata
Editors: TopBraid Composer, Protégé
Platforms: Drupal, Joomla, WordPress (about 7 of them), Virtuoso, TopBraid Composer
Validators, Testers and More:
Check.rdfa.info
Sindice Inspector
Rich Snippets Testing Tool
Bing Validator
Structured Data Linter
Online parser/viewer and RSS generator
Validator.nu
Google Structured Data Tester
Resources at this point in time
GoodRelations: Resources, generators, validators, more…
Resources
From the mouth of
Franz new tool
Soon to be released for SEO
Other Semantic Web Resources
OpenCalais – Can extract information about people, places and things
AlchemyAPI – named entity extraction, topic recognition, keyword tagging, more…
Cogito – Expert System
Franz Inc. – Gruff
Many more…
For more info contact:
Barbara Starr
Twitter: @BarbaraStarr
Email: [email protected]
LinkedIn: http://www.linkedin.com/in/barbarastarr