SEO for the Semantic Web
How do the machines know what Tasty Wheat tasted like?
Mouse – The Matrix
Short SEO History
• Web1.0
• Web2.0
• Web3.0
Genesis
• A story of the Internet, by
• Solving the most important problems
• Greatly influenced by one man…
Tim Berners‐Lee
“the World Wide Web is Berners-Lee's alone. He designed it. He loosed it on the world. And he more than anyone else has fought to keep it open, nonproprietary and free.”
Time Magazine, 1999
The Problem
• Where can I find the information?
“Our ineptitude in getting at the record is largely caused by the artificiality of the systems of indexing.”
The Atlantic Monthly, 1945
Archie, 1990
• Indexed file names and
• Returned results based on pattern matching
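To make "pattern matching on file names" concrete, here is a minimal sketch. The index contents and the shell-style wildcard syntax are assumptions for illustration; this is not Archie's actual implementation.

```python
import fnmatch

# Hypothetical index of file paths, standing in for collected FTP listings.
FILE_INDEX = [
    "pub/gnu/emacs-18.59.tar.Z",
    "pub/docs/rfc1034.txt",
    "pub/games/nethack.tar.Z",
    "pub/docs/readme.txt",
]

def search_names(pattern, index=FILE_INDEX):
    """Return indexed paths whose file name matches a shell-style pattern."""
    return [
        path for path in index
        # Only the last path component (the file name) is compared.
        if fnmatch.fnmatchcase(path.rsplit("/", 1)[-1], pattern)
    ]

matches = search_names("*.txt")
```

Note that nothing here looks inside the files: only names are matched, which is why full-text search was the next step.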
Web1.0
• Means HTML
• Is born in 1991, with the help of
• Tim Berners‐Lee (TBL), who also founded
• WWW Consortium (W3C) at MIT, and also
• Created WWW Virtual Library – the 1st catalog
Yahoo Directory, 1994
• Vertical = categories... is like
• “Show me all the stuff and I’ll handle it”
• Manually indexed stuff, which was
• OK for starters, but…
• Websites quickly grew in number and
• Y! started charging money for one listing
• Increasingly more money...
WebCrawler, 1994
• First SE to fully search text
• Bought by AOL, then
• Sold to Excite, which
• Excite went bankrupt and
• WebCrawler ends up bought by InfoSpace
Other “Search Engines”
• 1994, reaches 60mil pages in ’96
• 1995, bought by Overture, bought by Y!
• 1996, meta search, bought by Lycos
• 1997, bought by IAC/InterActiveCorp
• 1999, bought by Overture, meaning Y!
Shopping fun, right?
, 1998
• Open Directory Project
• Each listing is checked and certified by a volunteer
• The main source for Google Directory
Current State of Search Industry
Web1.0 Problems
• SE couldn’t understand text, so
• They said “why don’t you implement some meta tags (description & keywords) so we can get a glimpse of what you’re saying”
• The relevancy of a page with respect to a keyword was determined by a few factors, so
• It was very easy to abuse and spam, therefore
• Search Results had poor quality
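A minimal sketch of why a few self-declared factors are easy to abuse. The scoring formula, the weight of 3, and both sample pages are assumptions made up for illustration; no real engine is being reproduced.

```python
def naive_score(page, keyword):
    """Score a page by counting the keyword in its meta keywords and body text."""
    kw = keyword.lower()
    # Each matching meta keyword counts three times as much as a body word
    # (an arbitrary weight chosen for this sketch).
    meta_hits = sum(1 for k in page["meta_keywords"] if k.lower() == kw)
    body_hits = page["body"].lower().split().count(kw)
    return 3 * meta_hits + body_hits

# Hypothetical pages: one honest, one stuffed with the keyword.
honest = {
    "meta_keywords": ["pizza", "restaurant"],
    "body": "We bake pizza in a wood oven",
}
spam = {
    "meta_keywords": ["pizza"] * 20,
    "body": "pizza pizza pizza buy pills",
}

honest_score = naive_score(honest, "pizza")
spam_score = naive_score(spam, "pizza")
```

Because the page author controls every input to the score, repeating the keyword is enough to outrank the honest page.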
Web2.0
• Is coined by... Tim O’Reilly, yet
• TBL later said that “web2.0” is a stupid, meaningless term and that he thought of it first in ’96 anyway
Web2.0 means
• which grew apart because of
• PageRank (1998) invented by
• Larry & Sergey who adapted the algo from
• An MIT professor who had developed
• A nasty mathematical formula for positioning keywords in a 3d space model based on the relevancy that one kw holds … whatever
PageRank actually means
• That a link is a vote and
• Not all links are created equal, so
• It matters who links to you
• Just like in our real life society
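The "link is a vote" idea can be sketched with the textbook power-iteration form of PageRank. The damping factor of 0.85, the iteration count, and the four-page link graph are assumptions for illustration; this is not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of page -> outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical graph: a, b and d all link to c; c links only to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
```

In this graph `c` ranks highest because it collects three votes, and `a` outranks `b` and `d` with a single incoming link, because that one link comes from the highest-ranked page.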
• Read the content of pages really well, just that
• Pages were crappy:
– Non‐standard coding
– Ugly tech (like applets)
– Senseless IA
• So Google said: “don’t do evil and try to nicely format the info, according to W3C standards” (remember TBL)
Enter the SEO
SEO
• Is a multitude of practices aimed at facilitating the indexing of pages by search engines
• Evolves as the ranking algorithm changes, and
• Of course, the algorithm is kept secret.
SEO actually means
Courtesy of Kelly Ishikawa
SEO actually means
• An on‐going battle between bots & SEO guys
• Now 100+ factors influence ranking
• And I’d like to take the time to talk about each one of them in the following…
Just kidding
My SEO Cheat Sheet
• Consider:
1. Page Titles
2. URLs (mod_rewrite)
3. Anchor Text
4. Website Architecture (IA)
5. Link Title & Alt Images
6. Relevant content (text)
7. Sitemap.xml
8. Hosting
9. Freshness
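For item 7, a minimal sketch of generating a sitemap.xml with Python's standard library. The `urlset`, `url`, `loc` and `lastmod` elements and the namespace follow the sitemaps.org protocol; the example.com URLs and the dates are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Build a sitemap document from (location, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Placeholder pages; a real site would list its own URLs here.
sitemap_xml = build_sitemap([
    ("https://example.com/", "2008-02-15"),
    ("https://example.com/seo-cheat-sheet", "2008-02-10"),
])
```

The resulting string would be saved as sitemap.xml at the site root, preceded by an XML declaration.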
Resources
Matt Cutts Blog
Mihai’s SEO Cheat Sheet :D
Web2.0 Problems
• © for pictures, articles, books, etc
• PPC fraud
• Privacy
• Search Engine SPAM
• Link bombing
• Paid links
• But more important...
Web2.0 Problems
• SE still don’t understand what the $#%@ you’re talking about
• Crawling a website’s interface to extract info is almost insane
Web3.0
• Means semantic web
• Attention migrates from syntax/formatting to semantics and
• Meta Data (data about the data) becomes...
Web3.0
Resource Description Framework & Microformats
Resource Description Framework
• Commonly written as XML (RDF/XML)
• RDF = Subject + Predicate + Object
• S + P + O creates a Triple which
• Can describe almost anything in the universe
• Triples are connectable (eg: FOAF)
• RDFa = XHTML + RDF (W3C compliant)
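A minimal sketch of the triple model in plain Python, using tuples instead of an RDF library. The `foaf:name` and `foaf:knows` properties are real FOAF terms; the `ex:` identifiers and the three people are hypothetical.

```python
# FOAF namespace; the ex: subjects below are made-up identifiers.
FOAF = "http://xmlns.com/foaf/0.1/"

# Each triple is (subject, predicate, object).
triples = {
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:alice", FOAF + "knows", "ex:bob"),
    ("ex:bob", FOAF + "name", "Bob"),
    ("ex:bob", FOAF + "knows", "ex:carol"),
}

def objects(subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def friends_of_friends(person):
    """Follow foaf:knows twice: triples connect because objects reappear as subjects."""
    knows = FOAF + "knows"
    return {
        fof
        for friend in objects(person, knows)
        for fof in objects(friend, knows)
    }

fof = friends_of_friends("ex:alice")
```

The "connectable" part is the second hop: `ex:bob` is the object of one triple and the subject of another, so statements published separately can be joined.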
Microformats
• hCalendar
• hCard
• rel‐tag
• VoteLinks
• XFN
• Geo
• hResume
• hReview
• etc
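A minimal sketch of how a machine could read an hCard, using only the standard library. `fn`, `org` and `tel` are real hCard class names; the markup and the contact details are invented, and a real parser would also scope properties to the enclosing `vcard` element, which this sketch skips.

```python
from html.parser import HTMLParser

class HCardParser(HTMLParser):
    """Collect the text of elements whose class is one of a few hCard properties."""

    PROPERTIES = {"fn", "org", "tel"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property whose text we expect next

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.PROPERTIES:
                self._current = name

    def handle_data(self, data):
        # The first non-empty text after a property tag is taken as its value.
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None

# Hypothetical hCard markup.
markup = """<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Pizza</span>,
  <span class="tel">+1-555-0100</span>
</div>"""

parser = HCardParser()
parser.feed(markup)
card = parser.card
```

The class names carry the meaning, so the same HTML serves the human reader and the bot without a separate data feed.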
Case Study
SPARQL
• SPARQL Protocol and RDF Query Language
• Standardized on 15th Jan 08 (1 month ago) and
• Endorsed by?... TBL
“Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL”
TBL
Potential
• With SPARQL you skip the presentation layer
• You can query ad‐hoc any API, so
• You don’t need to crawl in advance, therefore
• Information will be as fresh as it gets
And possibilities
• Query: “I can has pizza?”
• Returns:
– A friend of yours (XFN ‐ Facebook)
– has a colleague (FOAF ‐ LinkedIN) who
– said that they make good pizza (hReview ‐ yelp) at
– a restaurant nearby (geo – Gmaps)
– Tip: U2 in concert today (hCalendar ‐ upcoming)
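The chain above can be sketched as a toy pattern matcher in the spirit of a SPARQL graph pattern, where terms starting with `?` are variables. This is not a SPARQL engine; the predicate names and all the data are hypothetical stand-ins for what XFN, FOAF, hReview and geo markup would supply.

```python
def match(pattern, triple, binding):
    """Extend a variable binding so the pattern equals the triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def query(patterns, triples):
    """Return every binding that satisfies all patterns, joining them one by one."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# Hypothetical facts gathered from different sources.
triples = [
    ("me", "friend", "ana"),          # XFN
    ("ana", "knows", "dan"),          # FOAF
    ("dan", "reviewed", "luigis"),    # hReview
    ("luigis", "serves", "pizza"),
    ("luigis", "near", "me"),         # geo
    ("bob", "reviewed", "tacos"),
]

patterns = [
    ("me", "friend", "?f"),
    ("?f", "knows", "?c"),
    ("?c", "reviewed", "?r"),
    ("?r", "serves", "pizza"),
    ("?r", "near", "me"),
]

answers = query(patterns, triples)
```

Each pattern narrows the candidates left by the previous one, so only the restaurant reachable through a friend's colleague survives the whole chain.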
Perhaps now we can see
• Why Social Networking Communities are worth so much, even though most of them don’t have a revenue model
– Facebook
– LinkedIN
– Meebo
– Beebo
– Pipu...
• They/We are the databases of the future
Thanks!
“Most of the right choices in SEO come from asking: What’s the best thing for the user?”
Matt Cutts
Mihai Gheza
Creative Commons Attribution‐Noncommercial‐Share Alike 3.0 Unported License.