Post on 13-May-2015
Data Feed SEO
A4uexpo London, October 2010
Will Critchlow
Data Feeds Are Not Unique
The “Affiliate” Penalty
Unique Content Matrix
Unique Content, Low Trust
Strong site, unique content
Non-unique Content, low trust
Strong site, non-unique content
Uniqueness
Site Strength
Case Study
“Welcome visitor, please find out selection of [insert product] below, we have [number of products] items. We think you’ll like them!”
User Generated Content
“User” Generated Content
User “Generated” Content
Mozenda
Building quick & dirty SEO Tools
A Cheat Sheet & Inspiration
By Will Critchlow, www.distilled.co.uk. First published: www.seomoz.org
APIs (more on programmable web)
• AdWords – Keywords
• Alchemy – Structured data & text
• Bing – Search, news, spelling
• Evri – Sentiment and popularity
• Face.com – Face detection
• Facebook – Social graph
• Google Analytics – Visitor data
• Hostip – Geo data
• LinkedIn – Professional data
• Pingdom – Website uptime
• Postrank (1, 2, 3) – Real-time & influence
• Rapleaf – Social media profiles
• Twitter – Real time and social
... And of course:
• Linkscape – Links
YQL – Yahoo! Query Language
select * from html where url="<url>" and xpath="<xpath>"
select * from html where url="<url>"
select * from feed where url="<url>"
select * from search.web where query="<query>"
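The query patterns above can be assembled programmatically. This is a minimal sketch that only builds the query strings and makes no network request; the helper names (yql_html_query, yql_feed_query) and the example.com URL are hypothetical and not part of any YQL library.

```python
def yql_html_query(url, xpath=None):
    """Build a YQL statement against the html table (string only)."""
    query = 'select * from html where url="{}"'.format(url)
    if xpath:
        # Assumes the xpath contains no double quotes; use single
        # quotes inside the expression, e.g. //a[@rel='nofollow'].
        query += ' and xpath="{}"'.format(xpath)
    return query


def yql_feed_query(url):
    """Build a YQL statement against the feed table (string only)."""
    return 'select * from feed where url="{}"'.format(url)


print(yql_html_query("http://www.example.com/", "//h1"))
print(yql_feed_query("http://www.example.com/rss"))
```

The resulting string is what would be passed as the "<yql query>" argument in the Python snippet on this cheat sheet.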
xpath (more examples)
/foo – the root element ‘foo’
//bar – all elements ‘bar’
foo/bar – all bar elements that are children of foo
foo//bar – bar at arbitrary levels below foo
foo/*/bar – bar grandchildren of foo
foo/* – all child elements of foo
foo/@bar – the bar attribute on foo
foo[@bar] – foo elements with a bar attribute
foo[@bar='baz'] – foo elements where the bar attribute equals baz
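To try these expressions locally, here is a small sketch using Python's standard library xml.etree.ElementTree, which supports only a subset of XPath. The sample document is made up for illustration. ElementTree cannot select attributes with foo/@bar, so the sketch reads the attribute with .get() instead.

```python
import xml.etree.ElementTree as ET

# Made-up sample document reusing the foo/bar names from the cheat sheet.
doc = ET.fromstring(
    "<root>"
    "<foo bar='baz'><bar>child</bar><wrap><bar>grandchild</bar></wrap></foo>"
    "<foo><bar>other</bar></foo>"
    "</root>"
)

# foo/bar: bar elements that are direct children of foo
children = [e.text for e in doc.findall("foo/bar")]

# foo//bar: bar elements at any depth below foo
descendants = [e.text for e in doc.findall("foo//bar")]

# foo/*/bar: bar elements that are grandchildren of foo
grandchildren = [e.text for e in doc.findall("foo/*/bar")]

# foo[@bar] and foo[@bar='baz']: filter foo elements by attribute
with_attr = doc.findall("foo[@bar]")
with_value = doc.findall("foo[@bar='baz']")

# Equivalent of foo/@bar: read the attribute off the matched elements
attr_values = [e.get("bar") for e in with_attr]

print(children, descendants, grandchildren, attr_values)
```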
Python
Since Python is the language of Google App Engine, here is how you can use YQL easily within Python:
Download source – extract to yql folder within your application
import yql
y = yql.Public()
result = y.execute("<yql query>")
Crawlers / Scrapers
• Mozenda
• 80legs
• Google App Engine
• Amazon Web Services
Human Touch
• Amazon Mechanical Turk
• Smartsheet (interface to Mechanical Turk)
• oDesk
Sources Magic Horsepower
Data (more on infochimps)
• Data.gov – US government data
• Data.gov.uk – UK government data
• Delicious list – from Peter Skomoroch
• Google Public Data – Directory
• Guardian – content and data
• World Bank – finance, health, etc.
• 80legs – prepackaged crawl data
User Generated “Content”
• External search queries
• Internal search queries
• Tags
• Testimonials
• FAQs/Support emails
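One way to turn search queries into on-page content is to tally the most frequent ones for a page. This is a rough sketch under assumptions: the top_queries helper and the sample queries are hypothetical, and real data would come from your own site-search or analytics logs.

```python
from collections import Counter


def top_queries(queries, limit=3):
    """Return the most frequent queries after basic normalisation."""
    # Lowercase and collapse whitespace so near-identical queries merge.
    normalised = (" ".join(q.lower().split()) for q in queries)
    counts = Counter(q for q in normalised if q)
    return [q for q, _ in counts.most_common(limit)]


# Hypothetical internal search queries recorded against one product page.
sample = [
    "USB stick",
    "usb  stick",
    "4gb flash drive",
    "USB stick ",
    "4GB flash drive",
    "cyan memory stick",
]
print(top_queries(sample, limit=2))
```

The output could feed a "people also searched for" block, which adds text to the page that a shared data feed does not carry.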
Tracking # of Reviews
_gaq.push(['_setCustomVar',
  1,                    // This custom var is set to slot #1.
  'Number of Reviews',  // The top level name for the variable
  '1',                  // The Number of Reviews
  3                     // Page level variable
]);
Context Is Key
Google News: Google likes alternative facts
Lyrics: Never considered duplicate content
Context is key
Look to stand out from your competitors
“Use a source of content that’s not unique, but that no-one else in your space is using”
Manipulate & Clean Your Data
“Kingston DataTraveler 101 USB flash drive - 4 GB – Cyan”
vs
“Kingston USB memory stick 4gb”
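A feed title like the first one above can be tidied with a few string rules. This is only a sketch under assumptions: the clean_title helper, the colour list and the chosen rules are hypothetical, and a real feed would need rules tuned to its own quirks.

```python
import re

# Hypothetical list of colour words to strip from titles.
COLOURS = {"cyan", "black", "white", "red", "blue"}


def clean_title(raw):
    """Tidy a raw feed title: drop colour segments, normalise capacity."""
    # Feeds often separate attributes with a spaced hyphen or en dash.
    parts = [p.strip() for p in re.split(r"\s+[-\u2013]\s+", raw)]
    # Drop segments that are nothing but a colour name.
    parts = [p for p in parts if p.lower() not in COLOURS]
    title = " ".join(parts)
    # Normalise capacities such as "4 GB" or "4gb" to "4GB".
    title = re.sub(
        r"(\d+)\s*(gb|mb|tb)\b",
        lambda m: m.group(1) + m.group(2).upper(),
        title,
        flags=re.IGNORECASE,
    )
    return title


raw = "Kingston DataTraveler 101 USB flash drive - 4 GB \u2013 Cyan"
print(clean_title(raw))
```

Rewriting the product name itself (for example "flash drive" to "memory stick") would need a synonym mapping chosen from keyword research, which this sketch does not attempt.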
Of Course, Links Always Win
http://www.seobook.com/black-hat-seo-case-study
Manual Reviews – aka “Hand Jobs”
Check out the quality rater guidelines
“Add value to users”
“Relevant”
These are subjective!!
Resources
• http://www.seomoz.org/blog/whiteboard-friday-flat-site-architecture
• http://seogadget.co.uk/solving-site-architecture-issues/
• http://www.seomoz.org/blog/api-and-dataset-cheatsheet-building-quick-dirty-tools
• http://www.mozenda.com
• http://www.seomoz.org/blog/leveraging-mechanical-turk-odesk-elance-craigslist-for-seo
• http://www.seochat.com/c/a/Google-Optimization-Help/Googles-Quality-Rater-Guidelines-Leaked/
• http://www.flickr.com/photos/rosaydani/77371897/
Thanks!
Will Critchlow
Director, Distilled
> will.critchlow@distilled.co.uk
> twitter.com/WillCritchlow
> www.distilled.co.uk