Duplicate Content & The Canonical Tag
By Stephan Spencer, President & CEO, Netconcepts
© 2009 Stephan M Spencer, Netconcepts


The Canonical Tag
- Influences your sitelinks in Google.

Duplicate Content Mitigation
- It is not just about removing competing duplicate pages.
- It's about recovering the leaked PageRank too.

PageRank Leakage
- Noindexed or disallowed pages (via robots.txt directives or robots meta tags) still accumulate PageRank.
- If the page is allowed (via robots.txt) but meta robots noindexed, it also passes PageRank.
- Thankfully, when obeyed, the canonical tag aggregates PageRank.

Tools for Collapsing Duplicates
- The Canonical Tag: a great new addition to the SEO's arsenal, but not your best weapon. Works best when used in concert with other signals.
- 301 Redirect: a much more absolute, automatically obeyed signal. Use it instead of (or in addition to) the canonical tag.
- XML Sitemaps: include only your canonical versions in your feed. Used as a canonicalization signal by Google.
- Rel=Nofollow on links pointing to the non-canonical versions: nofollowed links aren't even used for discovery by Google.
- Meta Robots Nofollow: blocks the flow of PageRank.

PageRank Leakage Scenarios
- Robots.txt disallow on the dup page = PageRank is leaked to the duplicate, and it can show up in the SERPs.
- Meta robots noindex (or robots.txt noindex) on the dup = PageRank is leaked; the dup won't show up in the SERPs.
- Rel=nofollow on links to the dup = PageRank can still accumulate through other links, and the dup can still be indexed.
- Meta robots nofollow on the dup = PageRank that accumulates on the dup cannot be passed on.
- XML Sitemaps file includes only the canonical version = only used as a hint; dups may still be indexed.
- Canonical tag pointing to the canonical version on all dups = only used as a hint; dups may still be indexed.
- 301 all dups to the canonical version = removes dups; may have unintended side effects (e.g. breaking your site's sorting capability).
- Conditional 301 = removes dups; high risk.

Canonical Tag Has Serious Limitations
- It doesn't work cross-domain: only within the domain, though cross-subdomain is supported. This is by design, to thwart the element's use by spammers. Thus you can't use it to reduce dup content to typo domains that you own.
- It's only a hint, not an absolute directive. Google sometimes chooses not to follow it even though it clearly should, so it's not nearly as strong a signal as a 301.

Canonical Tag Misfires
- NorthernSafety.com
- Wikipedia

An Example in the Wild
- Many thousands of non-canonical URLs of northernsafety.com are indexed, despite use of the canonical tag.
- For example, click on the listings on …m/products/ +inurl:protective-clothing and compare those URLs to what's listed as the canonical URL in the link tag in the HTML source of these pages.
- Canonical tags have been in place for several months.

What To Do?
So if the canonical tag can't (yet) be trusted to work, what should you do in addition, or instead? Some scenarios to consider:
- Pagination
- Faceted navigation
- Affiliate or click-tracked URLs
- Near-duplicates
- Country-specific versions on the same domain
- Manufacturer-supplied product copy

Pagination
- Excessive pagination dilutes crawl equity, causing numerous pages of product listings not to get crawled.
- Reduce the number of pages in the pagination system to improve crawlability and indexation.
- Next/Previous vs. a page-number list vs. Show All.
- Consider disallowing View All links and forcing spiders through subcategory pages (the keyword-rich path).
- Display as many products per page as possible (max 120) within a 150K file size.
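The crawl-equity arithmetic behind these recommendations can be sketched in a few lines of Python (a minimal illustration, not from the deck; the product counts are hypothetical):

```python
import math

def pagination_pages(num_products, products_per_page):
    """Number of paginated listing pages a spider must crawl
    to reach every product in a subcategory."""
    return math.ceil(num_products / products_per_page)

# A hypothetical subcategory of 600 products:
print(pagination_pages(600, 20))   # 20 per page -> 30 pages to crawl
print(pagination_pages(600, 120))  # the deck's max of 120 -> 5 pages
```

Fewer listing pages between the subcategory page and each product means less diluted crawl equity and a better chance that every product actually gets crawled.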
- Fewer products per subcat = fewer pagination pages to crawl at the subcat level, for maximum product indexation.
- 1-3 pages of pagination = useful for sending different keyword signals?

Faceted Navigation
- Faceted navigation, a.k.a. guided navigation, provides clickable product inventory breakdowns by brand, color, price range, etc. In doing so, it creates a huge number of permutations for the spiders to follow.
- The problem is exacerbated by clickable, re-sortable column headings.
- Nofollow all links leading to low-(SEO-)value facets, e.g. facets that do price-range breakdowns, re-sorting, and re-pagination.
- Or collapse near-dup facets (canonical tags, or revise the link URLs).
- Optimize URLs, title tags, etc. of high-value facets in an automated, scalable fashion (e.g. using GravityStream).

Affiliate URLs
- Rarely do they help your SEO, because they 302 rather than 301.
- Run your affiliate program in-house; use 301s and/or canonical tags. Don't 301 conditionally. The canonical tag isn't necessary if you're doing a 301.
- Third-party affiliate solutions (like Commission Junction) have a vested interest in not playing ball. The canonical tag won't help; PageRank is lost at the 302.
- Examples of affiliate networks that do pass the PageRank to the merchant: LinkConnector, DirectTrack.

Click-Tracked URLs
Here's how to 301 a static URL with a tracking param appended to its canonical equivalent (minus the param):

    RewriteCond %{QUERY_STRING} ^source=[a-z0-9]*$
    RewriteRule ^(.*)$ /$1? [L,R=301]

And for dynamic URLs:

    RewriteCond %{QUERY_STRING} ^(.+)&source=[a-z0-9]+(&?.*)$
    RewriteRule ^(.*)$ /$1?%1%2 [L,R=301]

Need to do some fancy stuff with cookies before 301ing? Invoke a script that cookies the user and then 301s them to the canonical URL:

    RewriteCond %{QUERY_STRING} ^source=([a-z0-9]*)$
    RewriteRule ^(.*)$ /cookiefirst.php?source=%1&dest=$1 [L]

Note the lack of an R=301 flag above. That's on purpose.
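The deck doesn't show cookiefirst.php itself. As a rough sketch of what such a script does (written here in Python, with hypothetical names), it records the traffic source in a cookie and only then issues the 301 to the clean URL:

```python
def cookiefirst_response(source, dest):
    """Remember the traffic source in a cookie, then 301 the visitor
    on to the canonical URL so the tracking param never gets indexed."""
    status = "301 Moved Permanently"
    headers = [
        ("Set-Cookie", "source=%s; Path=/" % source),  # keep the campaign data
        ("Location", "/" + dest.lstrip("/")),          # canonical destination
    ]
    return status, headers

# e.g. a request rewritten to /cookiefirst.php?source=email&dest=products/widget
status, headers = cookiefirst_response("email", "products/widget")
```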
There's no need to expose this script to the user. Use a rewrite, and let the script send the 301 after it has done its work.

Legacy URLs
Got legacy dynamic URLs you're trying to phase out after switching to static URLs? 301 them:

    RewriteCond %{QUERY_STRING} id=([0-9]+)
    RewriteRule ^get_product\.php$ /products/%1.html? [L,R=301]

Switching to keyword URLs, and the script can't do anything with the keywords if they're passed as params? Use RewriteMap and keep a lookup table in a text file:

    RewriteMap prodmap txt:/home/someusername/prodmap.txt
    RewriteRule ^/product/([0-9]+)$ ${prodmap:$1} [L,R=301]

What would the lookup table for the above rule look like?

    1001 /products/canon-g10-digital-camera
    1002 /products/128-gig-ipod-classic

DBM files are supported too, and are faster than a text file. You could also use a script that takes the requested input and delivers back its corresponding output:

    RewriteMap prodmap prg:/home/someusername/mapscript.pl
    RewriteRule ^/product/([0-9]+)$ ${prodmap:$1} [L,R=301]

Other Common Issues
Non-www and typo domains (the example mentioned earlier):

    RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

HTTPS (if you have a separate secure server, you can skip the first line):

    RewriteCond %{HTTPS} on
    RewriteRule ^catalog/(.*)$ http://www.example.com/catalog/$1 [L,R=301]

If the trailing slash is missing, add it:

    RewriteRule ^(.*[^/])$ /$1/ [L,R=301]

WordPress handles this by default. Yay WordPress!

Conditional 301s?
Risky territory!
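Back on the Legacy URLs slide, the prg: map script (mapscript.pl) is left to the imagination. A minimal Python stand-in might look like this; the table contents mirror the deck's prodmap.txt example, and everything else is an assumption. Apache starts the script once, feeds it one lookup key per line on stdin, and expects one mapped value (or NULL) per line on stdout:

```python
import sys

# Mirrors the deck's prodmap.txt example table.
PRODMAP = {
    "1001": "/products/canon-g10-digital-camera",
    "1002": "/products/128-gig-ipod-classic",
}

def lookup(key):
    """prg: RewriteMaps must answer 'NULL' for keys with no mapping."""
    return PRODMAP.get(key.strip(), "NULL")

if __name__ == "__main__":
    for line in sys.stdin:
        sys.stdout.write(lookup(line) + "\n")
        sys.stdout.flush()  # Apache reads answers unbuffered, one per key
```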
Read "Redirects: Good, Bad & Conditional". To selectively redirect bots that request URLs with session IDs to the URL sans session ID:

    RewriteCond %{QUERY_STRING} PHPSESSID
    RewriteCond %{HTTP_USER_AGENT} Googlebot.* [OR]
    RewriteCond %{HTTP_USER_AGENT} ^msnbot.* [OR]
    RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
    RewriteCond %{HTTP_USER_AGENT} Ask\ Jeeves
    RewriteRule ^(.*)$ /$1? [R=301,L]

(Note the trailing "?" in the substitution: without it, mod_rewrite re-appends the original query string, session ID and all, and the redirect loops.) browscap.ini provides spiders' user agents.

Conditional 301s are not necessary, though. There is almost always another way (without using user agent or IP). In the above example, simply 301 everybody, bots and humans alike, and stop appending PHPSESSID. See http://yoast.com/phpsessid-url-redirect/ for more on this. If you have to keep session IDs for functionality reasons, you could use a script to detect whether the session has expired, and 301 the URL to the canonical equivalent if it has.

Near Duplicates, But Not Quite?
- What if you can optimize one version but not all versions? For example...
- Let's say you have implemented a new URL structure and moved content over to the new URLs. The old URLs still pull up the content too, but the templates are different. The new version has better SEO (title tags are more keyword-rich, there are H1 headings, a couple of sentences of intro copy, etc.), but it's the same product information.
- According to Matt Cutts, using the canonical tag to canonicalize the non-optimized version to the optimized version is high risk.

Country-Specific Versions
- Country-specific versions on the same domain? Create separate "sites" within Google Webmaster Central for each country-specific directory, then set the Geographical Targeting within each one.
- Google doesn't view country-specific versions as duplicate content; Google's smarter than that.

Manufacturer-Supplied Product Copy
- Distance yourself from the thin affiliates.
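Part of why duplicated manufacturer copy is so detectable: search engines can compare overlapping shingles (word n-grams) across pages, and reshuffled copy still shares long runs of shingles with the original. A minimal illustration, not from the deck:

```python
def shingles(text, k=4):
    """The set of overlapping word k-grams ('shingles') in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def resemblance(a, b, k=4):
    """Jaccard similarity of two shingle sets: 1.0 for identical copy,
    and still well above 0 for copy that has merely been shuffled."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)

# Hypothetical product copy, verbatim vs. shuffled:
copy = "lightweight breathable jacket with sealed seams and zip pockets"
shuffled = "with sealed seams and zip pockets lightweight breathable jacket"
```

Moving blocks of text around leaves whole runs of shingles intact, which is why simply shuffling the page's content is not enough to "uniquify" it.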
- Augment with a substantial amount of unique, valuable content:
  - Customer reviews, which are otherwise trapped/hidden within JavaScript in third-party review services like BazaarVoice and PowerReviews.
  - Not mashups with Wikipedia, Twitter, and the usual suspects.
- "Uniquify" content. It is not sufficient to shuffle the page's content around! Think about overlapping shingles.
- Scaling? Mechanical Turk, yes. Markov chains, no.
- A nail in the coffin: the same titles and meta descriptions.

Supplemental Hell?
- The Supplemental Index still very much exists, and these dups are probably in there.
- Does Google leave clues about what it considers to be non-canonical / not favored? If only the "Supplemental Result" label were still supported! *sigh*
- How about spidering activity? PageRank score? Omitted results? Cached date? A missing cached link?

Related Resources
- navigating-mess
- maximum-seo-impact-12982

Thanks!
For a free faceted navigation audit, drop me your business card or your request to ... To contact me: