Duplicate content presentation March 2012

Duplicate Content Filters, Penalties and other Content Minefields 27th March 2012

description

Some valuable insights into why duplicate content on your website is a problem for Google. We suggest work-arounds and solutions, but please let us know your thoughts.

Transcript of Duplicate content presentation March 2012

Page 1: Duplicate content presentation   March 2012

Duplicate Content Filters, Penalties and other Content Minefields

27th March 2012

Page 2: Duplicate content presentation   March 2012

Search Quality – the Duplicate Content Headache

Google can’t afford a SERP like this:

1) Search engine optimization
Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines...

2) Search engine optimization
Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines...

3) Search engine optimization
Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines...

4) Search engine optimization
Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines...
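
A search engine has to collapse listings like these before showing them. A minimal sketch, assuming the engine simply compares snippets after lowercasing them and stripping punctuation (the function names and example URLs are hypothetical, and this is not a description of Google's actual method): keep only the highest-ranked result for each distinct fingerprint.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash a snippet after lowercasing it and collapsing punctuation/whitespace."""
    normalized = re.sub(r"\W+", " ", text.lower()).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def collapse_duplicates(results):
    """Keep the first (highest-ranked) result for each distinct snippet.

    `results` is a list of (url, snippet) pairs in rank order.
    """
    seen = set()
    kept = []
    for url, snippet in results:
        fp = fingerprint(snippet)
        if fp not in seen:
            seen.add(fp)
            kept.append((url, snippet))
    return kept


snippet = ("Search engine optimization (SEO) is the process of improving the "
           "visibility of a website or a web page in search engines")
# Hypothetical results: four different sites carrying the same text.
results = [(f"http://site{i}.example.com/seo", snippet) for i in range(1, 5)]
print(len(collapse_duplicates(results)))  # 1: four identical listings become one
```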


Page 3: Duplicate content presentation   March 2012

Resource – the Duplicate Content Headache

Duplicate content has consequences for search engines in terms of:

Wasted crawler resources – there is a finite number of crawlers

Wasted bandwidth – how often can you crawl 1 trillion documents and keep your index fresh?

Increased query CPU time – how do you search 1 trillion documents as quickly as possible?
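
Exact hashing misses copies that differ by a few words. A common textbook technique (not necessarily what Google runs) is w-shingling: split each document into overlapping runs of w words and compare the resulting sets with Jaccard similarity. The sketch assumes w = 4; the example sentences are made up.

```python
def shingles(text: str, w: int = 4) -> set:
    """Return the set of w-word shingles (overlapping word runs) in text."""
    words = text.lower().split()
    if len(words) < w:
        return {tuple(words)}
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


original = "duplicate content wastes crawler resources and bandwidth for search engines"
rewrite = "duplicate content wastes crawler resources and bandwidth for every search engine"

print(jaccard(shingles(original), shingles(rewrite)))   # 0.5
print(jaccard(shingles(original), shingles(original)))  # 1.0
```

At the scale of 1 trillion documents, comparing every pair is far too expensive, which is why large-scale systems compress shingle sets into compact sketches such as MinHash or SimHash before comparing them.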


Page 4: Duplicate content presentation   March 2012

Document importance – Duplicate Content Headache

Duplicate content can also be a signal of an important document:

• Song lyrics

• Scholarly texts and historical documents, eg the Bible (1,000 pages)

• The Linux manual (2,000 pages)

• Breaking News – Associated Press, Reuters

etc.


Page 5: Duplicate content presentation   March 2012

Types of Duplicate Content

Duplicate content comes in many forms:

Intentional vs non-intentional

On-site vs off-site


Page 6: Duplicate content presentation   March 2012

On-Site Duplicate Content (Impacts Quality Score)

Intentional
• Printer-friendly pages
• Different font sizes
• PDF documents
• Archive (non-graphics versions)
• Shopping filters (sort by and pagination)
• RSS feeds

Non-intentional
• Affiliate URLs - www.example.com/?btag=123
• AdWords campaigns - www.example.com/?utc=google
• Search results
• www vs non-www URLs
• https vs http
• Stubs/plugins
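
Most of the non-intentional variants above are the same page reached through a different URL. A minimal sketch of collapsing them onto one preferred form, assuming (hypothetically) that the preferred form is https plus www and that the parameters in TRACKING_PARAMS never change page content; both assumptions need adjusting for a real site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters that track visitors but never change content.
TRACKING_PARAMS = {"btag", "utc", "affid"}


def normalize_url(url: str) -> str:
    """Collapse common duplicate-URL variants onto one preferred form.

    Assumptions: the preferred scheme is https, every host should carry
    the www. prefix, and TRACKING_PARAMS can be dropped safely.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    # Drop tracking parameters; sort the rest so parameter order doesn't matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    return urlunsplit(("https", host, parts.path or "/", urlencode(query), ""))


variants = [
    "http://example.com/?btag=123",
    "https://www.example.com/?utc=google",
    "http://WWW.EXAMPLE.COM/",
    "https://example.com",
]
print({normalize_url(u) for u in variants})  # {'https://www.example.com/'}
```

The same mapping can feed a canonical tag or a 301 redirect, so that every variant points at a single URL.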


Page 7: Duplicate content presentation   March 2012

On-Site Duplicate Content (Impacts Quality Score)

Worst-case scenario example: 10,000s of stub pages

This was 2 weeks after Andy had removed the duplicate links from the search pages on our advice, eg:
http://www.motors.co.uk/Ford-Escort-0-9999999---2
http://www.motors.co.uk/Ford-Escort-0-9999999--U-2-
http://www.motors.co.uk/Ford-Escort-0-9999999---2%20-

Page 8: Duplicate content presentation   March 2012

Off-Site Duplicate Content (Filters and Penalties)

The line between intentional and non-intentional is somewhat grey:
• Domain branding, eg .com, .co.za
• (Mobile website)
• Content syndication
• Content theft
• Staging websites – a common problem!!

Quality signals are often used to filter off-site duplicates!!!


Page 9: Duplicate content presentation   March 2012

How Does Google Filter Off-site Duplicate Content

Authors feel they have a right to rank for their own content – Google’s loyalty is to its users!!!

Google doesn’t necessarily reward the source or original; instead it assesses:

• Relevance (eg is an article in context?)

• Domain authority & links (eg Google Knol, Facebook)

• Fresh content boost

• Site quality signals (eg internal duplicate content!!!)


Page 10: Duplicate content presentation   March 2012

Examples of Off-site Duplicate Content and Quality

Client with .com.au and .com domains, plus https duplicates

Casino client with a lot of stub pages (pre-Panda)

Casino site – severe health issues:


Page 11: Duplicate content presentation   March 2012

How to Diagnose (on-site) Duplicate Content

Link building will exacerbate the indexing of duplicate content

Check your indexed page counts weekly and look for spikes in Google (and Yahoo and Bing) indexing

Look for duplicates in site:example.com results

Use the Xenu link checker (Xenu's Link Sleuth)

Heed any Webmaster Tools warnings

Check your crawl and cache dates: frequent updates but stale cache dates = duplicate content issues
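
The weekly check on indexed pages is easy to automate once the counts are recorded. A minimal sketch, assuming you log one indexed-page count per week however you collect it (the figures below are invented) and treat any week more than 1.5 times the median of earlier weeks as a spike; the threshold is an arbitrary, hypothetical choice.

```python
from statistics import median


def find_index_spikes(weekly_counts, factor=1.5):
    """Return indexes of weeks whose indexed-page count exceeds `factor`
    times the median of all earlier weeks."""
    spikes = []
    for week in range(1, len(weekly_counts)):
        baseline = median(weekly_counts[:week])
        if baseline and weekly_counts[week] > factor * baseline:
            spikes.append(week)
    return spikes


# Invented weekly indexed-page counts (week 0 first).
counts = [1200, 1250, 1230, 1240, 4800, 9100]
print(find_index_spikes(counts))  # [4, 5]
```

Using the median of earlier weeks rather than the previous week alone means one noisy reading does not shift the baseline much.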


Page 12: Duplicate content presentation   March 2012

How to address on-site and off-site duplicate content

You have a whole armoury of potential tools, including:

• Robots.txt exclusion
• Robots meta tag
• Canonical tag
• Webmaster Tools URL exclusion
• Password protection
• (301 redirects)

(File a DMCA against serial content thieves?)

A lot of well-meaning people give bad advice, though


Page 13: Duplicate content presentation   March 2012

Google Engineers Can’t Agree

Page 14: Duplicate content presentation   March 2012

Adam Lasnik – “Deftly Dealing with Duplicate Content” 2006

Probably the authoritative guide to duplicate content:

• What is duplicate content?

• What isn't duplicate content?

• Why does Google care about duplicate content?

• What does Google do about it?

• How can Webmasters proactively address duplicate content issues?


Page 15: Duplicate content presentation   March 2012

Deftly Dealing with... - Our advice/experience

Robots.txt

Routinely ignored by Google, probably because of malware

User-agent: *
Allow: /the-good-stuff/
Disallow: /the-malware/

Robots.txt is ignored unless combined with an emergency URL removal request in Webmaster Tools (the removal lasts 3 months)
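
Before relying on robots.txt, it helps to check which URLs a compliant crawler would actually skip. A sketch using Python's standard urllib.robotparser with the rules from this slide (the example.com URLs are hypothetical). Bear in mind that robots.txt controls crawling, not indexing: a blocked URL that is linked from elsewhere can still appear in the index, which fits the experience above.

```python
from urllib.robotparser import RobotFileParser

# The rules from the slide; no blank lines, so they form one record.
ROBOTS_TXT = """\
User-agent: *
Allow: /the-good-stuff/
Disallow: /the-malware/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/the-good-stuff/page.html", "/the-malware/dropper.php", "/other/"]:
    url = "http://www.example.com" + path
    # True means a crawler obeying these rules may fetch the URL.
    print(path, parser.can_fetch("Googlebot", url))
```

urllib.robotparser applies the first matching rule in file order, whereas Google documents that the most specific (longest) matching path wins; for non-overlapping rules like these the two give the same answers.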


Page 16: Duplicate content presentation   March 2012

Our advice/experience

Canonical tag

Works great for cross-domain duplicate content

Largely ineffective for pagination eg shopping sites

Totally ineffective unless the canonicalised pages are VERY similar, if not identical
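
Because the canonical tag only works when the pages really are near-identical, it is worth auditing what each variant actually declares. A minimal sketch using Python's standard html.parser; the class name, page markup and URL are hypothetical. It returns every canonical href found, so a page that emits more than one stands out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect every href from <link rel="canonical"> tags in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        rels = attr_map.get("rel", "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def declared_canonical(html: str):
    """Return the list of canonical URLs declared in an HTML document."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical printer-friendly variant pointing back at the main article.
page = """<html><head>
<title>Printer friendly</title>
<link rel="canonical" href="https://www.example.com/article">
</head><body>...</body></html>"""
print(declared_canonical(page))  # ['https://www.example.com/article']
```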


Page 17: Duplicate content presentation   March 2012

Our advice/experience

Robots Meta Tag

noindex,follow – 100% obeyed by Google, and still passes PageRank

Very effective for pagination eg shopping sites

Works well for tracking links too (www.example.com/?affid=123456)

Doesn’t work when the page is also blocked in robots.txt: Google never crawls the page, so it never sees the tag
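
That last point is easy to miss, so it can be worth checking for the conflict automatically. A minimal sketch combining Python's standard html.parser and urllib.robotparser; the function names, URLs and robots.txt rules are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def noindex_status(url, html, robots_txt, agent="Googlebot"):
    """Report whether a crawler that obeys robots.txt will ever see noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    if "noindex" not in finder.directives:
        return "no noindex directive"
    if not rules.can_fetch(agent, url):
        return "noindex present but unreachable: robots.txt blocks crawling"
    return "noindex will be seen"


page = '<head><meta name="robots" content="noindex,follow"></head>'
robots = "User-agent: *\nDisallow: /search/\n"
print(noindex_status("http://www.example.com/?affid=123456", page, robots))
print(noindex_status("http://www.example.com/search/?q=ford", page, robots))
```

The first URL is crawlable, so its noindex is honoured; the second sits under a disallowed path, so the tag is never read.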


Page 18: Duplicate content presentation   March 2012

Our advice/experience

Password protection / .htaccess 403 Forbidden

Works great for staging sites

Stubs – a problem, in that it generates Webmaster Tools errors

Our feeling: best avoided on your main domain
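
For a staging site, password protection can also live in the application rather than in .htaccess. A minimal sketch of WSGI middleware requiring HTTP Basic authentication, assuming a Python-served staging copy (the credentials, realm and app are placeholders). Note that Basic authentication answers with 401 Unauthorized plus a WWW-Authenticate challenge rather than 403 Forbidden; either way a crawler receives none of the duplicate content.

```python
import base64
import hmac


def require_basic_auth(app, username, password, realm="staging"):
    """Wrap a WSGI app so every request needs HTTP Basic credentials."""
    expected = "Basic " + base64.b64encode(f"{username}:{password}".encode()).decode()

    def wrapped(environ, start_response):
        supplied = environ.get("HTTP_AUTHORIZATION", "")
        # Constant-time comparison of the whole Authorization header.
        if hmac.compare_digest(supplied.encode(), expected.encode()):
            return app(environ, start_response)
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", f'Basic realm="{realm}"'),
        ])
        return [b"Authentication required\n"]

    return wrapped


def staging_site(environ, start_response):
    """Placeholder for the real staging application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"staging copy of the live site\n"]


app = require_basic_auth(staging_site, "team", "s3cret")  # hypothetical credentials
```

To try it locally, pass app to wsgiref.simple_server.make_server("", 8000, app) and call serve_forever() on the result.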


Page 19: Duplicate content presentation   March 2012

Extreme Techniques to Avoid Dupe Content

Make all your backend .exe with htaccess

Page 20: Duplicate content presentation   March 2012

Summary

Duplicate content is a minefield!

Filters usually apply; penalties are very rare

You have the answer in your own hands

Stay on top of your site’s health – especially internal duplicate content

Page 21: Duplicate content presentation   March 2012

Thank you for your attention!

Thanks to:
Anton Groeneveldt
Carla dos Santos