www.seoresearchlabs.com
keyword research – corporate training – private coaching

Argh! We’ve Been Duped!

Dan Thies, SEO Research Labs

A (little) about me...

• 10 years of SEO…
• Once held the #1 ranking on Infoseek for “sex” – for 18 minutes
• Make up your own joke
• Published “SEO Fast Start” in 2001
• Started SEO Research Labs in Jan. 2003
• Author, SitePoint Search Engine Marketing Kit

Topics For Today

• Getting Duped vs. Duping Yourself
• Impacts on Traffic
• Reverse Cloaking & Spider Validation
• Changing & Rotating Content
• DMCA & Dupes
• Challenges to search engines

Defining The Problem

• Duplicate Content
– The same content, presented on more than one URL
– Most web sites do this to an extent
  • http://www.example.com vs. http://example.com
  • www.example.com/ vs. www.example.com/index.html
• Near-Duplicate
– “Nearly the same…”
– Search engines look for uniqueness
• Filtered from index vs. filtered from SERPs
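
To make the two URL pairs above concrete, here is a minimal sketch that maps the variants onto one preferred form. The function name `canonical_url`, the `preferred_host` parameter, and the list of default document names are illustrative assumptions, not something from the talk.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; adjust to whatever your server serves for "/".
DEFAULT_DOCS = ("index.html", "index.htm")


def canonical_url(url, preferred_host="www.example.com"):
    """Collapse www/non-www and /index.html variants into one URL (sketch)."""
    if "://" not in url:
        url = "http://" + url  # slide examples sometimes omit the scheme
    parts = urlsplit(url)
    host = parts.netloc.lower()

    # Treat the bare host and the www host as the same site.
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host

    # An empty path and a default document both mean the directory root.
    path = parts.path or "/"
    for doc in DEFAULT_DOCS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]
            break

    # The fragment is dropped because it never reaches the server.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))
```

In practice a server would issue a 301 redirect to the canonical form instead of just computing it; the function only shows which URLs count as the same page.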

Getting Duped vs. Duping Yourself

• Duping Yourself – See Other Sessions
– Duplicate URLs
– Shopping sites w/ duplicate product descriptions
– Near-empty pages
• Getting Duped – You Are Here
– Screen scrapers & “borrowing”
– RSS Feeds (or did you do it to yourself?)
– Proxy URLs

Impacts on Traffic

• Specific site: (omitted…)
• Duped: 10-15% of traffic is organic search
• De-Duped: 20-25% from organic search
• Revenue drop… “feelable.”
• This client is very good at PPC and other marketing; many sites would suffer far worse from a 50% drop in SEO referrals

Reverse Cloaking vs. Scrapers

• Simple user agent detection - If the user-agent is NOT a major SE spider, insert:

<meta name="robots" content="noindex">

– Screen scrapers that steal an entire page’s HTML get a page that will not be indexed.
– Easily thwarted by someone who cares to, but substantially reduces duplication caused by scraping
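
The user-agent check described above can be sketched as follows. This is only an illustration: the token list in `SPIDER_TOKENS` and the function names are assumptions, and a real site would maintain its own list of the spiders it cares about.

```python
import re

# Assumed user-agent substrings for the major spiders; illustrative only.
SPIDER_TOKENS = ("googlebot", "msnbot", "slurp", "teoma")

NOINDEX_TAG = '<meta name="robots" content="noindex">'

# Matches the opening <head> tag (with or without attributes), but not <header>.
HEAD_OPEN = re.compile(r"(<head(?:\s[^>]*)?>)", re.IGNORECASE)


def is_major_spider(user_agent):
    """True if the user-agent string claims to be one of the listed spiders."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in SPIDER_TOKENS)


def render_for(user_agent, html):
    """Return the page unchanged for spiders; add noindex for everyone else."""
    if is_major_spider(user_agent):
        return html
    # Insert the tag right after the opening <head>. If the page has no
    # <head> tag, the HTML is returned unchanged.
    return HEAD_OPEN.sub(lambda m: m.group(1) + NOINDEX_TAG, html, count=1)
```

A scraper that copies the HTML served to a normal browser picks up the noindex tag along with everything else, while the spiders see the clean page. This trusts the user-agent string alone, which is why it is easy to defeat.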

Links By Proxy – An Old Trick

Fun With Spam:

Hack someone else’s site to create a link or redirect to one of your sites – either create a page or craft a URL using an XSS attack… then link to it using a proxy URL. Woo-hoo!

Public Proxies

Proxy URLs As Duplicates

• Thousands of public anonymous proxy servers
• Every URL on the web can be duplicated by them
• Proxy-based duplicates, when linked to, can affect duplicate content filtering
– Search Engine Spiders access proxy URLs too!
• Public proxies pass along the user-agent
– IE version of site vs. Mozilla vs. Opera etc.
– Googlebot, MSNBot, Slurp, Ask…
• But proxies use their own IP address
– Check logs – do any “Googlebot” IPs resolve to proxies (e.g. webwarper.net)?
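
As a starting point for the log check above, this sketch pulls out every distinct IP address that sent a “Googlebot” user-agent. It assumes the Apache “combined” log format; the regular expression and the function name `ips_claiming` are illustrative, and other log formats would need a different pattern. Looking up who owns each IP is left as a separate step.

```python
import re

# Assumption: Apache "combined" format, i.e.
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def ips_claiming(lines, token="googlebot"):
    """Return the sorted, distinct IPs whose user-agent contains `token`."""
    found = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and token in match.group(2).lower():
            found.add(match.group(1))
    return sorted(found)
```

Usage would be along the lines of `ips_claiming(open("access.log"))`, with the resulting list then checked against the search engine's known address ranges.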

Spider Validation vs. Proxies

• When you get a request from a “search engine spider” user agent, check the requesting IP:
– If the IP address is “owned” by the search engine, deliver the page
– If the IP address is not owned by the search engine, deliver a different page, empty page, or 403 Forbidden
– NSLookup is less reliable than checking ARIN’s WHOIS database
– Store lists of good vs. bad IPs, to speed processing
• Yes, it’s really the SE’s bot, but coming to a proxy URL
– So, you MUST block the request to avoid duplication
– Warning: Danger – Danger – Danger! Use With Caution!

But What If They Get Through?

• Changing & Rotating Content
– Testimonials
– News & Headlines
– Brute Force
• The most important page on your site is probably the home page, yet it’s probably the least often changed.
• How much is unique? How often to change?
• If the page changes every 24 hours, a proxy can only duplicate you for 24 hours + indexing lead time
• Our client is changing one paragraph of copy every 4 hours – 42 variations per week.
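
The 4-hour rotation works out to 6 slots per day, times 7 days, for 42 slots per week. One way to pick the paragraph for the current slot is sketched below; the function names are illustrative, and how the client actually implemented the rotation is not stated in the talk.

```python
from datetime import datetime

SLOT_HOURS = 4
SLOTS_PER_WEEK = 7 * 24 // SLOT_HOURS  # 6 per day * 7 days = 42


def slot_index(now):
    """Index of the current 4-hour slot within the week (Monday 00:00 is 0)."""
    hours_into_week = now.weekday() * 24 + now.hour
    return hours_into_week // SLOT_HOURS


def paragraph_for(now, variations):
    """Pick the paragraph for this slot, wrapping if fewer than 42 exist."""
    return variations[slot_index(now) % len(variations)]
```

For example, `paragraph_for(datetime.now(), variations)` would be called when rendering the home page, with `variations` holding the prepared paragraphs.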

Monitoring Dupes

• Set up monitoring for a “signature SERP”
– Text that is unique to your page or pages
– Home page duplication is the #1 issue
– Use a second signature for internal pages
• Google Alerts
– www.google.com/alerts
• Roll your own with the Google API
– www.google.com/apis
– or www.googlealert.com
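
For a roll-your-own monitor, the step after running the signature search is to flag any result that is not on your own site. The sketch below covers only that step and assumes you already have the list of result URLs from whatever search tool you use; the function name `suspected_dupes` and its parameters are hypothetical.

```python
from urllib.parse import urlsplit


def suspected_dupes(result_urls, own_hosts):
    """Return result URLs whose host is not one of your own hosts."""
    own = {host.lower() for host in own_hosts}
    flagged = []
    for url in result_urls:
        host = (urlsplit(url).hostname or "").lower()
        if host not in own:
            flagged.append(url)
    return flagged
```

Anything returned here shows your signature text on someone else's host, which makes it a candidate for closer inspection.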

Killing Dupes w/ DMCA

• DMCA, Digital Millennium Copyright Act
• I am NOT an attorney, lawyer, barrister, solicitor, etc. and this is NOT legal advice
• Ian McAnerin’s templates:
– http://www.mcanerin.com/EN/articles/copyright-03.asp
– Or Google McAnerin DMCA
• To Hosting Provider (ISP) to remove sites/pages
• To search engines to remove from index

Challenging The Search Engines

• Duplication by proxy, by theft, etc. is a major issue for webmasters – a drain on resources, and a pain in the…
• Like search engine spam, much of it is paid for by search engines through contextual ad networks & PPC
• Identify the originals – is the page in DMOZ? Is it in the Y! Directory? It just might be the original!
• How many DMCA notices can a search engine afford to process?
• Why are any URLs from known proxies still indexed after all these years?

Contact Information

Dan Thies, [email protected]

Free Training Videos:
www.seoresearchlabs.com/keywordvideo
www.seoresearchlabs.com/linkvideo

Free Tools:
www.seoresearchlabs.com/tools.php